Skip to content

Structures for TLB KMD management #733

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

pjanevskiTT
Copy link
Contributor

@pjanevskiTT pjanevskiTT commented Apr 11, 2025

Issue

Working towards #646 #463

Description

Add structures required to use KMD API for TLB management. This PR adds calls to KMD in order to leverage TLB management through KMD. On top of that there are classes implemented in order to use TLBs allocated using KMD. This PR is supposed to introduce this classes and tests for the TLBs. Integrating this should happen in the subsequent PR after we converge on these basic structures.

List of the changes

  • Add TlbHandle class, representing the struct with all the information important for TLB allocated
  • Add TlbWindow class, representing the struct used for reading/writing data through TLB
  • Add TlbAllocator class, representing the allocator for TLBs. Allocating TLB is done using get_tlb call, free of TLB is for now done inside the TLB handle on destructor. Think about changing this
  • Add tests

Testing

CI. Added test_tlb for testing TLB KMD calls

API Changes

/

@pjanevskiTT pjanevskiTT self-assigned this Apr 11, 2025
@pjanevskiTT pjanevskiTT changed the title Structures for TLB KMD management TLB KMD management Apr 11, 2025
@pjanevskiTT pjanevskiTT changed the title TLB KMD management Structures for TLB KMD management Apr 11, 2025
@pjanevskiTT pjanevskiTT marked this pull request as ready for review April 15, 2025 12:03
allocate_tlb.in = in;
if (ioctl(get_fd(), TENSTORRENT_IOCTL_ALLOCATE_TLB, &allocate_tlb) < 0) {
std::cout << "error" << std::endl;
return {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will return a zero-initialized tenstorrent_allocate_tlb_out which will look valid to the caller (tenstorrent_allocate_tlb_out::id == 0 is legal). The mmap offsets will also be zero. Should this throw an exception?

Comment on lines +54 to +56
// For simplicity and correctness, only allow 32-bit aligned accesses.
// There exist platform and device specific considerations for unaligned
// accesses which are not addressed here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wormhole has some read/modify/write logic to handle unaligned and 1/2 byte access cases.

See

// 1. AARCH64 device memory does not allow unaligned accesses (including pair loads/stores),

and https://yyz-gitlab.local.tenstorrent.com/tenstorrent/syseng/-/issues/3487

tenstorrent_free_tlb free_tlb{};
free_tlb.in = in;
if (ioctl(get_fd(), TENSTORRENT_IOCTL_FREE_TLB, &free_tlb) < 0) {
return {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider throwing an exception here -- although if you do, you'll want to ensure that such an exception cannot escape from ~TlbHandle(). If you don't throw here, maybe an error should be logged. If UMD fails to free a TLB, it is probably UMD programmer error indicative of a bug.

Comment on lines +22 to +24
void TlbWindow::write_register(uint64_t offset, uint32_t value) { write32(offset, value); }

uint32_t TlbWindow::read_register(uint64_t offset) { return read32(offset); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Register access should use an uncached mapping for ordering and immediate visibility. A more arcane concern is that WC mappings may be speculatively read by the CPU. I don't know of any registers in our system that have side-effects when read. But UC avoids this.

I don't have any immediate suggestion for how to deal with the UC/WC thing. A few ideas:

  • always map them both (pros: flexible, cons: easier to misuse; I'm not 100% convinced this is reliable)
  • choose mapping type when the window is constructed
  • separate abstractions (e.g. TlbWindowUC, TlbWindowWC -- I am not sure I like this, but it's definitely explicit about intent...)
  • ??


// Here it's not important how we have configured the TLB. For every read we will
// do the reconfigure of the TLB window.
tenstorrent_noc_tlb_config config;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest zero-initializing it anyway: tenstorrent_noc_tlb_config config{}; to avoid any surprise if it ends up with an uninitialized value that the driver doesn't like.

Comment on lines +110 to +114
// Point of the test is to read NOC0 node id register.
// TLB needs to be aligned to 2MB so these base and offset values are
// how TLB should be programmed in order to get to addr 0xFFB2002C.
const uint64_t tlb_base = 0xFFA00000;
const uint64_t noc_node_id_tlb_offset = 0x12002C;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Longer term (not necessary for this introductory PR) I think it's worth figuring out how the TlbWindow abstraction or the logic that creates the windows and hands them out can handle this for the caller.

tenstorrent_configure_tlb configure_tlb{};
configure_tlb.in = in;
if (ioctl(get_fd(), TENSTORRENT_IOCTL_CONFIGURE_TLB, &configure_tlb) < 0) {
return {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment to the one for PCIDevice::allocate_tlb

@@ -440,3 +440,34 @@ semver_t PCIDevice::read_kmd_version() {

return semver_t(version_str);
}

tenstorrent_allocate_tlb_out PCIDevice::allocate_tlb(tenstorrent_allocate_tlb_in in) {
tenstorrent_allocate_tlb allocate_tlb{};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to consider is that old tt-kmd releases don't know about the TLB API. How should this situation be handled? Some kind of version check seems necessary, but we don't want to obstruct anything from running against old tt-kmd releases until this integration is more mature.


using namespace tt::umd;

TEST(TestTlb, TestTlbWindowAllocateNew) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe these should be in their own test module that's not called in CI, so we don't have failures if tests are run against old tt-kmd releases?

Copy link
Contributor

@joelsmithTT joelsmithTT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me. Since this looks like scaffolding for future integration work I am fine with it getting merged as long as it doesn't disrupt CI. If you prefer to iterate a bit more before merging, I am happy to re-review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants