-
Notifications
You must be signed in to change notification settings - Fork 12
Structures for TLB KMD management #733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
19cc104
to
d3f405f
Compare
d3f405f
to
b15daee
Compare
allocate_tlb.in = in; | ||
if (ioctl(get_fd(), TENSTORRENT_IOCTL_ALLOCATE_TLB, &allocate_tlb) < 0) { | ||
std::cout << "error" << std::endl; | ||
return {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will return a zero-initialized tenstorrent_allocate_tlb_out
which will look valid to the caller (tenstorrent_allocate_tlb_out::id == 0
is legal). The mmap offsets will also be zero. Should this throw an exception?
// For simplicity and correctness, only allow 32-bit aligned accesses. | ||
// There exist platform and device specific considerations for unaligned | ||
// accesses which are not addressed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wormhole has some read/modify/write logic to handle unaligned and 1/2 byte access cases.
See
// 1. AARCH64 device memory does not allow unaligned accesses (including pair loads/stores), |
and https://yyz-gitlab.local.tenstorrent.com/tenstorrent/syseng/-/issues/3487
tenstorrent_free_tlb free_tlb{}; | ||
free_tlb.in = in; | ||
if (ioctl(get_fd(), TENSTORRENT_IOCTL_FREE_TLB, &free_tlb) < 0) { | ||
return {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider throwing an exception here -- although if you do, you'll want to ensure that such an exception cannot escape from ~TlbHandle()
. If you don't throw here, maybe an error should be logged. If UMD fails to free a TLB, it is probably UMD programmer error indicative of a bug.
void TlbWindow::write_register(uint64_t offset, uint32_t value) { write32(offset, value); } | ||
|
||
uint32_t TlbWindow::read_register(uint64_t offset) { return read32(offset); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Register access should use an uncached mapping for ordering and immediate visibility. A more arcane concern is that WC mappings may be speculatively read by the CPU. I don't know of any registers in our system that have side-effects when read. But UC avoids this.
I don't have any immediate suggestion for how to deal with the UC/WC thing. A few ideas:
- always map them both (pros: flexible, cons: easier to misuse; I'm not 100% convinced this is reliable)
- choose mapping type when the window is constructed
- separate abstractions (e.g. TlbWindowUC, TlbWindowWC -- I am not sure I like this, but it's definitely explicit about intent...)
- ??
|
||
// Here it's not important how we have configured the TLB. For every read we will | ||
// do the reconfigure of the TLB window. | ||
tenstorrent_noc_tlb_config config; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest zero-initializing it anyway: tenstorrent_noc_tlb_config config{};
to avoid any surprise if it ends up with an uninitialized value that the driver doesn't like.
// Point of the test is to read NOC0 node id register. | ||
// TLB needs to be aligned to 2MB so these base and offset values are | ||
// how TLB should be programmed in order to get to addr 0xFFB2002C. | ||
const uint64_t tlb_base = 0xFFA00000; | ||
const uint64_t noc_node_id_tlb_offset = 0x12002C; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Longer term (not necessary for this introductory PR) I think it's worth figuring out how the TlbWindow abstraction or the logic that creates the windows and hands them out can handle this for the caller.
tenstorrent_configure_tlb configure_tlb{}; | ||
configure_tlb.in = in; | ||
if (ioctl(get_fd(), TENSTORRENT_IOCTL_CONFIGURE_TLB, &configure_tlb) < 0) { | ||
return {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment to the one for PCIDevice::allocate_tlb
@@ -440,3 +440,34 @@ semver_t PCIDevice::read_kmd_version() { | |||
|
|||
return semver_t(version_str); | |||
} | |||
|
|||
tenstorrent_allocate_tlb_out PCIDevice::allocate_tlb(tenstorrent_allocate_tlb_in in) { | |||
tenstorrent_allocate_tlb allocate_tlb{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to consider is that old tt-kmd releases don't know about the TLB API. How should this situation be handled? Some kind of version check seems necessary, but we don't want to obstruct anything from running against old tt-kmd releases until this integration is more mature.
|
||
using namespace tt::umd; | ||
|
||
TEST(TestTlb, TestTlbWindowAllocateNew) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe these should be in their own test module that's not called in CI, so we don't have failures if tests are run against old tt-kmd releases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me. Since this looks like scaffolding for future integration work I am fine with it getting merged as long as it doesn't disrupt CI. If you prefer to iterate a bit more before merging, I am happy to re-review.
Issue
Working towards #646 #463
Description
Add structures required to use KMD API for TLB management. This PR adds calls to KMD in order to leverage TLB management through KMD. On top of that there are classes implemented in order to use TLBs allocated using KMD. This PR is supposed to introduce this classes and tests for the TLBs. Integrating this should happen in the subsequent PR after we converge on these basic structures.
List of the changes
Testing
CI. Added test_tlb for testing TLB KMD calls
API Changes
/