
Implement create ethernet map #722


Open
wants to merge 5 commits into base: main


pjanevskiTT
Contributor

@pjanevskiTT pjanevskiTT commented Apr 10, 2025

Issue

#73 #625. This PR does not close either of these issues, but it is a big step toward closing them.

Description

Implement a topology discovery class for Wormhole with the old routing FW. This class should provide the same functionality as the CEM lib from luwen; the idea is to replace the CEM lib with our own implementation. To do so, we need to reliably wait on ARC, ETH and DRAM, as we already did for Blackhole. #723 #724

Blackhole and 6U topology discovery should be moved to this class as well; I left them out to keep the scope of this PR small.
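The description calls for waiting on ARC, ETH and DRAM before discovery proceeds. As an illustration only, here is a minimal C++ sketch of a bounded polling wait, assuming each component exposes some boolean readiness check. The names `wait_until_ready`, `ChipReadinessProbes` and `wait_for_chip`, and the ARC, then ETH, then DRAM order, are hypothetical and not the UMD API.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Hypothetical helper: poll `is_ready` until it returns true or `timeout` elapses.
// Returns true if the component reported ready in time, false on timeout.
bool wait_until_ready(const std::function<bool()>& is_ready,
                      std::chrono::milliseconds timeout,
                      std::chrono::milliseconds poll_interval = std::chrono::milliseconds(1)) {
    assert(poll_interval.count() > 0 && "poll interval must be positive");
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (is_ready()) {
            return true;
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;
        }
        std::this_thread::sleep_for(poll_interval);
    }
}

// Hypothetical per-chip readiness probes; a real implementation would read
// ARC, ETH and DRAM status from the device.
struct ChipReadinessProbes {
    std::function<bool()> arc;
    std::function<bool()> eth;
    std::function<bool()> dram;
};

// Wait for ARC, then ETH, then DRAM. Returns the name of the first component
// that timed out, or an empty string if all three became ready.
std::string wait_for_chip(const ChipReadinessProbes& probes, std::chrono::milliseconds timeout_each) {
    if (!wait_until_ready(probes.arc, timeout_each)) return "ARC";
    if (!wait_until_ready(probes.eth, timeout_each)) return "ETH";
    if (!wait_until_ready(probes.dram, timeout_each)) return "DRAM";
    return "";
}
```

Bounding each wait with a timeout and reporting which component stalled turns a hung or missing chip into a diagnosable error instead of an indefinite hang.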

List of the changes

  • Implement topology discovery for Wormhole with old routing fw
  • Get board type of remote chips
  • Remote ARC messages
  • Add a test that runs topology discovery on Wormhole
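The PR title refers to creating an ethernet map. A minimal C++ sketch of what such a map could hold, assuming each chip reports its active links as (local chip, local channel) to (remote chip, remote channel) pairs. `EthLink`, `EthernetMap` and `create_ethernet_map` are hypothetical names, not the actual UMD types.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

using ChipId = int;   // hypothetical logical chip id
using Channel = int;  // hypothetical ethernet channel index

// One active ethernet link as reported by the chip on the local end.
struct EthLink {
    ChipId local_chip;
    Channel local_channel;
    ChipId remote_chip;
    Channel remote_channel;
};

using EthEndpoint = std::pair<ChipId, Channel>;
// Maps each (chip, channel) endpoint to the endpoint on the other end of the cable.
using EthernetMap = std::map<EthEndpoint, EthEndpoint>;

EthernetMap create_ethernet_map(const std::vector<EthLink>& links) {
    EthernetMap map;
    for (const auto& link : links) {
        const EthEndpoint a{link.local_chip, link.local_channel};
        const EthEndpoint b{link.remote_chip, link.remote_channel};
        // Record both directions. A link may be reported by both ends,
        // so a repeated report must agree with the entry already stored.
        for (const auto& [from, to] : {std::pair{a, b}, std::pair{b, a}}) {
            const auto [it, inserted] = map.emplace(from, to);
            assert((inserted || it->second == to) && "conflicting ethernet link reports");
        }
    }
    return map;
}
```

Storing both directions lets discovery look up the neighbour of any (chip, channel) endpoint in one step, whichever end it started from.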

Testing

Topology discovery is not yet integrated into the main UMD code path. Manually tested on N150, N300, T3K and TG configs.

API Changes

None

@pjanevskiTT
Contributor Author

@TTDRosen I can't add you as a reviewer.

@pjanevskiTT pjanevskiTT self-assigned this Apr 10, 2025
Contributor

@joelsmithTT joelsmithTT left a comment


Looks like a good first step towards removing CEM lib.

}

void TopologyDiscovery::discover_remote_chips() {
const uint64_t conn_info = 0x1200;
Contributor


If that FW ever changes its layout, will these offsets have to be updated? Should the code do a version check of the FW to ensure that both sides agree?

Contributor Author


I think there is already a different layout, but I believe it is for a different case. I wanted to ping @TTDRosen specifically to comment on how much this is going to change in future FW releases.

Contributor


Should the code do a version check

But you can't make this future-proof, can you? If this code works on the current FW version, you probably shouldn't blindly block it from working on future FW versions: maybe nothing will change, maybe there will be a breaking change. There could be a "minimum version", though.
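A minimum-version gate of the kind suggested here might look like the following hypothetical C++ sketch. `FwVersion`, `kMinEthFwVersion` and its value 6.0.0 are placeholders, not the real ETH FW requirement.

```cpp
#include <tuple>

// Hypothetical ETH FW version triple.
struct FwVersion {
    unsigned major_ver, minor_ver, patch_ver;
};

// Lexicographic comparison: major first, then minor, then patch.
bool operator<(const FwVersion& a, const FwVersion& b) {
    return std::tie(a.major_ver, a.minor_ver, a.patch_ver) <
           std::tie(b.major_ver, b.minor_ver, b.patch_ver);
}

// Placeholder minimum: the real value would be the first FW release whose
// layout this code has been verified against.
constexpr FwVersion kMinEthFwVersion{6, 0, 0};

// Accept the minimum version and anything newer; never reject future versions.
bool is_eth_fw_supported(const FwVersion& v) {
    return !(v < kMinEthFwVersion);
}
```

Newer versions pass the check, so the code is not blindly blocked on future FW; only versions known to predate the expected layout are rejected.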

Contributor


What I settled on was not hardcoding any of the offsets and instead selecting them based on the FW version (implementation). The layout should not change, but as you can see in the luwen code, that has not played out well in practice. That said, I agree with Bojan: I think you can pretty confidently require a minimum supported ethernet FW version to help simplify this logic.

Part of the problem is the really poor interfaces between SW and FW, which leave it unclear to HSIO what should be considered public vs. private interfaces and therefore lead to incorrect usage of semver; specifically, these interfaces change without major version bumps.
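Selecting offsets by FW version instead of hardcoding them could be sketched as below. Only the `0x1200` value comes from this PR (`conn_info` in `discover_remote_chips()`); the version numbers, the `0x1300` entry and every name here are hypothetical placeholders.

```cpp
#include <cstdint>
#include <optional>
#include <tuple>
#include <vector>

// Hypothetical ETH FW version triple.
struct FwVersion {
    unsigned major_ver, minor_ver, patch_ver;
};

bool operator<(const FwVersion& a, const FwVersion& b) {
    return std::tie(a.major_ver, a.minor_ver, a.patch_ver) <
           std::tie(b.major_ver, b.minor_ver, b.patch_ver);
}

// Hypothetical description of where FW-defined structures live in L1.
struct EthFwLayout {
    FwVersion first_version;  // first FW version that uses this layout
    uint64_t conn_info;       // address of the per-channel connection info
};

// Sorted by first_version. Placeholder values, except that 0x1200 is the
// conn_info address hardcoded in this PR.
const std::vector<EthFwLayout> kLayouts = {
    {{6, 0, 0}, 0x1200},
    {{7, 0, 0}, 0x1300},  // made-up future layout
};

// Return the newest layout whose first_version <= v, or std::nullopt if v is
// older than every known layout (i.e. below the minimum supported version).
std::optional<EthFwLayout> select_layout(const FwVersion& v) {
    std::optional<EthFwLayout> result;
    for (const auto& layout : kLayouts) {
        if (v < layout.first_version) {
            break;
        }
        result = layout;
    }
    return result;
}
```

Versions newer than every table entry fall back to the newest known layout, and versions older than the first entry yield no layout, so the table doubles as the minimum-version check.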

@pjanevskiTT pjanevskiTT force-pushed the pjanevski/cem_rebase branch 2 times, most recently from 451194b to 0476d9e Compare April 11, 2025 11:14
@pjanevskiTT pjanevskiTT force-pushed the pjanevski/cem_rebase branch from 228f76b to 333f3ad Compare April 15, 2025 08:35

4 participants