Add support for MtFuji elba dpu #18536
Force-pushed from 21a1814 to 7d72a6e.
@shanshri, can you rebase so that the semgrep check can run?
Force-pushed from e6ba75a to 0bc062d.
@shanshri please check the build failure
platform/pensando/sonic-platform-modules-dpu/sonic_platform/chassis.py (outdated, resolved)
platform/pensando/sonic-platform-modules-dpu/sonic_platform/sfp.py (outdated, resolved)
platform/pensando/sonic-platform-modules-dpu/dpu/utils/fetch_dpu_status (outdated, resolved)
platform/pensando/sonic-platform-modules-dpu/sonic_platform/base_pb2_grpc.py (outdated, resolved)
@shanshri, can you please address the comments? This PR is required to build the DPU image.
Please address comments and resolve conflicts
Force-pushed from 0bc062d to 99f47cd.
platform/pensando/sonic-platform-modules-dpu/dpu/service/midplane-network-dpu.service (outdated, resolved)
platform/pensando/sonic-platform-modules-dpu/dpu/service/dpu-db-util.service (resolved)
platform/pensando/sonic-platform-modules-dpu/dpu/service/dpu-dhcp-renewal.service (outdated, resolved)
platform/pensando/sonic-platform-modules-dpu/dpu/service/dpu_provisioning.service (resolved)
device/pensando/arm64-elba-asic-r0/Pensando-elba/port_config.ini (outdated, resolved)
Signed-off-by: Shantanu Shrivastava <[email protected]> Signed-off-by: Sahil Chaudhari <[email protected]>
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
@shanshri @SahilChaudhari can you confirm that there are no PCIe devices on the DPU board?
reviewing @KrisNey-MSFT fyi
[like] Kristina Moore reacted by email to Prince George's message of Monday, January 27, 2025 10:56:41 PM (Re: [sonic-net/sonic-buildimage] Add support for MtFuji elba dpu (PR #18536)):
> @prgeor could you check this PR and sign-off?
> reviewing @KrisNey-MSFT fyi
platform/pensando/sonic-platform-modules-dpu/sonic_platform/fru_tlvinfo_decoder.py (resolved)
platform/pensando/dsc-drivers/src/drivers/linux/eth/ionic/ionic_bus_pci.c (resolved)
platform/pensando/dsc-drivers/src/drivers/linux/eth/ionic/ionic_bus_pci.c (resolved)
Hi @prgeor and @vvolam, shall we have a call to finish this off? @vijayvyasm @r12f for visibility.
@prgeor, it is confirmed. The DPU will not have PCIe devices.
LGTM. Thank you
fi
else
    echo "cp /usr/share/sonic/device/$platform/config_db_$pipeline.json /etc/sonic/config_db.json"
    cp /usr/share/sonic/device/$platform/config_db_$pipeline.json /etc/sonic/config_db.json
@shanshri this is risky code. Why are you updating config_db.json?
We only update it during the first boot after a fresh installation on a system. It's needed because, after a fresh installation, libsai errors out if it does not find the default Ethernet interface, which causes syncd to return.
It is not updated on later reboots or upgrades.
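The first-boot-only behavior described here can be sketched as a guarded copy. This is a minimal illustration, not the PR's script: the function name `seed_config_db` and the marker file used to detect "first boot" are assumptions, since the thread does not show how the first boot is detected.

```python
import shutil
from pathlib import Path


def seed_config_db(src: Path, dst: Path, marker: Path) -> bool:
    """Copy a default config to dst only on the first boot.

    `marker` is a hypothetical file that records that seeding already
    happened; the real first-boot detection is not shown in the thread.
    Returns True if the copy was performed, False if it was skipped.
    """
    if marker.exists():
        # Later reboot or upgrade: leave the existing config untouched.
        return False
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    marker.touch()
    return True
```

In the thread's terms, `src` would correspond to /usr/share/sonic/device/$platform/config_db_$pipeline.json and `dst` to /etc/sonic/config_db.json.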
@shanshri which default Ethernet interface?
The uplink interface, Ethernet0.
Polaris (msft pipeline): Ethernet0
We also have other pipelines and cards where the number of uplink interfaces can differ. Copying a per-pipeline config_db.json handles those cases.
Based on the discussion, we have the following action item (AI) for this comment, which we will take up in the follow-up PR:
Add a MtFuji-specific platform_device.json similar to this: https://github.com/sonic-net/sonic-buildimage/blob/master/device/nvidia-bluefield/arm64-nvda_bf-9009d3b600svaa/platform.json
This will help in generating config_db.json with Ethernet0 as the uplink interface, which keeps it in sync with the polaris docker container.
cmd = "docker cp {}:/tmp/fru.json /home/admin".format(docker_image_id)
self._api_helper.runCMD(cmd)
time.sleep(0.5)
self._api_helper.runCMD("cp /home/admin/fru.json {}".format(self.fru_path))
@SahilChaudhari why are we using the home dir for FRU-related information?
@prgeor, we use fru.json from the DPU firmware container. I copy it to the home dir, and from there to /usr/share/sonic/device// for the host and '/usr/share/sonic/device' for the pmon container. Once that is done, fru.json in the home dir is no longer used.
@shanshri can we keep this in the DPU platform dir always? The pmon can have access to the file on the host too.
The problem here is that the FRU EEPROM for the DPU is not in the TLV format used by SONiC, so this workaround is needed.
Action item (AI) to be taken up in the follow-up PR:
Copy fru.json directly from the polaris container to platform_dir instead of keeping it in the host /home/admin directory.
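A minimal sketch of that action item, assuming `docker cp` can write straight into the platform directory. The helper names `build_fru_copy_cmd` and `copy_fru` are hypothetical and do not come from the PR; only the container-side path /tmp/fru.json is taken from the snippet under review.

```python
import subprocess


def build_fru_copy_cmd(container: str, platform_dir: str) -> list:
    """Build a `docker cp` command that skips the /home/admin staging copy.

    /tmp/fru.json is the container-side path from the reviewed snippet;
    `platform_dir` is whatever host directory the platform code reads from.
    """
    return [
        "docker", "cp",
        "{}:/tmp/fru.json".format(container),
        "{}/fru.json".format(platform_dir.rstrip("/")),
    ]


def copy_fru(container: str, platform_dir: str, run=subprocess.run) -> list:
    """Run the copy; `run` is injectable so the logic can be checked without docker."""
    cmd = build_fru_copy_cmd(container, platform_dir)
    run(cmd, check=True)
    return cmd
```

Passing the command as a list avoids shell quoting issues and removes the intermediate file and the `time.sleep(0.5)` between the two copies.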
@KrisNey-MSFT @prgeor Please note that the above two pending comments are to be resolved in a subsequent PR.
Hi @shanshri, the Pensando build fails. Can you check the root cause?
Hi @liushilongbuaa, can you please refer to PR #21949?
@SahilChaudhari, do you mean #21949 will fix this error?
Yes @liushilongbuaa
Addressed two action items from PR #18536:
- Removed the home dir from dpu_pensando_util.py for copying files from the Pensando firmware container. Earlier, files were copied from the Pensando firmware container first to the home dir /home/admin and from there to the shared directory /usr/share/sonic/device/.
- Dissolved config_db.json into minigraph.xml, platform.json and init_cfg.json. Earlier, config_db.json was copied from /usr/share/sonic/device/arm64-elba-asic-flash128-r0/config_db.json to /etc/sonic/config_db.json on the first boot after installation. Now, on first boot, minigraph.xml and init_cfg.json are copied to /etc/sonic, and config_db.json is generated from them together with the SONiC default init_cfg.json using the sonic-cfggen command. This way, config_db.json has flexibility for schema upgrades.

Bug fix: Addressed the slot id UNDEFINED issue for dpu_provisioning.sh and for DPU_STATE table entries.

Signed-off-by: Sahil Chaudhari <[email protected]>
Signed-off-by: Shantanu Shrivastava <[email protected]>
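The generation step described above might look roughly like the following. This is a sketch under assumptions: the helper names are hypothetical, and the flags (`-H` for platform info, `-m` for the minigraph, a repeatable `-j` for extra JSON input, `--print-data` to emit the merged result) reflect how sonic-cfggen is commonly invoked and should be checked against the installed version. The exact invocation used by the follow-up PR is not shown in this thread.

```python
import subprocess


def build_cfggen_cmd(minigraph: str, init_cfgs: list) -> list:
    """Assemble a sonic-cfggen command line (flags are assumptions, see lead-in)."""
    cmd = ["sonic-cfggen", "-H", "-m", minigraph]
    for path in init_cfgs:
        # Each init_cfg.json (platform-specific and SONiC default) is passed via -j.
        cmd += ["-j", path]
    cmd.append("--print-data")
    return cmd


def generate_config_db(minigraph: str, init_cfgs: list, out_path: str,
                       run=subprocess.run) -> None:
    """Write the merged configuration printed by sonic-cfggen to out_path."""
    result = run(build_cfggen_cmd(minigraph, init_cfgs),
                 check=True, capture_output=True, text=True)
    with open(out_path, "w") as handle:
        handle.write(result.stdout)
```

Generating the file from its inputs, rather than shipping a finished config_db.json, is what gives the schema-upgrade flexibility the commit message mentions.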
This patchset adds sonic-buildimage support for the AMD-Pensando DPU on the MtFuji DSS. MtFuji is a DSS being developed jointly by AMD-Pensando and Cisco for data center applications.
MtFuji mounts an Elba-based NIC, which is an AMD-Pensando PCI Distributed Services Card (DSC) for which support has been added in SONiC.
The changes are verified on a Pensando DSS-MTFUJI card. There is one 200G uplink port and no management port. Link and traffic have been tested on the port.
Why I did it
This patchset adds sonic-buildimage support for the AMD-Pensando DPU on the MtFuji DSS. MtFuji is a DSS being developed jointly by AMD-Pensando and Cisco for data center applications.
MtFuji mounts an Elba-based NIC, which is an AMD-Pensando PCI Distributed Services Card (DSC) for which support has been added in SONiC.
How I did it
Created a new device, arm64-elba-asic-flash128-r0, and added the MtFuji change in platform.conf to create a bootconf that points to the correct dtb.
How to verify it
Load the SONiC image from ONIE and make sure the interfaces are UP.
Tested branch (Please provide the tested image version)
Tested on master