Add support for MtFuji elba dpu #18536

Merged 17 commits on Feb 19, 2025

@shanshri shanshri commented Apr 2, 2024

This patchset adds sonic-buildimage support for the AMD-Pensando DPU on MtFuji DSS. MtFuji is a DSS being developed by AMD-Pensando and Cisco in collaboration for data center applications.
MtFuji mounts an Elba-based NIC, an AMD-Pensando PCI Distributed Services Card (DSC) for which support has already been added in SONiC.

The changes are verified on a Pensando DSS-MTFUJI card. There is one 200G uplink port and no management port. Link and traffic have been tested on the port.

Why I did it

This patchset adds sonic-buildimage support for the AMD-Pensando DPU on MtFuji DSS. MtFuji is a DSS being developed by AMD-Pensando and Cisco in collaboration for data center applications.
MtFuji mounts an Elba-based NIC, an AMD-Pensando PCI Distributed Services Card (DSC) for which support has already been added in SONiC.

Work item tracking
  • Microsoft ADO (number only):

How I did it

Created a new device, arm64-elba-asic-flash128-r0, and added a change for mtfuji in platform.conf to create a bootconf that points to the correct dtb.

How to verify it

Load the SONiC image from ONIE and make sure the interfaces are UP.

Which release branch to backport (provide reason below if selected)

Tested branch (Please provide the tested image version)

Tested on master

  • master.0-dirty-20240402.113733

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

1. show system-health detail

root@sonic:/home/admin# show system-health detail
System status summary

  System status LED  green
  Services:
    Status: OK
  Hardware:
    Status: OK

System services and devices monitor list

Name                      Status    Type
------------------------  --------  ----------
sonic                     OK        System
rsyslog                   OK        Process
root-overlay              OK        Filesystem
var-log                   OK        Filesystem
routeCheck                OK        Program
dualtorNeighborCheck      OK        Program
diskCheck                 OK        Program
container_checker         OK        Program
vnetRouteCheck            OK        Program
memory_check              OK        Program
container_memory_snmp     OK        Program
dpu-db-util               OK        Program
syncd:syncd               OK        Process
bgp:zebra                 OK        Process
bgp:staticd               OK        Process
bgp:bgpd                  OK        Process
bgp:fpmsyncd              OK        Process
bgp:bgpcfgd               OK        Process
swss:orchagent            OK        Process
swss:portsyncd            OK        Process
swss:neighsyncd           OK        Process
swss:fdbsyncd             OK        Process
swss:vlanmgrd             OK        Process
swss:intfmgrd             OK        Process
swss:portmgrd             OK        Process
swss:fabricmgrd           OK        Process
swss:buffermgrd           OK        Process
swss:vrfmgrd              OK        Process
swss:nbrmgrd              OK        Process
swss:vxlanmgrd            OK        Process
swss:coppmgrd             OK        Process
swss:tunnelmgrd           OK        Process
dpu-pdsagent              OK        UserDefine
dpu-pciemgrd              OK        UserDefine
dpu-eth_Uplink1/1_status  OK        UserDefine
dpu-pcie_link             OK        UserDefine

System services and devices ignore list

Name    Status    Type
------  --------  ------
fan     Ignored   Device
psu     Ignored   Device

2. fwutil show

root@sonic:/home/admin# fwutil show status
Chassis                   Module    Component            Version          Description
------------------------  --------  -------------------  ---------------  -----------------------
Pensando-elba DSS-MTFUJI  N/A       DPUFW-7              20241203.072806  DPU-7 SONiC Image
                                    DPUQSPI_GOLDFW-7     1.68-G-21        DPU-7 GOLDFW
                                    DPUQSPI_GOLDUBOOT-7  1.68-G-21        DPU-7 GOLDUBOOT
                                    DPUQSPI_UBOOTA-7     1.5.0-EXP        DPU-7 UBOOTA
                                    eMMC                 None             Internal storage device
root@sonic:/home/admin# fwutil show updates
Chassis                   Module    Component            Firmware      Version (Current/Available)    Status
------------------------  --------  -------------------  ------------  -----------------------------  ------------------
Pensando-elba DSS-MTFUJI  N/A       DPUQSPI_GOLDFW-7     /host/images  1.68-G-21 / 1.68-G-22          update is required
                                    DPUQSPI_GOLDUBOOT-7  /host/images  1.68-G-21 / 1.68-G-21          up-to-date
                                    DPUQSPI_UBOOTA-7     /host/images  1.5.0-EXP / 1.5.0-EXP          up-to-date

3. pcieutil check : N/A (No pcie devices on DPU)

4. sensors

root@sonic:/home/admin# sensors
ltc3882-i2c-0-44
Adapter: Synopsys DesignWare I2C adapter
vin:          12.03 V  (min =  +6.30 V, crit max = +15.50 V)
                       (highest = +12.09 V)
vout1:       850.00 mV (crit min =  +0.68 V, min =  +0.72 V)
                       (max =  +0.98 V, crit max =  +1.02 V)
                       (highest =  +0.85 V)
vout2:         1.20 V  (crit min =  +0.96 V, min =  +1.02 V)
                       (max =  +1.38 V, crit max =  +1.44 V)
                       (highest =  +1.20 V)
temp1:        +53.9 C  (high = +85.0 C, crit low = -40.0 C)
                       (crit = +100.0 C, highest = +55.9 C)
temp2:        +54.3 C  (high = +85.0 C, crit low = -40.0 C)
                       (crit = +100.0 C, highest = +56.1 C)
temp3:        +58.1 C  (high = +85.0 C, crit low = -40.0 C)
                       (crit = +100.0 C, highest = +60.1 C)
pout1:         2.52 W
pout2:         2.06 W
iout1:         2.94 A  (max = +30.00 A, crit max = +35.00 A)
                       (highest =  +3.19 A)
iout2:         1.50 A  (max = +30.00 A, crit max = +35.00 A)
                       (highest =  +4.58 A)

ltc3882-i2c-0-66
Adapter: Synopsys DesignWare I2C adapter
vin:          12.03 V  (min =  +6.30 V, crit max = +15.50 V)
                       (highest = +12.09 V)
vout1:       760.00 mV (crit min =  +0.60 V, min =  +0.64 V)
                       (max =  +0.86 V, crit max =  +0.90 V)
                       (highest =  +0.76 V)
vout2:       852.00 mV (crit min =  +0.68 V, min =  +0.72 V)
                       (max =  +0.98 V, crit max =  +1.02 V)
                       (highest =  +0.85 V)
temp1:        +50.2 C  (high = +85.0 C, crit low = -40.0 C)
                       (crit = +100.0 C, highest = +53.6 C)
temp2:        +48.7 C  (high = +85.0 C, crit low = -40.0 C)
                       (crit = +100.0 C, highest = +52.3 C)
temp3:        +55.3 C  (high = +85.0 C, crit low = -40.0 C)
                       (crit = +100.0 C, highest = +58.7 C)
pout1:        10.88 W
pout2:        10.47 W
iout1:        14.28 A  (max = +30.00 A, crit max = +35.00 A)
                       (highest = +25.59 A)
iout2:        11.86 A  (max = +30.00 A, crit max = +35.00 A)
                       (highest = +13.75 A)

ltc3882-i2c-0-55
Adapter: Synopsys DesignWare I2C adapter
vin:          12.03 V  (min =  +6.30 V, crit max = +15.50 V)
                       (highest = +12.09 V)
vout1:       759.00 mV (crit min =  +0.60 V, min =  +0.64 V)
                       (max =  +0.86 V, crit max =  +0.90 V)
                       (highest =  +0.76 V)
vout2:       760.00 mV (crit min =  +0.60 V, min =  +0.64 V)
                       (max =  +0.86 V, crit max =  +0.90 V)
                       (highest =  +0.76 V)
temp1:        +49.9 C  (high = +85.0 C, crit low = -40.0 C)
                       (crit = +100.0 C, highest = +52.6 C)
temp2:        +50.9 C  (high = +85.0 C, crit low = -40.0 C)
                       (crit = +100.0 C, highest = +53.8 C)
temp3:        +57.1 C  (high = +85.0 C, crit low = -40.0 C)
                       (crit = +100.0 C, highest = +59.7 C)
pout1:        10.14 W
pout2:        10.19 W
iout1:        13.38 A  (max = +30.00 A, crit max = +35.00 A)
                       (highest = +19.59 A)
iout2:        13.42 A  (max = +30.00 A, crit max = +35.00 A)
                       (highest = +24.84 A)

5. psushow -s : N/A (No psu on DPU)

6. show platform temperature

root@sonic:/home/admin# show platform temperature
           Sensor    Temperature    High TH    Low TH    Crit High TH    Crit Low TH    Warning          Timestamp
-----------------  -------------  ---------  --------  --------------  -------------  ---------  -----------------
Board temperature         50.5          N/A       N/A               0            N/A      False  20241203 06:33:50
  Die temperature         48.937        N/A       N/A               0            N/A      False  20241203 06:33:50
 Thermal sensor 1         50.5          N/A       N/A               0            N/A      False  20241203 06:33:50
 Thermal sensor 2         50.125        N/A       N/A               0            N/A      False  20241203 06:33:50
 Thermal sensor 3         54.312        N/A       N/A               0            N/A      False  20241203 06:33:50

7. show platform voltage

root@sonic:/home/admin# show platform voltage
          Sensor    Voltage    High TH    Low TH    Crit High TH    Crit Low TH    Warning          Timestamp
----------------  ---------  ---------  --------  --------------  -------------  ---------  -----------------
Voltage sensor 1     0.76 V        N/A       N/A             N/A            N/A      False  20241203 06:33:50
Voltage sensor 2    0.759 V        N/A       N/A             N/A            N/A      False  20241203 06:33:50
Voltage sensor 3     0.85 V        N/A       N/A             N/A            N/A      False  20241203 06:33:50

8. decode-syseeprom

root@sonic:/home/admin# decode-syseeprom
TlvInfo Header:
   Id String:    TlvInfo
   Version:      1
   Total Length: 210
TLV Name             Code Len Value
-------------------- ---- --- -----
Product Name         0x21   6 MTFUJI
Part Number          0x22  10 DSS-MTFUJI
Serial Number        0x23  11 FLM2750037R
Base MAC Address     0x24   6 B0:8D:57:CD:36:0F
Manufacture Date     0x25  10 1713139200
Device Version       0x26   1 1
Label Revision       0x27  13 Not Available
Platform Name        0x28  13 Not Available
ONIE Version         0x29  26 master-03250614-V001-dirty
MAC Addresses        0x2A   2 16
Manufacturer         0x2B   8 CISCO
Manufacture Country  0x2C  13 Not Available
Vendor Name          0x2D  13 Not Available
Diag Version         0x2E  13 Not Available
Service Tag          0x2F  13 Not Available
Vendor Extension     0xFD  14 0x31 0x32 0x2F 0x30 0x34 0x2F 0x32 0x33 0x0A 0x0A 0x30 0x0A 0x0A 0x30
CRC-32               0xFE   4 0x6B0B1E68
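
Two of the TLV values above are easier to read once decoded. The sketch below turns the Vendor Extension bytes into ASCII and, assuming the Manufacture Date field holds Unix epoch seconds (an assumption based only on how the value looks, not something stated in this PR), converts it to a UTC date. The helper names are hypothetical.

```python
from datetime import datetime, timezone

# Values copied from the decode-syseeprom output above.
VENDOR_EXT = "0x31 0x32 0x2F 0x30 0x34 0x2F 0x32 0x33 0x0A 0x0A 0x30 0x0A 0x0A 0x30"
MANUFACTURE_DATE = "1713139200"


def decode_hex_bytes(field: str) -> str:
    """Turn a space-separated list of 0xNN tokens into ASCII text."""
    return bytes(int(tok, 16) for tok in field.split()).decode("ascii")


def epoch_to_utc(field: str) -> str:
    """Assumption: the field holds Unix epoch seconds."""
    stamp = datetime.fromtimestamp(int(field), tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    print(repr(decode_hex_bytes(VENDOR_EXT)))  # '12/04/23\n\n0\n\n0'
    print(epoch_to_utc(MANUFACTURE_DATE))      # 2024-04-15 00:00:00
```

The decoded Vendor Extension is 14 characters long, which matches the Len column of that TLV.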

@shanshri shanshri requested a review from lguohan as a code owner April 2, 2024 17:50
@shanshri shanshri force-pushed the mtfuji-sonicbuildimage branch from 21a1814 to 7d72a6e Compare April 8, 2024 15:42
@lguohan
Collaborator

lguohan commented May 13, 2024

@shanshri, can you rebase so that the semgrep check can run?

@shanshri shanshri force-pushed the mtfuji-sonicbuildimage branch 2 times, most recently from e6ba75a to 0bc062d Compare August 8, 2024 12:23
@prsunny prsunny requested a review from prgeor August 8, 2024 16:28
@prgeor
Contributor

prgeor commented Aug 28, 2024

@shanshri please check the build failure

@vdahiya12 vdahiya12 self-requested a review August 28, 2024 20:06
@prsunny
Contributor

prsunny commented Aug 30, 2024

@shanshri, can you please address the comments? This PR is required to build the DPU image.

prsunny
prsunny previously approved these changes Sep 18, 2024
@prsunny prsunny self-requested a review September 18, 2024 19:50
Contributor

@prsunny prsunny left a comment

Please address comments and resolve conflicts

prsunny
prsunny previously approved these changes Sep 20, 2024
Signed-off-by: Shantanu Shrivastava <[email protected]>
Signed-off-by: Sahil Chaudhari <[email protected]>
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

Azure Pipelines successfully started running 1 pipeline(s).

@prgeor
Contributor

prgeor commented Jan 27, 2025

@shanshri @SahilChaudhari can you confirm if there are no pcie devices on the DPU board?

pcieutil check : N/A (No pcie devices on DPU)

@prgeor
Contributor

prgeor commented Jan 27, 2025

show platform voltage: was this output captured on the DPU? Why are no thresholds set for these voltage sensors? Don't we care about threshold warnings from them?

@prgeor
Contributor

prgeor commented Jan 27, 2025

@prgeor could you check this PR and sign-off?

reviewing @KrisNey-MSFT fyi

@KrisNey-MSFT

KrisNey-MSFT commented Jan 27, 2025 via email

@KrisNey-MSFT

KrisNey-MSFT commented Feb 3, 2025

Hi @prgeor and @vvolam - shall we have a call to finish this off? @vijayvyasm @r12f for viz...

@SahilChaudhari
Contributor

@shanshri @SahilChaudhari can you confirm if there are no pcie devices on the DPU board?

pcieutil check : N/A (No pcie devices on DPU)

@prgeor, it is confirmed. The DPU will not have PCIe devices.

Contributor

@vvolam vvolam left a comment

LGTM. Thank you

fi
else
echo "cp /usr/share/sonic/device/$platform/config_db_$pipeline.json /etc/sonic/config_db.json"
cp /usr/share/sonic/device/$platform/config_db_$pipeline.json /etc/sonic/config_db.json
Contributor

@shanshri this is risky code. Why are you updating config_db.json?

Contributor Author

We only update it during the first boot after a fresh installation on a system. It's needed so that libsai does not error out when it cannot find the default Ethernet interface after a fresh installation, which causes syncd to return.
This is not done on later reboots or upgrades.
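
The first-boot-only behaviour described here can be illustrated with a minimal Python sketch. This is not the script from the PR (which is shell); the function name and the `first_boot` flag are hypothetical stand-ins for however the platform detects a fresh installation.

```python
import shutil
import tempfile
from pathlib import Path


def seed_config_db(default_cfg: Path, target_cfg: Path, first_boot: bool) -> bool:
    """Copy the platform default config only on first boot.

    Returns True when the copy happened. On later reboots or upgrades
    (first_boot is False) the existing target is left untouched.
    """
    if not first_boot:
        return False
    target_cfg.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(default_cfg, target_cfg)
    return True


if __name__ == "__main__":
    # Demo in a scratch directory so nothing under /etc/sonic is touched.
    with tempfile.TemporaryDirectory() as scratch:
        src = Path(scratch) / "config_db_default.json"
        dst = Path(scratch) / "etc" / "sonic" / "config_db.json"
        src.write_text('{"PORT": {"Ethernet0": {}}}')
        print(seed_config_db(src, dst, first_boot=True))   # True
        print(seed_config_db(src, dst, first_boot=False))  # False
```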

Contributor

@shanshri which default Ethernet interface?

Contributor Author

Uplink interface Ethernet0.
Polaris (msft pipeline) : Ethernet0
We also have other pipelines and cards where the number of uplink interfaces can differ. This helps with that.

Contributor Author

Based on the discussion, we have the action item (AI) below for this comment, which we will take up in the follow-up PR:
Add a MtFuji-specific platform_device.json similar to https://github.com/sonic-net/sonic-buildimage/blob/master/device/nvidia-bluefield/arm64-nvda_bf-9009d3b600svaa/platform.json
This will help generate config_db.json with the uplink interface as Ethernet0, in sync with the polaris docker container.

cmd = "docker cp {}:/tmp/fru.json /home/admin".format(docker_image_id)
self._api_helper.runCMD(cmd)
time.sleep(0.5)
self._api_helper.runCMD("cp /home/admin/fru.json {}".format(self.fru_path))
Contributor

@SahilChaudhari why are we using the home dir for FRU-related information?

Contributor

@prgeor, we use fru.json from the DPU firmware container. I copy it to the home dir, and from there to /usr/share/sonic/device// for the host and '/usr/share/sonic/device' for the pmon container. After that, fru.json in the home dir is no longer used.

Contributor

@shanshri can we keep this in the DPU platform dir always? The pmon can have access to the file on the host too.

Contributor Author

The problem here is that the FRU EEPROM for the DPU is not in the TLV format used by SONiC, so this workaround is needed.
Action item (AI) to be taken in the follow-up PR:
Copy fru.json directly from the polaris container to platform_dir instead of keeping it in the host /home/admin directory.
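
A rough sketch of what that action item could look like: a single `docker cp` from the container straight into the platform directory, with no intermediate copy in /home/admin. The helper name and the `platform_dir` argument are hypothetical; only the `/tmp/fru.json` source path comes from the snippet under review. The sketch only builds the command and does not execute it.

```python
from typing import List


def fru_copy_command(container_id: str, platform_dir: str) -> List[str]:
    """Build a `docker cp` command that copies fru.json from the DPU
    firmware container directly into the platform directory."""
    source = "{}:/tmp/fru.json".format(container_id)
    destination = "{}/fru.json".format(platform_dir.rstrip("/"))
    return ["docker", "cp", source, destination]


if __name__ == "__main__":
    # Hypothetical container id; the platform name is the one added by this PR.
    cmd = fru_copy_command(
        "abc123", "/usr/share/sonic/device/arm64-elba-asic-flash128-r0"
    )
    print(" ".join(cmd))
```

Returning the argument list rather than a formatted string keeps the command safe to hand to `subprocess.run` without shell quoting concerns.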

@vijasrin
Copy link

vijasrin commented Feb 19, 2025

@KrisNey-MSFT @prgeor Please note the above two pending comments to be resolved in a subsequent PR

@kperumalbfn kperumalbfn merged commit ddcd315 into sonic-net:master Feb 19, 2025
20 checks passed
ram25794 pushed a commit to ram25794/sonic-buildimage that referenced this pull request Feb 21, 2025
prabhataravind pushed a commit to prabhataravind/sonic-buildimage that referenced this pull request Mar 5, 2025
@liushilongbuaa
Contributor

Hi @shanshri , pensando build fails. Can you check the root cause?
Log Link

jinja2.exceptions.UndefinedError: 'BUILD_REDUCE_IMAGE_SIZE' is undefined

If you're trying to pipe a .env file, please run me with a '-' as the data file name:
$ j2 files/dsc/install_debian.j2 -
[  FAIL LOG END  ] [ target/sonic-pensando.tar ]

@SahilChaudhari
Contributor

Hi @shanshri , pensando build fails. Can you check the root cause? Log Link

jinja2.exceptions.UndefinedError: 'BUILD_REDUCE_IMAGE_SIZE' is undefined

If you're trying to pipe a .env file, please run me with a '-' as the data file name:
$ j2 files/dsc/install_debian.j2 -
[  FAIL LOG END  ] [ target/sonic-pensando.tar ]

Hi @liushilongbuaa, can you please refer to this PR #21949?
You can check diff of files/dsc/install_debian.j2

@liushilongbuaa
Contributor

@SahilChaudhari , do you mean 21949 will fix this error?

@SahilChaudhari
Contributor

@SahilChaudhari , do you mean 21949 will fix this error?

Yes @liushilongbuaa

rlhui pushed a commit that referenced this pull request Apr 19, 2025
Addressed two action items from PR #18536

Removed home dir from dpu_pensando_util.py for copying files from Pensando firmware container:
Earlier, files from the Pensando firmware container were first copied to the home dir /home/admin and from there to the shared directory /usr/share/sonic/device/.
Dissolved config_db.json into minigraph.xml, platform.json and init_cfg.json:
Earlier, config_db.json was copied from /usr/share/sonic/device/arm64-elba-asic-flash128-r0/config_db.json to /etc/sonic/config_db.json on first boot after installation. Now, on first boot, minigraph.xml and init_cfg.json are copied to /etc/sonic and, together with the SONiC default init_cfg.json, config_db.json is generated using the sonic-cfggen command. This way, config_db.json stays flexible for schema upgrades.
Bug fix:

Addressed slot id UNDEFINED issue for dpu_provisioning.sh and for DPU_STATE table entries
---------

Signed-off-by: Sahil Chaudhari <[email protected]>
Signed-off-by: Shantanu Shrivastava <[email protected]>
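
For readers unfamiliar with the generation step mentioned in this commit message, the sketch below builds a sonic-cfggen invocation of the kind described. It is an illustration under assumptions, not the command from the follow-up PR: the helper name and the file paths are placeholders, and the flags are listed to the best of my knowledge and should be checked against `sonic-cfggen --help`.

```python
from typing import List, Sequence


def cfggen_command(minigraph: str, json_inputs: Sequence[str]) -> List[str]:
    """Build a sonic-cfggen command that prints the merged config as JSON.

    Flags used (verify with `sonic-cfggen --help`):
      -H            include platform hardware info
      -m <file>     minigraph XML input
      -j <file>     additional JSON input, repeated once per file
      --print-data  print the resulting config data as JSON
    """
    cmd = ["sonic-cfggen", "-H", "-m", minigraph]
    for path in json_inputs:
        cmd += ["-j", path]
    cmd.append("--print-data")
    return cmd


if __name__ == "__main__":
    # Placeholder paths; the commit message only says the files land in /etc/sonic.
    cmd = cfggen_command("/etc/sonic/minigraph.xml", ["/etc/sonic/init_cfg.json"])
    print(" ".join(cmd), "> /etc/sonic/config_db.json")
```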
yanjundeng pushed a commit to yanjundeng/sonic-buildimage that referenced this pull request Apr 23, 2025
vidyac86 pushed a commit to vidyac86/sonic-buildimage that referenced this pull request Apr 23, 2025