Update qos sai due to no lossless pg for some platforms #16440

Merged
merged 6 commits into from
Apr 1, 2025

Conversation

JibinBao
Contributor

@JibinBao JibinBao commented Jan 10, 2025

Description of PR

  1. On SPC4 and above there is only a lossy buffer, so the space formerly used by the lossless buffer is taken by the lossy buffer. If the packet size is too small, the number of packets needed to fill the shared buffer grows sharply and exhausts the descriptors, so testQosSaiPgSharedWatermark, testQosSaiQSharedWatermark, and testQosSaiLossyQueue are updated accordingly.
  2. Remove the test config scheduler.block_data_plane; otherwise it might raise a YANG validation error during config reload.
  3. When there is no lossless buffer, return a dummy lossless PG buffer profile and dynamically skip the tests that depend on the lossless buffer.
  4. Skip the releaseAllPorts fixture on Mellanox devices: the teardown after the QoS tests performs a config reload that restores the port config, so the fixture is not needed before the tests run. This also saves 2 minutes.
  5. Relevant PRs:
    [Mellanox] Update buffer calculations for Mellanox-SN5600-C224O8 SKU sonic-buildimage#20992
    [Mellanox] Add x86_64-nvidia_sn5610n-r0 new platform and SKUs sonic-buildimage#21056
    [Mellanox] Update Mellanox-SN5600-C256S1 buffer calculations sonic-buildimage#20991
    [Mellanox] Update Mellanox-SN5600-C256S1, Mellanox-SN5600-C224O8 buffers and DSCP mapping sonic-buildimage#21427
    [Mellanox] Update DSCP mapping for SN5600, SN5610 SKUs sonic-buildimage#21762
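
As a rough illustration of item 1, the sketch below shows why small packets can exhaust descriptors once the lossy pool absorbs the space that used to be lossless. The pool size, cell size, descriptor budget, and the function name are all hypothetical and are not taken from the SN5600 buffer configuration; the model simply assumes each packet occupies a whole number of cells and one descriptor.

```python
import math


def packets_to_fill(pool_bytes: int, packet_bytes: int, cell_bytes: int) -> int:
    """Packets needed to occupy pool_bytes when every packet is rounded up
    to a whole number of buffer cells (simplified, hypothetical model)."""
    cells_per_packet = math.ceil(packet_bytes / cell_bytes)
    return math.ceil(pool_bytes / (cells_per_packet * cell_bytes))


# Hypothetical values, for illustration only.
POOL_BYTES = 64 * 1024 * 1024   # shared pool size
CELL_BYTES = 192                # buffer cell size
DESCRIPTORS = 200_000           # packet descriptor budget

for size in (64, 1500, 9000):
    needed = packets_to_fill(POOL_BYTES, size, CELL_BYTES)
    status = "exceeds descriptor budget" if needed > DESCRIPTORS else "fits"
    print(f"packet={size}B packets_needed={needed} ({status})")
```

Under these assumed numbers, only the smallest packet size needs more packets than there are descriptors, which is the situation a larger test packet size avoids.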

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
    • Add ownership here (Microsoft required only)
  • Test case improvement

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405
  • 202411

Approach

What is the motivation for this PR?

Update the QoS SAI tests for platforms that have no lossless PG buffer.

How did you do it?

Update the lossy test cases and skip the tests related to the lossless PG buffer.
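
A minimal sketch of what "skip dynamically" could mean in practice. The test names, the `LOSSLESS_ONLY_TESTS` set, and the `skip_reason` helper are hypothetical and are not the names used in this PR; the sketch only shows the decision being made at run time from a detected platform property rather than from a static skip list.

```python
from typing import Optional

# Hypothetical names of tests that need a lossless PG buffer.
LOSSLESS_ONLY_TESTS = {"testLosslessExampleA", "testLosslessExampleB"}


def skip_reason(test_name: str, has_lossless_pg: bool) -> Optional[str]:
    """Return a skip message if the test cannot run on this platform,
    or None if the test should run."""
    if not has_lossless_pg and test_name in LOSSLESS_ONLY_TESTS:
        return f"{test_name} needs a lossless PG buffer, which this platform lacks"
    return None


print(skip_reason("testLosslessExampleA", has_lossless_pg=False))
print(skip_reason("testQosSaiLossyQueue", has_lossless_pg=False))
```

In a pytest fixture, a non-None result would typically be handed to `pytest.skip`.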

How did you verify/test it?

Ran the QoS SAI tests on a platform without a lossless PG buffer.

Any platform specific information?

SN5600 and SN5610

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@JibinBao
Contributor Author

/azpw run Azure.sonic-mgmt

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-mgmt

Azure Pipelines successfully started running 1 pipeline(s).

@JibinBao JibinBao changed the title from "Update qos sai due to no lossless pg for some platform" to "Update qos sai due to no lossless pg for some platforms" Jan 10, 2025
@mssonicbld
Collaborator

/azp run

Pull request contains merge conflicts.

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld added a commit to mssonicbld/sonic-buildimage-msft that referenced this pull request Feb 20, 2025
#### Why I did it
To update the DSCP mapping for the Mellanox SN5600 and SN5610N SKUs

#### How I did it
Updated buffers_defaults_objects.j2 and qos.json.j2 according to the new DSCP mapping

#### How to verify it
Check the SDK dumps to make sure the values are correct.
Also run the sonic-mgmt tests; the relevant changes are in sonic-net/sonic-mgmt#16440
mssonicbld added a commit to Azure/sonic-buildimage-msft that referenced this pull request Feb 20, 2025
@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@bingwang-ms
Collaborator

@kperumalbfn , @XuChen-MSFT Can you please help review?

@kperumalbfn
Collaborator

@JibinBao can we combine this with #17004 (comment)? Both the PRs are related and we can use the same logic to skip the lossless tests on SKUs with only lossy configs.

@JibinBao
Contributor Author

JibinBao commented Mar 5, 2025

> @JibinBao can we combine this with #17004 (comment)? Both the PRs are related and we can use the same logic to skip the lossless tests on SKUs with only lossy configs.

Hi @kperumalbfn,

Both PRs share this logic:
https://github.com/sonic-net/sonic-mgmt/pull/16440/files#diff-fddd5bf26ca50aa4d3504424556801116946ed4d39d1824b343e39ebcc01d4ffR2132

    profile_content = dut_asic.run_redis_cmd(
        argv=["redis-cli", "-n", 0, "keys", f'BUFFER_PG_TABLE:{srcport}:*-4'])
    if not profile_content:
        is_lossy_queue_only = True
        logger.info(f"{srcport} has only lossy queue")

Could we wrap the above code in a function and put it in tests/qos/conftest.py?

    def detect_lossy_only_pool(dut_asic, srcport):
        """Return True if srcport has no BUFFER_PG_TABLE key ending in -4."""
        is_lossy_queue_only = False
        # srcport is passed in explicitly; it was an undefined name before.
        profile_content = dut_asic.run_redis_cmd(
            argv=["redis-cli", "-n", 0, "keys", f'BUFFER_PG_TABLE:{srcport}:*-4'])
        if not profile_content:
            is_lossy_queue_only = True
            logger.info(f"{srcport} has only lossy queue")
        return is_lossy_queue_only
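
A minimal sketch of how such a helper might be exercised off-device, assuming `srcport` is passed in as a parameter. `FakeAsic` is a hypothetical stand-in for the real `dut_asic` object: it only emulates a `keys <pattern>` lookup over an in-memory list, and the real `run_redis_cmd` may return data in a different shape.

```python
import fnmatch
import logging

logger = logging.getLogger(__name__)


class FakeAsic:
    """Hypothetical stand-in for dut_asic; serves keys from a list."""

    def __init__(self, keys):
        self._keys = list(keys)

    def run_redis_cmd(self, argv):
        # Emulates only `redis-cli -n <db> keys <pattern>`.
        pattern = argv[-1]
        return [k for k in self._keys if fnmatch.fnmatchcase(k, pattern)]


def detect_lossy_only_pool(dut_asic, srcport):
    """Return True when srcport has no BUFFER_PG_TABLE key ending in -4."""
    profile_content = dut_asic.run_redis_cmd(
        argv=["redis-cli", "-n", 0, "keys", f"BUFFER_PG_TABLE:{srcport}:*-4"])
    if not profile_content:
        logger.info(f"{srcport} has only lossy queue")
        return True
    return False


with_lossless = FakeAsic(["BUFFER_PG_TABLE:Ethernet0:0",
                          "BUFFER_PG_TABLE:Ethernet0:3-4"])
lossy_only = FakeAsic(["BUFFER_PG_TABLE:Ethernet0:0"])

print(detect_lossy_only_pool(with_lossless, "Ethernet0"))  # False
print(detect_lossy_only_pool(lossy_only, "Ethernet0"))     # True
```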

JibinBao added 4 commits March 5, 2025 10:43
1. On SPC4 there is only a lossy buffer, so the space formerly used by the lossless buffer is taken by the lossy buffer. If the packet size is too small, the number of packets needed to fill the shared buffer grows sharply and exhausts the descriptors, so testQosSaiPgSharedWatermark, testQosSaiQSharedWatermark, and testQosSaiLossyQueue are updated accordingly.
2. Remove the test config scheduler.block_data_plane; otherwise it might raise a YANG validation error during config reload.
3. Relevant PRs:
   sonic-net/sonic-buildimage#20992
   sonic-net/sonic-buildimage#21056
   sonic-net/sonic-buildimage#20991

1. When there is no lossless buffer, return a dummy lossless PG buffer profile
2. Dynamically skip the tests that depend on the lossless buffer

1. Update the QoS SAI lossy tests because all lossless queues have been changed to lossy queues
2. Skip the releaseAllPorts fixture on Mellanox devices: the teardown after the QoS tests performs a config reload that restores the port config, so the fixture is not needed before the tests run. This also saves 2 minutes.
3. Relevant PRs: sonic-net/sonic-buildimage#21427, sonic-net/sonic-buildimage#21056
@JibinBao JibinBao force-pushed the qos_no_pg_lossless branch from 1600d33 to 80a72f8 Compare March 5, 2025 08:48
@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@JibinBao
Contributor Author

Hi @bingwang-ms, @kperumalbfn
Can you review it again?

@bingwang-ms
Collaborator

@kperumalbfn Can you please help review?

    if srcport in dualtor_ports_for_duts:
        queues = "0-1"
    else:
        queues = "0-2"
    if isMellanoxDevice(duthost):
        profile_content = dut_asic.run_redis_cmd(
Collaborator

@JibinBao Could we do something similar to the cable_length "0m" check instead of checking BUFFER_PG:4?

Contributor Author

@kperumalbfn ,
Done. Can you please review it again?

Collaborator

Thanks @JibinBao
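
For readers following the thread, the cable-length approach might look roughly like the sketch below. It assumes a `CABLE_LENGTH` table shaped like the one in SONiC's config DB (a profile name mapping each port to a length string) and treats a length of "0m" as the marker for a lossy-only port. The function name and sample data are hypothetical, and the merged change may be implemented differently.

```python
def is_lossy_only_port(cable_length_table: dict, port: str) -> bool:
    """Return True if the port's configured cable length is "0m".

    cable_length_table is assumed to look like
    {"<profile>": {"<port>": "<length>m", ...}}.
    Ports missing from the table are treated as not lossy-only.
    """
    for port_lengths in cable_length_table.values():
        if port in port_lengths:
            return port_lengths[port] == "0m"
    return False


# Hypothetical sample data, for illustration only.
sample = {"AZURE": {"Ethernet0": "0m", "Ethernet8": "40m"}}

print(is_lossy_only_port(sample, "Ethernet0"))
print(is_lossy_only_port(sample, "Ethernet8"))
```

Compared with matching `BUFFER_PG_TABLE` keys, this reads configuration rather than programmed state, so it does not depend on which PG index a platform uses for lossless traffic.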

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@liat-grozovik liat-grozovik merged commit 485ec3a into sonic-net:master Apr 1, 2025
14 checks passed
OriTrabelsi pushed a commit to OriTrabelsi/sonic-mgmt that referenced this pull request Apr 1, 2025
…6440)

@r12f
Contributor

r12f commented Apr 7, 2025

Hi @JibinBao, would you mind creating a manual merge PR to 202412? There is a merge conflict for this PR.

1 similar comment
@r12f
Contributor

r12f commented Apr 15, 2025

Hi @JibinBao, would you mind creating a manual merge PR to 202412? There is a merge conflict for this PR.

@JibinBao
Contributor Author

> Hi @JibinBao, would you mind creating a manual merge PR to 202412? There is a merge conflict for this PR.

Hi @r12f, I can't find the 202412 branch. Do you mean msft_202412?

@r12f
Contributor

r12f commented Apr 18, 2025

Hi Jibin, yes. The repo is here: https://github.com/Azure/sonic-mgmt.msft

@JibinBao
Contributor Author

> Hi Jibin, yes. The repo is here: https://github.com/Azure/sonic-mgmt.msft

Hi @r12f, please review Azure/sonic-mgmt.msft#225

@r12f
Contributor

r12f commented Apr 28, 2025

thanks Jibin. It is merged now.

8 participants