Subinterface doesn't inherit the speed of ancestors on kvm testbed. #19735

Open
yutongzhang-microsoft opened this issue Jul 30, 2024 · 5 comments · Fixed by sonic-net/sonic-sairedis#1458 or Azure/sonic-sairedis.msft#43
Labels: MSFT Triaged (this issue has been triaged)

Comments

@yutongzhang-microsoft
Contributor

Description

In the test script sub_port_interfaces/test_show_subinterface.py, we create a subinterface, which should inherit the speed of its ancestor (parent) port. The test result shows that the parent port's speed is 1G, so the subinterface should also report 1G. However, the newly created subinterface reports a speed of 40G. We suspect this is an image issue on the KVM testbed.

Steps to reproduce the issue:

  1. Run the script sub_port_interfaces/test_show_subinterface.py on kvm testbed.

Describe the results you received:

>           pytest_assert(status.get("speed") == config["speed"],
                          "subinterface %s should have inherited speed as %s, actual speed %s"
                          % (subintf, config["speed"], status.get("speed")))
E           Failed: subinterface Ethernet4.20 should have inherited speed as 1G, actual speed 40G
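The assertion above expects a subinterface such as Ethernet4.20 to report the speed of its parent port. A minimal sketch of that inheritance check is below. It assumes subinterface names of the form `<parent>.<vlan_id>`; `parent_port`, `check_inherited_speed`, and the dictionary shapes for `status` and the per-port config are hypothetical stand-ins for the test's own fixtures, not the test script's actual code.

```python
def parent_port(subintf: str) -> str:
    """Return the parent port of a subinterface named '<parent>.<vlan_id>' (assumed format)."""
    parent, sep, vlan_id = subintf.rpartition(".")
    if not sep or not parent or not vlan_id.isdigit():
        raise ValueError(f"not a subinterface name: {subintf!r}")
    return parent


def check_inherited_speed(subintf: str, status: dict, port_config: dict) -> None:
    """Raise AssertionError if the subinterface speed differs from its parent's.

    status: hypothetical dict of the subinterface's reported status, e.g. {"speed": "40G"}.
    port_config: hypothetical mapping from port name to that port's config dict.
    """
    expected = port_config[parent_port(subintf)]["speed"]
    actual = status.get("speed")
    if actual != expected:
        raise AssertionError(
            "subinterface %s should have inherited speed as %s, actual speed %s"
            % (subintf, expected, actual))


# Reproduce the reported mismatch: parent configured at 1G, subinterface reports 40G.
try:
    check_inherited_speed("Ethernet4.20", {"speed": "40G"}, {"Ethernet4": {"speed": "1G"}})
except AssertionError as err:
    print(err)
```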

Describe the results you expected:

Output of show version:

(paste your output here)

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@ishidawataru
Collaborator

I noticed the same problem.

The problem is that the vslib gets the oper speed from the veth interface, which doesn't reflect the configured speed.

I'm using the following SAI_VS_OPER_SPEED_IS_CONFIGURED_SPEED option to make the oper speed equal to the configured speed.

ishidawataru/sonic-sairedis@934fe25
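The difference between the two behaviors can be modeled in a few lines of Python. By default the oper speed is whatever the veth netdev reports in `/sys/class/net/<name>/speed` (in Mb/s), regardless of the configured speed; with the option enabled, the configured speed is returned instead. This is a hypothetical sketch, not vslib's actual C++ implementation; `read_sysfs_speed`, `oper_speed_mbps`, and the `sysfs_root` parameter are illustrative names.

```python
import os


def read_sysfs_speed(ifname: str, sysfs_root: str = "/sys/class/net") -> int:
    """Read a Linux interface's speed in Mb/s from <sysfs_root>/<ifname>/speed."""
    with open(os.path.join(sysfs_root, ifname, "speed")) as f:
        return int(f.read().strip())


def oper_speed_mbps(ifname: str, configured_mbps: int,
                    use_configured_speed: bool,
                    sysfs_root: str = "/sys/class/net") -> int:
    """Hypothetical model of the oper-speed choice discussed in this issue.

    Without the option, the speed reported by the netdev wins, so a port
    configured for 1G can still report a different speed. With the option
    enabled, the configured speed is returned unchanged.
    """
    if use_configured_speed:
        return configured_mbps
    return read_sysfs_speed(ifname, sysfs_root)
```

The `sysfs_root` parameter exists only so the sketch can be exercised against a temporary directory instead of the real `/sys`.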

@yutongzhang-microsoft
Contributor Author

Hi @ishidawataru @kcudnik, I noticed that the script still fails after this fix PR was merged, so I am reopening this issue.
https://elastictest.org/scheduler/testplan/67cfdaba5048655bf9cf9636?testcase=sub_port_interfaces%2Ftest_show_subinterface.py%7C%7C%7C2&type=console

mssonicbld added a commit to mssonicbld/sonic-sairedis that referenced this issue Mar 12, 2025
…_TX_READY_STATUS support

This PR adds two features to `vslib`.

- `SAI_KEY_VS_OPER_SPEED_IS_CONFIGURED_SPEED`: when `true`, `SAI_PORT_ATTR_SPEED` returns the configured speed instead of the value retrieved via [`/sys/class/net/<name>/speed`](https://github.com/sonic-net/sonic-sairedis/blob/master/vslib/SwitchStateBaseHostif.cpp#L892-L893).
  - fixes sonic-net/sonic-buildimage#19735

- `SAI_PORT_ATTR_HOST_TX_READY_STATUS`: always returns `true`. Required to support running `xcvrd` in the VS env.
  - ref: https://github.com/sonic-net/SONiC/pull/1849/files#diff-6f3e95e6c57a3edc2e30e1f13edb9fd9a32a0db44e1035ac1f0b1b9a191762a5R46
@ishidawataru
Collaborator

@yutongzhang-microsoft Could you add SAI_VS_USE_CONFIGURED_SPEED_AS_OPER_SPEED=true to sai.profile?
This configuration is required to make the oper speed equal to the configured speed.

https://github.com/sonic-net/sonic-sairedis/pull/1458/files#diff-6231365b1d24e06bee53d061a29436d06190ac735491ee91d71a367832739af3R67

https://github.com/search?q=repo%3Asonic-net%2Fsonic-buildimage%20SAI_VS_HOSTIF_USE_TAP_DEVICE&type=code
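Enabling the option amounts to one `key=value` line in `sai.profile`, using the key name given in the comment above. The sketch below shows that line in context and a hypothetical helper that parses profile text and checks or sets the flag. Treating only the exact string `true` as enabled, and skipping lines without `=`, are assumptions for illustration, not a description of how sairedis itself reads the file; `HYPOTHETICAL_KEY` is a placeholder.

```python
PROFILE_KEY = "SAI_VS_USE_CONFIGURED_SPEED_AS_OPER_SPEED"


def parse_profile(text: str) -> dict:
    """Parse KEY=VALUE lines; blank lines and lines without '=' are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def profile_flag_enabled(text: str, key: str = PROFILE_KEY) -> bool:
    """True only if the key is present with the exact value 'true' (assumed semantics)."""
    return parse_profile(text).get(key) == "true"


def enable_flag(text: str, key: str = PROFILE_KEY) -> str:
    """Return profile text with key=true, replacing any existing line for that key."""
    lines = [line for line in text.splitlines()
             if line.partition("=")[0].strip() != key]
    lines.append(f"{key}=true")
    return "\n".join(lines) + "\n"


# Usage: an existing 'false' entry is replaced; other (placeholder) keys are kept.
print(enable_flag("HYPOTHETICAL_KEY=1\nSAI_VS_USE_CONFIGURED_SPEED_AS_OPER_SPEED=false\n"))
```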

@yutongzhang-microsoft
Contributor Author

@ishidawataru I raised a PR; please help review #22011.

mssonicbld added a commit to sonic-net/sonic-sairedis that referenced this issue Mar 12, 2025
…_TX_READY_STATUS support (#1553)

mssonicbld added a commit to mssonicbld/sonic-sairedis.msft that referenced this issue Apr 16, 2025
…_TX_READY_STATUS support

r12f pushed a commit to Azure/sonic-sairedis.msft that referenced this issue Apr 16, 2025
…_TX_READY_STATUS support (#43)

r12f added a commit to Azure/sonic-sairedis.msft that referenced this issue Apr 16, 2025
* [syncd] Support bulk set in INIT_VIEW mode (#1517)

Support bulk set in INIT_VIEW mode.

* Use sonictest pool instead of sonic-common and fix arm64 issue. (#1516)

1. Use sonictest pool instead of sonic-common
2. Fix arm64 build error.

* [nvidia] Skip SAI discovery on ports (#1524)

Given that modern systems have lots of ports, SAI discovery takes a very long time (e.g., 8 sec for a 256-port system). This has a big impact on fast-boot downtime, and the discovery itself is not required for Nvidia platform fast-boot.

The same applies to Nvidia fastfast-boot (aka warm-boot), but this needs to be tested separately.

* Define bulk chunk size and bulk chunk size per counter ID (#1528)

Define bulk chunk size and bulk chunk size per counter ID.
This is to resolve the VS test failure in #1457, which is caused by a circular dependency.
In PR #1457, new fields `bulk_chunk_size` and `bulk_chunk_size_per_prefix` have been introduced to `sai_redis_flex_counter_group_parameter_t`, whose instances are initialized by orchagent.
However, orchagent is still compiled with the old sairedis header, which leaves both new fields uninitialized, which in turn fails the VS test.

We have to split this PR into two:
1. #1519, which updates only the header sairedis.h. The motivation is to compile swss (orchagent) with both new fields initialized.
2. #1457, which contains the rest of the code.

The order to merge:
1. #1519
2. sonic-net/sonic-swss#3391
3. #1457

* [syncd] Update log level for bulk api (#1532)

[syncd] Update log level for bulk api

* [FC] Support Policer Counter (#1533)

Added the implementation for the policer counter:
Support in POLICER group and sai_serialize functions
Unit Tests: Included unit tests to add and remove policer counter.

* Fix pipeline errors related to rsyslogd and libswsscommon installation (#1535)

On arm64 (and maybe sometimes amd64), rsyslogd appears to need a second or two to actually fully exit. The current code expects it to exit practically instantly. Add a sleep of 2 seconds to give it some time. Also enable some logging so that the commands being run can be seen.

Also, fix an error related to libswsscommon not getting installed due to new dependencies being added. Solve this by using apt install to install the package, which brings in any necessary dependencies.

* [syncd] Move logSet logGet under mutex to prevent race condition (#1520) (#1538)

[syncd] Move logSet logGet under mutex to prevent race condition

* Optimize counter polling interval by making it more accurate (#1457) (#1534)

What I did

Optimize the counter-polling performance in terms of polling interval accuracy

Enable bulk counter-polling to run at a smaller chunk size
There is one counter-polling thread for each counter group. All such threads can compete for the critical sections at the vendor SAI level, which means a counter-polling thread may have to wait for a critical section if another thread is already in it, which introduces latency for the waiting counter group.
An example is the competition between the PFC watchdog and the port counter groups.
The port counter group contains many counters and is polled in a bulk mode which takes a relatively longer time. The PFC watchdog counter group contains only a few counters but is polled at a short interval. Sometimes, PFC watchdog counters need to wait before polling, which makes the polling interval inaccurate and prevents the PFC storm from being detected in time.
To resolve this issue, we can reduce the chunk size of the port counter group. By default, the port counter group polls the counters of all ports in a single bulk operation. With a smaller chunk size, it polls the counters in several bulk operations, each covering a subset of the ports whose size is at most the chunk size.
By doing so, the port counter group stays in the critical section for a shorter time and the PFC watchdog is more likely to be scheduled to poll counters and detect the PFC storm in time.
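The chunking idea above can be sketched as a simplified Python model. This is not syncd's C++ code: `poll_in_chunks` and the `bulk_poll` callback are hypothetical stand-ins for a vendor bulk stats call, and a single lock models the vendor SAI critical section. The point is that the lock is released between chunks, so another counter group (such as the PFC watchdog) can acquire it without waiting for every port to be polled.

```python
import threading
from typing import Callable, Dict, List, Sequence

# Models the vendor SAI critical section shared by all counter-polling threads.
sai_lock = threading.Lock()


def chunks(items: Sequence[str], size: int) -> List[Sequence[str]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size <= 0:
        raise ValueError("chunk size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def poll_in_chunks(ports: Sequence[str], chunk_size: int,
                   bulk_poll: Callable[[Sequence[str]], Dict[str, int]]) -> Dict[str, int]:
    """Poll counters for all ports, one bulk call per chunk.

    The lock is held only while a single chunk is polled, so a thread for
    another counter group can get in between chunks.
    """
    results: Dict[str, int] = {}
    for chunk in chunks(ports, chunk_size):
        with sai_lock:
            results.update(bulk_poll(chunk))
    return results
```

The trade-off is more bulk calls per polling cycle in exchange for shorter time spent holding the shared critical section.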

Collect the timestamp immediately after the vendor SAI API returns.
Currently, many counter groups require a Lua plugin to run at each polling interval to calculate rates, detect certain events, etc.
For example, the PFC watchdog counter group uses a plugin to detect PFC storms. In this case, the polling interval is calculated from the difference between the timestamps of the current and last polls, to avoid deviation due to scheduling latency. However, the timestamp is collected in the Lua plugin, which runs several steps after the SAI API returns and executes in a different context (redis-server). Both introduce even larger deviations. To overcome this, we collect the timestamp immediately after the SAI API returns.
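A small illustration of why timestamp placement matters: if each sample is stamped right after the counter read returns, and the rate is computed from the difference between those stamps, then any delay before the (modeled) plugin consumes the sample does not distort the interval. `poll_with_timestamp` and `rate_per_sec` are hypothetical helpers, and using `time.monotonic()` is an assumption; the description above does not say which clock syncd uses.

```python
import time
from typing import Callable, Tuple


def poll_with_timestamp(read_counter: Callable[[], int]) -> Tuple[int, float]:
    """Read a counter and record the time immediately after the read returns."""
    value = read_counter()
    ts = time.monotonic()  # taken right after the (modeled) SAI call returns
    return value, ts


def rate_per_sec(prev: Tuple[int, float], curr: Tuple[int, float]) -> float:
    """Counter rate over the actual interval between two timestamped samples."""
    (v0, t0), (v1, t1) = prev, curr
    interval = t1 - t0
    if interval <= 0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / interval
```

Because the rate uses the measured `t1 - t0` rather than the nominal polling interval, scheduling jitter between polls is accounted for.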

* Revert "Do not enter vendor SAI critical section for counter polling/clearing operations (#1450)" (#1541)

Revert "Do not enter vendor SAI critical section for counter polling/clearing operations (#1450)"

This reverts commit 0317b16.

* [vslib] SAI_KEY_VS_OPER_SPEED_IS_CONFIGURED_SPEED, SAI_PORT_ATTR_HOST_TX_READY_STATUS support (#1553)

* Update build_and_install_module.sh to match newer Linux kernel version (#1561)

sonic-sairedis checks out sonic-swss to run vstest, but uses its local build_and_install_module.sh to set up the test environment, and that script is out of date with newer Linux kernel versions.
The build_and_install_module.sh in sonic-swss is up to date with the latest Ubuntu 20.04, so we update our copy to match the file in sonic-swss.
In the long term, we may need to sync these files automatically, but we currently have an Azure agent security issue that needs to be fixed immediately, so we update build_and_install_module.sh manually for now.

* Revert "Optimize counter polling interval by making it more accurate (#1457) …" (#1570)

Revert "Optimize counter polling interval by making it more accurate

---------

Co-authored-by: mssonicbld
Co-authored-by: Jianyue Wu
Co-authored-by: Kamil Cudnik
Co-authored-by: Stephen Sun
Co-authored-by: Kumaresh Perumal
xwjiang-ms reopened this Apr 17, 2025
@xwjiang-ms
Contributor

Hi @r12f, it looks like the issue was not fixed. PR tests skip sub_port_interfaces/test_show_subinterface.py::test_subinterface_status[port] because of this issue, so closing it caused PR tests to be blocked.
