[Chassis] sonic-mgmt PC suite tests test_voq_po_update and test_po_update_io_no_loss are failing #19357
Comments
@arlakshm for your viz
@saksarav-nokia @arlakshm There was a PR recently to update the LAG oper status when addLag is done. Will this help here?
Hi @saksarav-nokia, regarding the comment below: if the LAG is deleted without removing the IP interface on the local card, the LAG removal is expected to fail. This is the existing design; it has nothing to do with the changes in sonic-net/sonic-swss#3150. Please let me know if I am missing something.
@arlakshm, the test test_voq_po_update doesn't configure an IP address on PortChannel999. It creates an empty LAG, verifies it in CHASSIS_DB and ASIC_DB, and removes the LAG. The test was passing a few weeks ago and then started failing.
@arlakshm, with the following code commented out, the tests are passing
… is no rif associated with the port (sonic-net#3207) What I did: Fixes sonic-net/sonic-buildimage#19357. Why I did it: In the sonic-mgmt PC test suite, when an empty LAG is created, the port channel state change is synced to the remote LC even if the port channel has no router interface. This results in a dummy router interface being created on the remote LC. So when the empty port channel is removed on the local card, the removal fails on the remote LC because of the dummy router interface. Add a fix to sync the port channel interface state to the remote LC only when a router interface has been created on the local LC.
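The guard described in the commit message above can be sketched roughly as follows. This is an illustrative Python model, not the actual orchagent change: `should_sync_intf_state`, `on_oper_status_change`, the `oper_status` field, and the port name `PortChannel0001` are hypothetical, and a plain dict stands in for the SYSTEM_INTERFACE table in chassis_db.

```python
def should_sync_intf_state(port_alias, local_rifs):
    """Return True only if a router interface exists locally for this port."""
    return port_alias in local_rifs


def on_oper_status_change(port_alias, oper_status, local_rifs, system_interface):
    """Write the oper state to the (toy) SYSTEM_INTERFACE table when allowed.

    system_interface is a plain dict standing in for the chassis_db table.
    Returns True if an entry was written, False if the sync was skipped.
    """
    if not should_sync_intf_state(port_alias, local_rifs):
        # Empty port channel with no router interface: nothing to sync, so
        # no dummy router interface gets created on the remote LC.
        return False
    system_interface[port_alias] = {"oper_status": oper_status}
    return True


system_interface = {}
# Empty LAG from the test: no local router interface, so the sync is skipped.
on_oper_status_change("PortChannel999", "down", set(), system_interface)
# Hypothetical port channel that does have a local router interface.
on_oper_status_change("PortChannel0001", "down", {"PortChannel0001"}, system_interface)
print(sorted(system_interface))  # ['PortChannel0001']
```

With this kind of check in place, the empty PortChannel999 never appears in SYSTEM_INTERFACE, so the later SYSTEM_LAG_TABLE delete has no router interface blocking it.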
Description
In the latest master, the port channel test cases test_voq_po_update and test_po_update_io_no_loss are failing, apparently because of the changes made in sonic-net/sonic-swss#3150.
When an empty LAG is created on one IMM, the oper status of the LAG changes from unknown to down, as shown in the logs below. When this oper state change notification is processed, voqSyncIntfState is called, which writes to SYSTEM_INTERFACE in chassis_db. The other ASICs in the same IMM and in other IMMs then receive SYSTEM_LAG_TABLE and SYSTEM_INTERFACE notifications from chassis_db, as shown below. A router interface is created on the remote ASICs and holds a reference to the LAG.
But when the LAG is deleted, the local ASIC deletes only the SYSTEM_LAG_TABLE entry in chassis_db and not the SYSTEM_INTERFACE entry. So when the remote ASICs receive the SYSTEM_LAG_TABLE delete, they call removeLag, and since the router interface still references this LAG, the delete fails and the test cases fail.
Does PortsOrch::updatePortOperStatus need to call gIntfsOrch->voqSyncIntfState when the oper state changes from unknown to down?
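The create/delete sequence described above can be replayed in a toy model. The sketch below is not orchagent code: `RemoteAsic`, its handler names, and `create_and_delete_empty_lag` are hypothetical, and they only mimic the reference relationship between a LAG and a router interface as described in this issue.

```python
class RemoteAsic:
    """Toy stand-in for the state held by a remote ASIC (hypothetical)."""

    def __init__(self):
        self.lags = set()
        self.rifs = set()  # router interfaces, keyed by the LAG they reference

    def on_system_lag_set(self, lag):
        self.lags.add(lag)

    def on_system_interface_set(self, lag):
        # A SYSTEM_INTERFACE entry leads to a router interface on the LAG.
        if lag in self.lags:
            self.rifs.add(lag)

    def on_system_lag_del(self, lag):
        # The LAG cannot be removed while a router interface references it.
        if lag in self.rifs:
            return False
        self.lags.discard(lag)
        return True


def create_and_delete_empty_lag(remote, lag, sync_intf_state_on_create):
    """Replay the events from the issue; True means the remote delete worked."""
    remote.on_system_lag_set(lag)
    if sync_intf_state_on_create:
        # The unknown -> down oper change triggers a SYSTEM_INTERFACE write.
        remote.on_system_interface_set(lag)
    # On delete only SYSTEM_LAG_TABLE is removed; SYSTEM_INTERFACE stays.
    return remote.on_system_lag_del(lag)


lag = "ixre-egl-board7|asic0|PortChannel999"
print(create_and_delete_empty_lag(RemoteAsic(), lag, True))   # False
print(create_and_delete_empty_lag(RemoteAsic(), lag, False))  # True
```

In this model the delete fails only when the interface state was synced at creation time, which matches the behaviour reported for the empty PortChannel999.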
2024 Jun 19 16:54:19.889094 ixre-egl-board7 NOTICE swss0#orchagent: :- addLag: Create an empty LAG PortChannel999 lid:2000000000c55
2024 Jun 19 16:54:19.889589 ixre-egl-board7 NOTICE swss0#orchagent: :- updatePortOperStatus: Port PortChannel999 oper state set from unknown to down
2024 Jun 19 16:54:19.889589 ixre-egl-board7 NOTICE swss0#orchagent: :- voqSyncIntfState: Syncing system interface state down for port ixre-egl-board7|asic0|PortChannel999
2024 Jun 19 16:54:19.891401 ixre-egl-board7 NOTICE swss1#orchagent: :- addLag: Create an empty LAG ixre-egl-board7|asic0|PortChannel999 lid:102000000000c0b
2024 Jun 19 16:54:19.897058 ixre-egl-board7 INFO kernel: [ 5084.526940] PortChannel999: Mode changed to "loadbalance"
2024 Jun 19 16:54:19.898792 ixre-egl-board7 NOTICE swss1#orchagent: :- addRouterIntfs: Create router interface ixre-egl-board7|asic0|PortChannel999 MTU 1492
2024 Jun 19 16:54:19.901126 ixre-egl-board7 NOTICE teamd0#teammgrd: :- addLag: Start port channel PortChannel999 with teamd
2024 Jun 19 16:54:19.904120 ixre-egl-board7 NOTICE swss0#portsyncd: :- onMsg: nlmsg type:16 key:PortChannel999 admin:1 oper:0 addr:40:7c:7d:bb:25:9d ifindex:62 master:0 type:team
2024 Jun 19 16:54:19.904969 ixre-egl-board7 NOTICE teamd0#teammgrd: :- setLagAdminStatus: Set port channel PortChannel999 admin status to up
2024 Jun 19 16:54:19.905031 ixre-egl-board7 INFO kernel: [ 5084.534338] 8021q: adding VLAN 0 to HW filter on device PortChannel999
2024 Jun 19 16:54:19.929936 ixre-egl-board7 NOTICE teamd0#teammgrd: :- setLagMtu: Set port channel PortChannel999 MTU to 9100
2024 Jun 19 16:54:19.930240 ixre-egl-board7 NOTICE teamd0#tlm_teamd: :- try_add_lag: The LAG 'PortChannel999' has been added.
2024 Jun 19 16:54:19.945668 ixre-egl-board7 NOTICE swss0#orchagent: :- updatePortOperStatus: Port PortChannel999 oper state set from down to down
2024 Jun 19 16:54:20.204105 ixre-egl-board7 NOTICE syncd1#syncd: :- addObject: Rif Counter oid:0x15100600003015 does not has supported counters
^C(1057.04s)
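To spot this pattern in a longer syslog capture, a small script can flag LAGs that were logged as empty and later had a router interface created for them. This is only a sketch: the regular expressions are derived from the two message formats visible in the log above, the function name `empty_lags_with_rif` is hypothetical, and it does not account for a LAG that legitimately gains an IP interface later.

```python
import re

# Patterns taken from the orchagent messages shown in the log excerpt.
EMPTY_LAG = re.compile(r"addLag: Create an empty LAG (\S+)")
ROUTER_INTF = re.compile(r"addRouterIntfs: Create router interface (\S+)")


def empty_lags_with_rif(lines):
    """Return names of LAGs created empty that later got a router interface."""
    empty = set()
    flagged = []
    for line in lines:
        match = EMPTY_LAG.search(line)
        if match:
            empty.add(match.group(1))
            continue
        match = ROUTER_INTF.search(line)
        if match and match.group(1) in empty:
            flagged.append(match.group(1))
    return flagged


sample = [
    "2024 Jun 19 16:54:19.891401 ixre-egl-board7 NOTICE swss1#orchagent: :- "
    "addLag: Create an empty LAG ixre-egl-board7|asic0|PortChannel999 "
    "lid:102000000000c0b",
    "2024 Jun 19 16:54:19.898792 ixre-egl-board7 NOTICE swss1#orchagent: :- "
    "addRouterIntfs: Create router interface "
    "ixre-egl-board7|asic0|PortChannel999 MTU 1492",
]
print(empty_lags_with_rif(sample))  # ['ixre-egl-board7|asic0|PortChannel999']
```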
Steps to reproduce the issue:
Describe the results you received:
The tests are failing
Describe the results you expected:
The tests should pass
Output of show version:
Output of show techsupport:
Additional information you deem important (e.g. issue happens only occasionally):