-
Notifications
You must be signed in to change notification settings - Fork 582
[orchagent][ports] Add port reference increment / decrement to lag member add / remove flows #1825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can you add vs tests to cover the failure case without the fix? |
Awaiting VS tests before merge |
@prsunny @zhenggen-xu I assume you are referring to me adding a VS test to the existing DPB tests? My understanding of the status of those tests is that they are unstable due a bug in the infra. I would not want to push code to master that I cannot verify that could potentially block merges in the future once this infra bug is fixed and the tests are re-enabled. I am more than happy to add tests to DPB once they are stabilized and I can ensure I am pushing functional and well-tested code. This change is a fix for a general bug in portsorch (any references to ports within this file must increment the reference counter by definition) that was coincidentally revealed by dynamic port breakout. As such I believe this pull request can be merged without having that specific test. If there is another unit test that you would prefer me contribute to in order to validate this change I am happy to do that as well. |
It is fine to add DPB LAG related tests to exercise the reference counter, or just add unit test cases to cover the reference counter for LAG member operations (the reference counter might not exposed today, so maybe we could add code to print it at debug log level so we could test during the vs tests). As for DPB test cases, we are working on the system tests stabilization due to other issues, but other tests for vlan, acl etc are stable, and you could follow that and add the tests for LAG case. if you see it differently, let me know. |
Thank you for the clarification @zhenggen-xu I have committed a dependency test similar to the vlan one. I put it in a separate file as I feel that it is sufficiently distinct from the other tests and there is also potential for tests to be added in this category in the future. I believe this test is sufficient to validate the changes I have made. @prsunny @zhenggen-xu @dgsudharsan can you please re-review at your next convenience? |
/azpw run |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
254a167
to
4a66342
Compare
@prsunny @zhenggen-xu Apologies for the commit spam here... it appears that on the Azure pipeline CI and only on that CI, this test requires I put a sleep before testing if the portchannel was successfully removed of about 30 seconds. I see no real reason it should take that long. On my local build server that operation completes fast enough that I don't need any wait. In the interest of moving forward I can simply have it wait 30 seconds which would ensure the tests passes each time. I could also remove that part of the test since it isn't meaningful in testing if the dependency functioned, its just a sanity check. If neither of these options is acceptable we will need to do some more in depth debugging about why this is behaving like it is. Another strange part of the behavior is that in order to test I have been dumping the syslog into stdout and it seems that as soon as I attempt to remove the portchannel nothing is printed out of the syslog even if I wait a while. |
@zhenggen-xu , could you please review/sign-off? |
@alexrallen , this test is consistently failing in the new build. I've temporarily marked it for skipping. Could you please investigate before we try reverting the PR? #1878 |
…mber add / remove flows (sonic-net#1825) * Added code to increment the port reference counter on the creation of a lag membership that involves that port. Also added code to decrement the counter on the removal of that lag membership as well. This prevents the port from being removed before the lag membership is.
* Validate input of add mirror session Signed-off-by: bingwang <[email protected]>
What I did
Added code to increment the port reference counter on the creation of a lag membership that involves that port. Also added code to decrement the counter on the removal of that lag membership as well. This prevents the port from being removed before the lag membership is.
Fixes sonic-net/sonic-buildimage#7782
Why I did it
In some cases (particularly in the dynamic port breakout flow) if a LAG membership was removed at the same time a port was removed the host interface would be deleted before the membership removal was processed. This would result in orchagent attempting to strip the VLAN tag from a non-existent host interface causing a crash.
How I verified it
Tested previously broken flow on a MSN2700