Skip to content

[orchagent][ports] Add port reference increment / decrement to lag member add / remove flows #1825

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Aug 11, 2021

Conversation

alexrallen
Copy link
Contributor

What I did

Added code to increment the port reference counter on the creation of a lag membership that involves that port. Also added code to decrement the counter on the removal of that lag membership as well. This prevents the port from being removed before the lag membership is.

Fixes sonic-net/sonic-buildimage#7782

Why I did it

In some cases (particularly in the dynamic port breakout flow) if a LAG membership was removed at the same time a port was removed the host interface would be deleted before the membership removal was processed. This would result in orchagent attempting to strip the VLAN tag from a non-existent host interface causing a crash.

How I verified it

Tested previously broken flow on a MSN2700

root@r-panther-23:/home/admin# config portchannel add PortChannel0004
root@r-panther-23:/home/admin# config portchannel member add PortChannel0004 Ethernet16
root@r-panther-23:/home/admin# config vlan add 2147
root@r-panther-23:/home/admin# config vlan member add --untagged 2147 PortChannel0004
root@r-panther-23:/home/admin# config int ip add Vlan2147 10.0.0.2/24
root@r-panther-23:/home/admin# config int break Ethernet16 2x50G[40G,25G,10G] -f -y

Running Breakout Mode : 1x100G[50G,40G,25G,10G]
Target Breakout Mode : 2x50G[40G,25G,10G]

Ports to be deleted :
 {
    "Ethernet16": "100000"
}
Ports to be added :
 {
    "Ethernet16": "50000",
    "Ethernet18": "50000"
}

After running Logic to limit the impact

Final list of ports to be deleted :
 {
    "Ethernet16": "100000"
}
Final list of ports to be added :
 {
    "Ethernet16": "50000",
    "Ethernet18": "50000"
}
Note: Below table(s) have no YANG models:
FEATURE, KDUMP, SNMP, SNMP_COMMUNITY, XCVRD_LOG,
Below Config can not be verified, It may cause harm to the system
 {}
Do you wish to Continue? [y/N]: y
Breakout process got successfully completed.
Please note loaded setting will be lost after system reboot. To preserve setting, run `config save`.
root@r-panther-23:/home/admin#

@alexrallen alexrallen requested a review from prsunny as a code owner July 17, 2021 02:47
dgsudharsan
dgsudharsan previously approved these changes Jul 17, 2021
prsunny
prsunny previously approved these changes Jul 20, 2021
@zhenggen-xu
Copy link
Collaborator

Can you add vs tests to cover the failure case without the fix?

@prsunny
Copy link
Collaborator

prsunny commented Jul 20, 2021

Can you add vs tests to cover the failure case without the fix?

Awaiting VS tests before merge

@alexrallen
Copy link
Contributor Author

@prsunny @zhenggen-xu I assume you are referring to me adding a VS test to the existing DPB tests?

My understanding of the status of those tests is that they are unstable due a bug in the infra. I would not want to push code to master that I cannot verify that could potentially block merges in the future once this infra bug is fixed and the tests are re-enabled.

I am more than happy to add tests to DPB once they are stabilized and I can ensure I am pushing functional and well-tested code.

This change is a fix for a general bug in portsorch (any references to ports within this file must increment the reference counter by definition) that was coincidentally revealed by dynamic port breakout. As such I believe this pull request can be merged without having that specific test. If there is another unit test that you would prefer me contribute to in order to validate this change I am happy to do that as well.

@zhenggen-xu
Copy link
Collaborator

@prsunny @zhenggen-xu I assume you are referring to me adding a VS test to the existing DPB tests?

My understanding of the status of those tests is that they are unstable due a bug in the infra. I would not want to push code to master that I cannot verify that could potentially block merges in the future once this infra bug is fixed and the tests are re-enabled.

I am more than happy to add tests to DPB once they are stabilized and I can ensure I am pushing functional and well-tested code.

This change is a fix for a general bug in portsorch (any references to ports within this file must increment the reference counter by definition) that was coincidentally revealed by dynamic port breakout. As such I believe this pull request can be merged without having that specific test. If there is another unit test that you would prefer me contribute to in order to validate this change I am happy to do that as well.

It is fine to add DPB LAG related tests to exercise the reference counter, or just add unit test cases to cover the reference counter for LAG member operations (the reference counter might not exposed today, so maybe we could add code to print it at debug log level so we could test during the vs tests).

As for DPB test cases, we are working on the system tests stabilization due to other issues, but other tests for vlan, acl etc are stable, and you could follow that and add the tests for LAG case. if you see it differently, let me know.

@alexrallen alexrallen dismissed stale reviews from prsunny and dgsudharsan via a1f7a91 July 21, 2021 22:32
@alexrallen
Copy link
Contributor Author

Thank you for the clarification @zhenggen-xu

I have committed a dependency test similar to the vlan one. I put it in a separate file as I feel that it is sufficiently distinct from the other tests and there is also potential for tests to be added in this category in the future.

I believe this test is sufficient to validate the changes I have made.

@prsunny @zhenggen-xu @dgsudharsan can you please re-review at your next convenience?

dgsudharsan
dgsudharsan previously approved these changes Jul 21, 2021
@alexrallen
Copy link
Contributor Author

/azpw run

@prsunny
Copy link
Collaborator

prsunny commented Jul 25, 2021

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@alexrallen
Copy link
Contributor Author

@prsunny @zhenggen-xu Apologies for the commit spam here... it appears that on the Azure pipeline CI and only on that CI, this test requires I put a sleep before testing if the portchannel was successfully removed of about 30 seconds. I see no real reason it should take that long. On my local build server that operation completes fast enough that I don't need any wait.

In the interest of moving forward I can simply have it wait 30 seconds which would ensure the tests passes each time. I could also remove that part of the test since it isn't meaningful in testing if the dependency functioned, its just a sanity check.

If neither of these options is acceptable we will need to do some more in depth debugging about why this is behaving like it is.

Another strange part of the behavior is that in order to test I have been dumping the syslog into stdout and it seems that as soon as I attempt to remove the portchannel nothing is printed out of the syslog even if I wait a while.

@prsunny
Copy link
Collaborator

prsunny commented Aug 10, 2021

@zhenggen-xu , could you please review/sign-off?

@prsunny prsunny merged commit 7280e19 into sonic-net:master Aug 11, 2021
@prsunny
Copy link
Collaborator

prsunny commented Aug 17, 2021

@alexrallen , this test is consistently failing in the new build. I've temporarily marked it for skipping. Could you please investigate before we try reverting the PR? #1878

raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-swss that referenced this pull request Oct 5, 2021
…mber add / remove flows (sonic-net#1825)

* Added code to increment the port reference counter on the creation of a lag membership that involves that port. Also added code to decrement the counter on the removal of that lag membership as well. This prevents the port from being removed before the lag membership is.
EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
* Validate input of add mirror session

Signed-off-by: bingwang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Functional] [DPB] | port with dependencies is unmapped after breakout with force
4 participants