Skip to content

Enable FDB learning event after all ports removed from default 1Q bridge #3630

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

bingwang-ms
Copy link
Contributor

@bingwang-ms bingwang-ms commented May 1, 2025

What I did

This PR is to fix an orchagent crash issue. Error logs are as below.

2025 May  1 05:51:07.128331 str4-7060x6-64pe-7 ERR swss#orchagent: :- meta_generic_validation_remove: object 0x3a0000000000d7 reference count is 1, can't remove
2025 May  1 05:51:07.128331 str4-7060x6-64pe-7 ERR swss#orchagent: :- removeDefaultBridgePorts: Failed to remove bridge port, rv:-17
2025 May  1 05:51:07.128566 str4-7060x6-64pe-7 INFO swss#supervisord: orchagent terminate called after throwing an instance of 'std::runtime_error'
2025 May  1 05:51:07.128566 str4-7060x6-64pe-7 INFO swss#supervisord: orchagent   what():  PortsOrch initialization failure
2025 May  1 05:51:07.815330 str4-7060x6-64pe-7 INFO swss#supervisord 2025-05-01 05:51:07,814 WARN exited: orchagent (terminated by SIGABRT (core dumped); not expected)

It's because FDB is learnt on the default bridge, which increased reference count for bridge port and caused port removal failure.

The issue is addressed by not setting SAI_SWITCH_ATTR_FDB_EVENT_NOTIFY when creating switch, and enable it after all ports being removed from default bridge.

Why I did it
This PR is to fix an orchagent crash issue.

How I verified it

  1. The change is verified on multiple platforms. FDB learning can be done normally after this change.

Broadcom

admin@str4-7060x6-64pe-fan-4:~$ fdbshow 
  No.    Vlan  MacAddress         Port         Type
-----  ------  -----------------  -----------  -------
    1    1234  B6:2C:7E:FC:80:00  Ethernet496  Dynamic
    2    1234  D6:5E:2C:C0:B8:0B  Ethernet496  Dynamic
    3    1235  CE:8F:2A:A1:00:01  Ethernet496  Dynamic

Mellanox

admin@str-msn2700-01:~$ fdbshow 
  No.    Vlan  MacAddress         Port       Type
-----  ------  -----------------  ---------  -------
    1    1000  7C:FE:90:5E:60:01  Ethernet4  Dynamic
Total number of entries 1

Cisco

admin@str3-8101-03:~$ fdbshow 
  No.    Vlan  MacAddress         Port         Type
-----  ------  -----------------  -----------  -------
    1    1000  9C:09:8B:B6:E6:00  Ethernet240  Dynamic
Total number of entries 1
  1. The existing VS test test_fdb.py can cover the change.

Details if related

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bingwang-ms
Copy link
Contributor Author

included in 202503 by PR Azure/sonic-swss.msft#76

@r12f
Copy link

r12f commented May 2, 2025

@sdszhang we might need this change as well to fix the bridge port issue. I will port this change to 202412 as well, if this fixes the fdb notification issue on 202503.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants