
[Mellanox] run module initialization when any SFP related API is called #18930


Merged: 1 commit merged into sonic-net:master on May 10, 2024


@Junchao-Mellanox (Collaborator) commented May 10, 2024

Why I did it

Currently, there are a few issues related to module host management:

  1. The module initialization flow is triggered only when chassis.get_change_event is called for the first time. That is too late and delays fast-reboot/warm-reboot convergence.
  2. The module initialization flow always sends a change event for every SFP object to xcvrd, even when no real module change has occurred. This causes ports to flap during warm-reboot.
  3. Legacy mode and module host management mode are mixed in the same code logic, which makes the code hard to maintain and extend.

How I did it

To address the issues above, this PR introduces the following changes:

  1. The module initialization flow is triggered the first time any chassis SFP-related API is called. Any of the following APIs triggers module init: chassis.get_sfp, chassis.get_all_sfps, chassis.get_change_event. The init flow executes only once, and only in the xcvrd context.
  2. The module initialization flow no longer sends change events. A change event is sent only when chassis.get_change_event detects a real cable event (plug-in/plug-out/error).
  3. Legacy mode and module host management mode are decoupled so that they no longer affect each other.
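The first two changes can be illustrated with a minimal sketch. This is not the code from the PR: `SketchChassis`, `presence_reader`, `init_calls`, and `_ensure_modules_initialized` are hypothetical names, presence is the only event modeled, and the return shape of `get_change_event` is only loosely modeled on the platform API. The sketch also does not model the "xcvrd context" check. It shows a one-time init shared by all three entry points, with the init step recording a baseline silently instead of emitting events.

```python
import threading


class SketchChassis:
    """Toy stand-in for a platform chassis (hypothetical, not sonic_platform)."""

    def __init__(self, presence_reader, port_count):
        # presence_reader: hypothetical callable, port index -> bool (present)
        self._read_presence = presence_reader
        self._port_count = port_count
        self._init_lock = threading.Lock()
        self._initialized = False
        self.init_calls = 0  # exposed only so the one-time behavior is visible
        self._sfps = []
        self._last_presence = {}

    def _ensure_modules_initialized(self):
        # Double-checked locking: the init flow runs at most once,
        # no matter which SFP-related API is called first.
        if self._initialized:
            return
        with self._init_lock:
            if self._initialized:
                return
            self.init_calls += 1
            self._sfps = ["sfp{}".format(i) for i in range(self._port_count)]
            # Record the baseline silently; init itself emits no change events.
            self._last_presence = {
                i: self._read_presence(i) for i in range(self._port_count)
            }
            self._initialized = True

    def get_sfp(self, index):
        self._ensure_modules_initialized()
        return self._sfps[index]

    def get_all_sfps(self):
        self._ensure_modules_initialized()
        return list(self._sfps)

    def get_change_event(self):
        self._ensure_modules_initialized()
        changes = {}
        for i in range(self._port_count):
            present = self._read_presence(i)
            # Report only ports whose state differs from the last observation.
            if present != self._last_presence[i]:
                changes[i] = "1" if present else "0"
                self._last_presence[i] = present
        return bool(changes), {"sfp": changes}
```

In this sketch, calling `get_all_sfps` or `get_sfp` before `get_change_event` runs the init early, and the first `get_change_event` call reports nothing unless a module's presence has actually changed since init.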

How to verify it

  1. Unit tests cover most of the new changes.
  2. A full sonic-mgmt regression run confirms there is no degradation (test branch 202311).
  3. Manual testing.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305
  • 202311


@Junchao-Mellanox (Collaborator, Author)

Hi @prgeor, could you please help review the PR?

@lguohan lguohan enabled auto-merge (squash) May 10, 2024 05:59
@lguohan lguohan disabled auto-merge May 10, 2024 05:59
@lguohan lguohan merged commit d982462 into sonic-net:master May 10, 2024
11 checks passed
@Junchao-Mellanox (Collaborator, Author)

I will create a cherry-pick PR soon.

@Junchao-Mellanox Junchao-Mellanox deleted the master-indep branch May 10, 2024 06:28
Junchao-Mellanox added a commit to Junchao-Mellanox/sonic-buildimage that referenced this pull request May 10, 2024
…ed (sonic-net#18930)

@Junchao-Mellanox (Collaborator, Author)

202311 PR: #18937

yxieca pushed a commit that referenced this pull request May 15, 2024
…ed (#18930) (#18937)

5 participants