Skip to content

[Fastboot] Delay LLDP service for better fastboot performance #10568

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 28, 2022
Merged

[Fastboot] Delay LLDP service for better fastboot performance #10568

merged 3 commits into from
Apr 28, 2022

Conversation

shlomibitton
Copy link
Contributor

@shlomibitton shlomibitton commented Apr 13, 2022

Signed-off-by: Shlomi Bitton [email protected]

Why I did it

Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot.

bootchart_no_optimizations

How I did it

  • Add a timer for LLDP service.
  • Copy the timer file to the host bin image.

How to verify it

Run fast-reboot on MLNX platform and observe faster create_switch execution time.
This PR is dependent on PR: #10567

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@shlomibitton shlomibitton requested a review from lguohan as a code owner April 13, 2022 15:36
@liat-grozovik liat-grozovik requested review from vaibhavhd and stepanblyschak and removed request for lguohan April 13, 2022 15:48
@liat-grozovik liat-grozovik added the Request for 202111 Branch For PRs being requested for 202111 branch label Apr 13, 2022
yxieca
yxieca previously approved these changes Apr 13, 2022
[Timer]
OnUnitActiveSec=0 sec
OnBootSec=1min 30 sec
Unit=lldp.service
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lldp.service is a per-namespace container. Will that work for multi-asic? Should we have several timers?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@vaibhavhd
Copy link
Contributor

I think this change will impact all kinds of reboot - warm/fast/cold/etc. Do you agree?

If yes, is that what we want to do? Can we do this only for fast and warmboot case?

@shlomibitton
Copy link
Contributor Author

I think this change will impact all kinds of reboot - warm/fast/cold/etc. Do you agree?

If yes, is that what we want to do? Can we do this only for fast and warmboot case?

@vaibhavhd I agree, I added a condition for it on syncd.sh like we do for pmon today.

if [[ x"$sonic_asic_platform" == x"mellanox" ]]; then
debug "Starting pmon service..."
/bin/systemctl start pmon
debug "Started pmon service"
fi
if [[ x"$BOOT_TYPE" != x"cold" ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: This is correct, but it may be better to match against whitelist BOOT_TYPEs (fast, warm, fastfast).
This is so that this condition does not evaluate to true for any other boottype that gets added in future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -0,0 +1,12 @@
[Unit]
# This delay is for fast-reboot performance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction - fast-reboot and warm-reboot performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

vaibhavhd
vaibhavhd previously approved these changes Apr 18, 2022
Copy link
Contributor

@vaibhavhd vaibhavhd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments pending, LGTM otherwise. Please address Stepan's concerns before merging.

… for multi asic service timer.

Change the timer file to timer template file.
Allign init_cfg.json.j2 to delay LLDP.
Fix review comments.
@shlomibitton
Copy link
Contributor Author

@vaibhavhd @stepanblyschak I fixed your comments, can you please review and approve?
Thanks

@liat-grozovik
Copy link
Collaborator

@vaibhavhd @stepanblyschak kindly reminder to review

@liat-grozovik liat-grozovik merged commit 1d84e0d into sonic-net:master Apr 28, 2022
judyjoseph pushed a commit that referenced this pull request May 2, 2022
- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot.

- How I did it
Add a timer for LLDP service.
Copy the timer file to the host bin image.

- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
This PR is dependent on PR: #10567
@qiluo-msft
Copy link
Collaborator

This commit could not be cleanly cherry-picked to 202012. Please submit another PR.

@shlomibitton
Copy link
Contributor Author

@qiluo-msft I created a separate PR for 202012:
#10744

liat-grozovik pushed a commit that referenced this pull request May 15, 2022
#10568) (#10744)

This PR is to backport a fix #10568
This PR is dependent on PR: #10745

- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot.

- How I did it
Add a timer for LLDP service.
Copy the timer file to the host bin image.

- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
@shlomibitton shlomibitton deleted the shlomi_delay_lldp_service branch May 26, 2022 12:25
liushilongbuaa pushed a commit to liushilongbuaa/sonic-buildimage that referenced this pull request Jun 20, 2022
…net#10568)

- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot.

- How I did it
Add a timer for LLDP service.
Copy the timer file to the host bin image.

- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
This PR is dependent on PR: sonic-net#10567
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Included in 202111 Branch Request for 202111 Branch For PRs being requested for 202111 branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants