-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[Fastboot] Delay PMON service for better fastboot performance #10567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fastboot] Delay PMON service for better fastboot performance #10567
Conversation
Signed-off-by: Shlomi Bitton <[email protected]>
files/scripts/syncd.sh
Outdated
BOOT_TYPE=`getBootType` | ||
if [[ x"$BOOT_TYPE" == x"fast" ]]; then | ||
return | ||
fi | ||
debug "Starting pmon service..." | ||
/bin/systemctl start pmon | ||
debug "Started pmon service" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added by #3505 to fix some issues related to pmon/SDK. Won't that issue happen again with your change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, notice my change is only on startup flow and not shutdown flow.
Delaying PMON is OK as long SYNCD is running before starting it.
For shutdown flow the behavior is the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change can be made generic to all platforms. CC @yxieca.
Also, if this is not made generic, then the comments on files/build_templates/pmon.timer
should be modified to say:
# This delay is for fast-reboot performance on MLNX platform
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, don't you want to do this change for warm-reboot as well? I am curious why only for fast-reboot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a generic change, this exception is only for MLNX platform, so I added a condition for it in order to keep the delay and not start PMON with SYNCD start flow.
Regarding warm-boot, I totally agree we can do it for both types, fixed it.
files/scripts/syncd.sh
Outdated
@@ -46,6 +46,10 @@ function startplatform() { | |||
function waitplatform() { | |||
|
|||
if [[ x"$sonic_asic_platform" == x"mellanox" ]]; then | |||
BOOT_TYPE=`getBootType` | |||
if [[ x"$BOOT_TYPE" == x"fast" ]]; then | |||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be meaningful to have a log line here mentioning that for fastboot (or warm), pmon
start will be delayed as governed by files/build_templates/pmon.timer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log added.
files/build_templates/pmon.timer
Outdated
@@ -0,0 +1,12 @@ | |||
[Unit] | |||
# This delay is for fast-reboot performance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correction - fast-reboot and warm-reboot performance.
files/scripts/syncd.sh
Outdated
debug "Starting pmon service..." | ||
/bin/systemctl start pmon | ||
debug "Started pmon service" | ||
if [[ x"$BOOT_TYPE" != x"cold" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as on lldp PR:
Minor: This is correct, but it may be better to match against whitelist BOOT_TYPEs (fast, warm, fastfast).
This is so that this condition does not incorrectly evaluate to true for any other boottype that gets added in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments. LGTM otherwise.
Fix review comments.
@vaibhavhd @stepanblyschak I fixed your comments, can you please review and approve? |
/azp run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
/azpw run Azure.sonic-buildimage |
/AzurePipelines run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
@vaibhavhd kindly reminder to review |
/azp run Azure.sonic-buildimage |
Azure Pipelines successfully started running 1 pipeline(s). |
- Why I did it Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time. This parallel execution consume CPU time and the duration of create_switch is longer than it should be. Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot. - How I did it Add a timer for LLDP service. Copy the timer file to the host bin image. - How to verify it Run fast-reboot on MLNX platform and observe faster create_switch execution time. This PR is dependent on PR: #10567
0b84c45
- Why I did it Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time. This parallel execution consume CPU time and the duration of create_switch is longer than it should be. Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot. - How I did it Add a timer for LLDP service. Copy the timer file to the host bin image. - How to verify it Run fast-reboot on MLNX platform and observe faster create_switch execution time. This PR is dependent on PR: #10567
As prev approved by @vaibhavhd and merge only done to avoid checkers failures, PR is now merged. |
This commit could not be cleanly cherry-picked to 202012. Please submit another PR. |
@qiluo-msft I created a separate PR for 202012: |
- Why I did it Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time. This parallel execution consume CPU time and the duration of create_switch is longer than it should be. Following this finding, and the motivation to ensure these services will not interfere in the future, PMON is delayed in 90 seconds until the system finish the init flow after fastboot. - How I did it Add a timer for PMON service. Exclude for MLNX platform the start trigger of PMON when SYNCD starts in case of fastboot. Copy the timer file to the host bin image. - How to verify it Run fast-reboot on MLNX platform and observe faster create_switch execution time.
…net#10568) - Why I did it Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time. This parallel execution consume CPU time and the duration of create_switch is longer than it should be. Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot. - How I did it Add a timer for LLDP service. Copy the timer file to the host bin image. - How to verify it Run fast-reboot on MLNX platform and observe faster create_switch execution time. This PR is dependent on PR: sonic-net#10567
…net#10567) - Why I did it Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time. This parallel execution consume CPU time and the duration of create_switch is longer than it should be. Following this finding, and the motivation to ensure these services will not interfere in the future, PMON is delayed in 90 seconds until the system finish the init flow after fastboot. - How I did it Add a timer for PMON service. Exclude for MLNX platform the start trigger of PMON when SYNCD starts in case of fastboot. Copy the timer file to the host bin image. - How to verify it Run fast-reboot on MLNX platform and observe faster create_switch execution time.
Signed-off-by: Shlomi Bitton [email protected]
Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, PMON is delayed in 90 seconds until the system finish the init flow after fastboot.
How I did it
How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
Which release branch to backport (provide reason below if selected)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)