Skip to content

[fast-reboot] Optimize PMON and LLDP services start up CPU consumption #10242

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions dockers/docker-lldp/docker-lldp-init.sh
Original file line number Diff line number Diff line change
@@ -1,4 +1,7 @@
#!/usr/bin/env bash

. /usr/bin/start.sh
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to make sure this won't interffer with create_switch in the future?

Copy link
Contributor Author

@shlomibitton shlomibitton Mar 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of a way to make sure this won't interfere in the future.
I think the best approach to improve the performance and ensure the stability of the solution is to delay these services in 90 seconds (fastboot criteria for control plane), this improve the performance even more.
The suggested services to delay are: dhcp_relay, radv, pmon and lldp.
In the future, as part of the HLD I am currently working, we will have a better way to indicate the end of fast-reboot flow and we can start the services according to it.
@liat-grozovik @stepanblyschak @vaibhavhd @yxieca what do you think about this approach?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM please wait for @vaibhavhd and @yxieca to confirm if this is ok from user pov

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about what are we proposing. This PR is preempting lldp execution (LLDP consume the CPU before create_switch is executed and not interfering it.), but we are also saying that in future PR we are going to defer the execution post fast-reboot completion.
Do we need this change right now, if we are soon going to depend upon fast-reboot end-indicator to start the deferred set of services?
I am not sure what is the impact of delaying LLDP.

Copy link
Contributor Author

@shlomibitton shlomibitton Mar 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggested solution with fast-reboot end-indicator will indeed defer the set of services to start, but this is planned for next release and for this branch we can optimize it by delaying the services.
Notice that the new infrastructure will be supported only from next release and above since there are operations required on the old image which are used with the new image, just like in warm reboot.
On this case even if the new design will be implemented on 202205, fast-reboot to an image from this release will still fall back to the previous fast-reboot implementation, since DB dump is not available etc.

Regarding delaying all 4 suggested services: dhcp_relay, radv, pmon and lldp
By my testing I didn't see any impact and no errors appeared in the logs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I understand the new-design in future-branch part.
Still some open questions. I agree that lldp startup sequence can be adjusted to prevent overlap with create-switch. But my doubt was - should it be done before create switch (this PR), or after create-switch (deferred start in 202205)?

Also I believe that this change will affect warm-boot process too. Do you agree? If yes, then title and description should be updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to change this PR with the after create-switch approach - deferred start.
Following Stepan comment, this approach is lack of a sync mechanism to ensure it will not overlap again in the future.
The other suggestion is to delay the entire service startup until init flow has finished with a 90 seconds timer and this is much more reliable and a bit similar to the new design of 202205.
I mean, with the new implementation these services will be delayed as well, the only difference is to start them with a different mechanism and not a timer.
I agree it will affect warm-reboot and the description will be modified as well.
If this approach is acceptable I will update all as needed.

Copy link
Contributor Author

@shlomibitton shlomibitton Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vaibhavhd what do you think? Should I proceed and modify the PR to this suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replied back on the email regarding this. Sorry for stalling this. The new suggested approach LGTM.
However, I am not confident about impact of delaying lldp with a 90s timer. May be @prsunny or @yxieca can vet this from lldp point of view.

At this stage, I also would suggest to put two different PRs for pmon and lldp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @vaibhavhd, I will create a PR and test it.


#Generate supervisord.conf based on device metadata
mkdir -p /etc/supervisor/conf.d/
sonic-cfggen -d -t /usr/share/sonic/templates/supervisord.conf.j2 > /etc/supervisor/conf.d/supervisord.conf
Expand Down
11 changes: 0 additions & 11 deletions dockers/docker-lldp/supervisord.conf.j2
Original file line number Diff line number Diff line change
Expand Up @@ -28,17 +28,6 @@ stdout_logfile=syslog
stderr_logfile=syslog
dependent_startup=true

[program:start]
command=/usr/bin/start.sh
priority=2
autostart=false
autorestart=false
startsecs=0
stdout_logfile=syslog
stderr_logfile=syslog
dependent_startup=true
dependent_startup_wait_for=rsyslogd:running

[program:lldpd]
# https://github.com/vincentbernat/lldpd/commit/9856f2792c301116cc4a3fcfba91b9672ee5db1f
# - `-d` means to stay in foreground, log to syslog
Expand Down
8 changes: 4 additions & 4 deletions dockers/docker-platform-monitor/docker_init.j2
Original file line number Diff line number Diff line change
Expand Up @@ -42,8 +42,8 @@ if [ -e /usr/share/sonic/platform/platform_wait ]; then
fi

# If the Python 2 sonic-platform package is not installed, try to install it
python2 -c "import sonic_platform" > /dev/null 2>&1 || pip2 show sonic-platform > /dev/null 2>&1
if [ $? -ne 0 ]; then
PYTHON2_PLATFORM_PACKAGE=/usr/local/lib/python2.7/dist-packages/sonic_platform
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The package can be anywhere in PYTHONPATH. Suggest to use pkgutil.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the paths can change, and pkgutil should be better tool here.

Also, will the new check of path existence validate both sonic_platform and sonic-platform availability? The previous check was doing that, and I see only sonic_platform in new check.

if [ ! -d "$PYTHON2_PLATFORM_PACKAGE" ]; then
SONIC_PLATFORM_WHEEL="/usr/share/sonic/platform/sonic_platform-1.0-py2-none-any.whl"
echo "sonic-platform package not installed, attempting to install..."
if [ -e ${SONIC_PLATFORM_WHEEL} ]; then
Expand All @@ -59,8 +59,8 @@ if [ $? -ne 0 ]; then
fi

# If the Python 3 sonic-platform package is not installed, try to install it
python3 -c "import sonic_platform" > /dev/null 2>&1 || pip3 show sonic-platform > /dev/null 2>&1
if [ $? -ne 0 ]; then
PYTHON3_PLATFORM_PACKAGE=/usr/local/lib/python3.7/dist-packages/sonic_platform
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The python minor version might change. Suggest to use pkgutil.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for Python2.

if [ ! -d "$PYTHON3_PLATFORM_PACKAGE" ]; then
SONIC_PLATFORM_WHEEL="/usr/share/sonic/platform/sonic_platform-1.0-py3-none-any.whl"
echo "sonic-platform package not installed, attempting to install..."
if [ -e ${SONIC_PLATFORM_WHEEL} ]; then
Expand Down