
Install Superlance plugins to monitor and control process that run un… #19097


Open
wants to merge 1 commit into
base: master

prgeor
Contributor

@prgeor prgeor commented May 28, 2024

…der supervisord

Why I did it

Superlance plugins are useful for monitoring and controlling processes that run under supervisord inside a Docker container.

Rsyslog has a known memory leak. Superlance's memmon plugin for supervisord provides a way to restart rsyslogd when its memory usage grows beyond a threshold.

Work item tracking
  • Microsoft ADO (number only):

How I did it

  1. Install the plugin with pip in the docker base image.
  2. Add the supervisord configuration:
[eventlistener:memmon]
command=memmon -p rsyslogd=100MB
events=TICK_60
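
For context, an `[eventlistener:...]` section registers a program that supervisord notifies on the subscribed events (here `TICK_60`, emitted once a minute). Each notification starts with a header line of space-separated `key:value` tokens. Below is a minimal Python sketch of parsing such a header. The function names and the sample header values are illustrative assumptions, not taken from memmon's source.

```python
def parse_event_header(line):
    """Split a supervisord event header line into a dict of string fields."""
    # Each token looks like "key:value"; split only on the first colon.
    return dict(token.split(":", 1) for token in line.split())


def is_minute_tick(header):
    """Return True when the header announces a TICK_60 event."""
    return header.get("eventname") == "TICK_60"


# Sample header with made-up serial numbers and payload length.
sample = (
    "ver:3.0 server:supervisor serial:21 pool:memmon "
    "poolserial:10 eventname:TICK_60 len:15"
)
header = parse_event_header(sample)
payload_length = int(header["len"])  # number of payload bytes that follow
```

A listener of this kind would check memory usage only when `is_minute_tick(header)` is true, which is why the threshold is evaluated at most once per minute.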

How to verify it

root@str-7060X6-D10-U32:/# pip install --proxy=http://10.201.148.40:8080 superlance                      
Collecting superlance
  Obtaining dependency information for superlance from https://files.pythonhosted.org/packages/41/4f/69772f821ea6082c92059fd24463cbd7ff1aff27cf14d05a810415756f2e/superlance-2.0.0-py2.py3-none-any.whl.metadata
  Downloading superlance-2.0.0-py2.py3-none-any.whl.metadata (6.4 kB)
Requirement already satisfied: supervisor in /usr/local/lib/python3.9/dist-packages (from superlance) (4.2.1)
Downloading superlance-2.0.0-py2.py3-none-any.whl (49 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.6/49.6 kB 877.7 kB/s eta 0:00:00
Installing collected packages: superlance
Successfully installed superlance-2.0.0

root@str-7060X6-D10-U32:/# memmon 
memmon.py [-c] [-p processname=byte_size] [-g groupname=byte_size]
          [-a byte_size] [-s sendmail] [-m email_address]
          [-u uptime] [-n memmon_name]


Added the following supervisor configuration and confirmed that memmon restarts rsyslogd when its memory usage exceeds the 100KB threshold:

[eventlistener:memmon]
command=memmon -p rsyslogd=100KB
events=TICK_60
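
To make the `-p rsyslogd=<size>` argument concrete, here is a rough Python sketch of how such a spec could be turned into a byte limit and compared against a process's resident memory. It assumes binary multipliers (1 KB = 1024 bytes) and a strict "greater than" comparison. The helper names are hypothetical and this is not memmon's actual code.

```python
import re

# Assumption: binary multipliers, so 100MB means 100 * 1024 * 1024 bytes.
_MULTIPLIERS = {"": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_byte_size(text):
    """Convert a string such as '100KB' or '100MB' into a byte count."""
    match = re.fullmatch(r"(\d+)\s*(KB|MB|GB)?", text.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS[(suffix or "").upper()]


def parse_process_spec(spec):
    """Split 'name=size' into (name, limit_in_bytes)."""
    name, _, size = spec.partition("=")
    if not name or not size:
        raise ValueError(f"expected name=size, got {spec!r}")
    return name, parse_byte_size(size)


def should_restart(rss_bytes, spec):
    """Return True when the measured RSS is above the limit in the spec."""
    _, limit = parse_process_spec(spec)
    return rss_bytes > limit
```

With these assumptions, a process using 5 MB trips a `100KB` limit immediately but stays well under a `100MB` limit, which is why the low value is convenient for testing the restart path.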

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@prgeor prgeor requested a review from saiarcot895 May 28, 2024 07:43
@prgeor prgeor marked this pull request as ready for review May 28, 2024 08:20
@prgeor prgeor requested review from qiluo-msft and lguohan as code owners May 28, 2024 08:20
@@ -64,6 +64,7 @@ RUN pip install wheel

# Install supervisor
RUN pip install supervisor>=3.4.0
RUN pip install superlance
Contributor

Use docker-base-bullseye and docker-base-bookworm. docker-base isn't used anymore.

Contributor Author

@saiarcot895 thanks, removed.

@saiarcot895
Contributor

The PR description says 100KB is the limit, but the actual change sets it to 100MB. Can you update the PR description?

@saiarcot895
Contributor

Has the restart logic been tested?

@prgeor
Contributor Author

prgeor commented May 29, 2024

Has the restart logic been tested?

I manually installed the superlance package inside PMON and tested it there. I am waiting for the build to be generated so that I can test with the latest image.

@StormLiangMS
Contributor

/azp run Azure.sonic-buildimage


Azure Pipelines successfully started running 1 pipeline(s).

@lguohan
Collaborator

lguohan commented Jun 6, 2024

This needs more thorough testing. For example, what happens if rsyslogd consistently reaches 100MB: is there any restart limit? What do the syslog messages look like? If there are many rsyslogd restarts because of this, do we have an alert?

Does this impact boot time? If yes, by how much?

Many other dockers also run rsyslogd; why only these dockers?

@prgeor
Contributor Author

prgeor commented Jul 2, 2024

This needs more thorough testing. For example, what happens if rsyslogd consistently reaches 100MB: is there any restart limit? What do the syslog messages look like? If there are many rsyslogd restarts because of this, do we have an alert?

Does this impact boot time? If yes, by how much?

Many other dockers also run rsyslogd; why only these dockers?

@lguohan There is currently no restart limit for rsyslogd.
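
For discussion only, here is a sketch of what a restart limit could look like: a sliding-window counter that refuses further restarts once a budget is spent. This is a hypothetical design in Python, not something this PR or memmon implements. The class name and the budget values in the usage note are assumptions.

```python
from collections import deque


class RestartLimiter:
    """Allow at most max_restarts restarts within any window_seconds span."""

    def __init__(self, max_restarts, window_seconds):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._stamps = deque()  # timestamps of restarts that were allowed

    def allow(self, now):
        """Record and permit a restart at time `now`, or refuse it."""
        # Forget restarts that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_restarts:
            return False
        self._stamps.append(now)
        return True
```

With `RestartLimiter(max_restarts=2, window_seconds=600)`, restarts at 0 s and 60 s are allowed, a third at 120 s is refused, and one at 600 s is allowed again because the first has aged out. A refused restart would be a natural point to raise an alert.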

@lguohan Typical memory usage of rsyslogd is a few MB. I selected 100MB because it is neither too small (in which case a chatty device would see frequent rsyslogd restarts in a short period of time) nor too large (in which case monit may complain about high memory usage on platforms with less RAM).

We will see an rsyslogd restart message in the supervisord log, similar to the message for any docker container process that supervisord restarts. We can set up alerting on these message signatures.

@DavidZagury
Contributor

@prgeor as I see it, 100MB is too much; it is about 2000% of normal rsyslog memory consumption. If we look at only one docker, 100MB might not seem like much. But looking at the total memory used on the switch, if rsyslogd gets close to this state in multiple dockers, it can accumulate into a significant amount of wasted memory.
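
The arithmetic behind this concern can be spelled out in a few lines of Python. The 100MB threshold and the 2000% ratio come from the thread. The container count of 10 is a made-up figure for illustration only.

```python
THRESHOLD_MB = 100    # per-container limit proposed in this PR
RATIO_PERCENT = 2000  # threshold as a percentage of normal usage, per the comment

# 2000% means 20x, so the implied normal usage is about 5 MB.
typical_mb = THRESHOLD_MB * 100 / RATIO_PERCENT


def worst_case_total_mb(container_count, threshold_mb=THRESHOLD_MB):
    """Memory held by rsyslogd if every container sits just under the limit."""
    return container_count * threshold_mb


# Hypothetical switch running 10 containers that each include rsyslogd.
worst_case = worst_case_total_mb(10)
normal_total = 10 * typical_mb
```

Under these assumptions the worst case is 1000 MB of rsyslogd memory across the switch, against roughly 50 MB in normal operation.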

@AharonMalkin

AharonMalkin commented Jan 16, 2025

Hey @prgeor, we integrated this PR and see that the fix is missing in some containers, for example eventd, bgp, and host.
Can you please verify that the fix exists in all containers?

Thanks

@prgeor
Contributor Author

prgeor commented Jan 21, 2025

@prgeor as I see it, 100MB is too much; it is about 2000% of normal rsyslog memory consumption. If we look at only one docker, 100MB might not seem like much. But looking at the total memory used on the switch, if rsyslogd gets close to this state in multiple dockers, it can accumulate into a significant amount of wasted memory.

@DavidZagury what is the normal limit on your system? What limit would you like?

@@ -28,6 +28,10 @@ stdout_logfile=syslog
stderr_logfile=syslog
dependent_startup=true

[eventlistener:memmon]
command=memmon -p rsyslogd=100MB
Collaborator

Is it possible to capture the thresholds in a config file?
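
One possible shape for this, sketched in Python: keep the thresholds in a small INI file and generate the memmon command line from it. The `[thresholds]` section name, the file layout, and the helper name are all hypothetical. Only the `-p name=size` flag comes from the memmon usage shown in the PR description.

```python
import configparser


def build_memmon_command(config_text):
    """Build a memmon command line from an INI-style thresholds section."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep process names case-sensitive
    parser.read_string(config_text)
    # Hypothetical layout: one "process = size" entry per line.
    options = [
        f"-p {name}={size}" for name, size in parser["thresholds"].items()
    ]
    return " ".join(["memmon", *options])


# Example contents of a hypothetical thresholds file.
example = """
[thresholds]
rsyslogd = 100MB
"""
command = build_memmon_command(example)
```

The generated string could then be substituted into the `command=` line of the `[eventlistener:memmon]` section at image build time, so each container or platform can carry its own limits.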

@gechiang
Collaborator

@prgeor, any update on the changes and the next steps?
