Syslog messages from orchagent are sometimes rate-limited for no apparent reason. The rate-limiting threshold is set at 20k messages in 5 minutes; however, syslog sometimes starts rate-limiting messages from orchagent when fewer than 20k messages have been logged in the past 5 minutes.
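To make the threshold concrete, here is a simplified model of a per-process limiter that accepts at most 20,000 messages per 300-second window and drops the rest. It assumes a fixed window that opens at the first message and resets once the interval has elapsed. The class name FixedWindowRateLimiter is hypothetical; this is a sketch of the idea, not rsyslog's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FixedWindowRateLimiter:
    """Simplified per-process rate limiter (illustrative, not rsyslog code).

    Assumption: a window opens at the first message and is reset once
    `interval` seconds have elapsed; within a window, at most `burst`
    messages are accepted and the rest are dropped.
    """
    interval: float = 300.0  # 5 minutes, in seconds
    burst: int = 20_000      # messages accepted per window
    window_start: Optional[float] = None
    accepted: int = 0
    dropped: int = 0

    def allow(self, timestamp: float) -> bool:
        # Open a new window on the first message or once the old one expires.
        if self.window_start is None or timestamp >= self.window_start + self.interval:
            self.window_start = timestamp
            self.accepted = 0
            self.dropped = 0
        if self.accepted < self.burst:
            self.accepted += 1
            return True
        # The first drop in a window is where a "begin to drop messages due to
        # rate-limiting" notice would be emitted.
        self.dropped += 1
        return False


if __name__ == "__main__":
    limiter = FixedWindowRateLimiter()
    # 20,001 messages spread over ~200 seconds: only the last one is dropped.
    results = [limiter.allow(i * 0.01) for i in range(20_001)]
    print(sum(results), limiter.dropped)  # 20000 1
```

Note that this model counts every message it is handed, so anything that reaches the limiter counts towards the threshold whether or not it later shows up as its own line in the log file.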
I believe this occurs because lower-priority messages, which are not written to syslog, are still being counted towards the rate-limit threshold.
Steps to reproduce the issue:
1. Ensure that the orchagent loglevel is set to NOTICE.
2. Run a sonic-mgmt test module that generates lots of orchagent activity, such as the ACL test suite.
Describe the results you received:
In the syslog, there are messages indicating that rate-limiting is occurring:
Oct 19 12:19:08.599970 str2-7050cx3-acs-03 INFO swss#rsyslogd: imuxsock[pid: 109, name: /usr/bin/orchagent] from <str2-7050cx3-acs-03:orchagent>: begin to drop messages due to rate-limiting
However, fewer than 20k messages were sent in the five minutes before rate-limiting began.
If the orchagent log level is lowered to INFO and the test is repeated, then enough messages are written to the syslog to trigger rate-limiting.
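One way to check the "fewer than 20k" observation is to count orchagent entries in the five minutes before the drop notice. The sketch below assumes orchagent lines carry a tag like swss#orchagent: and that timestamps look like the one in the log line above; count_before_drop and parse_ts are hypothetical helpers. It reports both the raw number of lines and the total with any "message repeated N times" summaries expanded to N.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

DROP_NOTICE = "begin to drop messages due to rate-limiting"
REPEAT_RE = re.compile(r"message repeated (\d+) times")


def parse_ts(line: str) -> datetime:
    """Parse a leading syslog timestamp such as "Oct 19 12:19:08.599970"."""
    month, day, clock = line.split(maxsplit=3)[:3]
    # The timestamp has no year; assume a fixed one (fine within a single log
    # that does not cross New Year).
    return datetime.strptime(f"2000 {month} {day} {clock}", "%Y %b %d %H:%M:%S.%f")


def count_before_drop(lines: Iterable[str], program: str = "orchagent",
                      window_s: float = 300.0) -> Optional[Tuple[int, int]]:
    """Count `program` entries in the window before its first drop notice.

    Returns (log lines, messages with "message repeated N times" expanded),
    or None if no drop notice for `program` is found.
    """
    lines = list(lines)
    drop_at = next((parse_ts(line) for line in lines
                    if DROP_NOTICE in line and f":{program}>" in line), None)
    if drop_at is None:
        return None
    start = drop_at - timedelta(seconds=window_s)
    n_lines = n_messages = 0
    for line in lines:
        # Assumed tag format for SONiC container logs, e.g. "swss#orchagent:".
        if f"#{program}:" not in line:
            continue
        if start <= parse_ts(line) < drop_at:
            n_lines += 1
            match = REPEAT_RE.search(line)
            n_messages += int(match.group(1)) if match else 1
    return n_lines, n_messages
```

Feed it the lines of the syslog file (for example, the result of reading the file and calling splitlines()). Messages that were dropped, or duplicates that were never summarized, leave no line behind, so neither number can include them.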
Describe the results you expected:
No rate-limiting should occur
Additional information you deem important (e.g. issue happens only occasionally):
It turns out this is a non-issue, and #5666 should be reverted. There only appear to be too few messages to justify rate-limiting because of the way rsyslog handles repeated messages. Repeated messages are usually handled as follows:
1. Some message A is sent to rsyslog once.
2. Message A is written to the log file.
   The number of messages that count towards the rate-limit threshold is 1.
3. n copies of message A are sent to rsyslog. rsyslog does not write these copies to the log file; instead, it records the message and counts the number of duplicates.
   The number of messages that count towards the rate-limit threshold is n + 1. (Note that this increase happens gradually, as each duplicate is received.)
4. Some message B (different from A) is sent to rsyslog.
5. rsyslog writes a "message repeated n times" line to the log file (recording the number of duplicates of message A received), and from then on counts all duplicates of message A as a single message.
   The number of messages that count towards the rate-limit threshold is 2.
6. rsyslog then writes B to the log file.
   The number of messages that count towards the rate-limit threshold is 3.
The log file will look like this:
A
message repeated n times: [ A]
B
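The steps above can be sketched as a toy model: duplicates of the last message are held back and counted, and a different message first flushes a "message repeated n times" summary line. The counted total follows the description above (lines written plus duplicates not yet summarized). reduce_repeats is a hypothetical name, and this mirrors the behavior as described here, not rsyslog's actual implementation.

```python
from typing import List, Optional, Tuple


def reduce_repeats(messages: List[str]) -> Tuple[List[str], List[int]]:
    """Toy model of the repeated-message handling described above.

    Returns the lines written to the log file and, after each received
    message, the number counted towards the rate-limit threshold under that
    model: lines written so far plus duplicates not yet summarized.
    """
    log: List[str] = []
    counted: List[int] = []
    last: Optional[str] = None
    pending = 0  # duplicates of `last` received but not yet written
    for msg in messages:
        if msg == last:
            pending += 1  # duplicate: counted, not written
        else:
            if pending:
                # A different message flushes the summary line first.
                log.append(f"message repeated {pending} times: [ {last}]")
                pending = 0
            log.append(msg)
            last = msg
        counted.append(len(log) + pending)
    # Duplicates still pending at the end are left unsummarized in this sketch.
    return log, counted


log, counted = reduce_repeats(["A", "A", "A", "B"])
print(log)      # ['A', 'message repeated 2 times: [ A]', 'B']
print(counted)  # [1, 2, 3, 3]
```

With n = 2 copies, the count climbs to n + 1 = 3 while the duplicates arrive, and after B the log holds the three lines shown in the layout above.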
However, if the rate-limit threshold is exceeded while the n copies of A are arriving (step 3), rate-limiting is triggered and the duplicate messages are dropped. If this happens, no "message repeated n times" line is generated. This can make it appear that rate-limiting was triggered without a sufficient number of messages. In reality, the messages that crossed the rate-limit threshold were dropped by the rate-limiting itself, so they never show up in the log file.
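Extending the same toy model with a threshold shows the effect: once the counted total reaches the limit, every further message (including B) is dropped before the duplicate handling sees it, so the pending summary is never flushed. This assumes all messages fall inside one rate-limit window and uses a small hypothetical burst value so the effect is easy to see; simulate is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def simulate(messages: List[str], burst: int) -> Tuple[List[str], int]:
    """Toy model: repeated-message reduction plus a rate limit of `burst`.

    Assumptions: all messages fall inside one rate-limit window, and the
    counted total is "lines written + duplicates not yet summarized".
    Returns (lines written to the log file, number of messages dropped).
    """
    log: List[str] = []
    last: Optional[str] = None
    pending = 0  # duplicates of `last` held back, awaiting a summary line
    dropped = 0
    for msg in messages:
        if len(log) + pending >= burst:
            # Threshold reached: the message is dropped before the duplicate
            # handling sees it, so a pending summary is never triggered.
            dropped += 1
            continue
        if msg == last:
            pending += 1
            continue
        if pending:
            log.append(f"message repeated {pending} times: [ {last}]")
            pending = 0
        log.append(msg)
        last = msg
    return log, dropped


print(simulate(["A"] * 10 + ["B"], burst=100))  # (['A', 'message repeated 9 times: [ A]', 'B'], 0)
print(simulate(["A"] * 10 + ["B"], burst=5))    # (['A'], 6)
```

In the second run, five messages were counted but only a single line reached the log file, which is exactly the "not enough messages" appearance described above.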
Description
Revert #5666 once a permanent fix is implemented.