Regression: /var/log/syslog blocks receiving messages after changing priorities in files/image_config/rsyslog/rsyslog.conf.j2 RuntimeError: cannot find marker end-LogAnalyzer-container_checker_teamd #18100

Open
eyakubch opened this issue Apr 11, 2025 · 4 comments

Comments

@eyakubch
Contributor

eyakubch commented Apr 11, 2025

Is it platform specific

generic

Importance or Severity

Medium

Previous Working Version

6d39f70ec73aa5518bec50865a9821e72ecbb409

Steps to Reproduce

Run container_checker/test_container_checker.py on a T0 topology.

Impact of this regression

container_checker/test_container_checker.py expects to find the marker end-LogAnalyzer-container_checker_teamd at the end of the test, but the lookup in def wait_for_marker(self, marker, timeout=120, polling_interval=10) fails. It appears that rsyslog either takes longer to send logs to the remote server or gets stuck, and as a result /var/log/syslog stops updating for some period of time.
The rsyslog documentation also has an article on resolving this kind of issue for TCP connections: https://www.rsyslog.com/doc/tutorials/reliable_forwarding.html
With the changes from sonic-net/sonic-buildimage#21923 reverted, the test passes 10/10. With those changes in place, test_container_checker[cmono-t0-dut-None-teamd] always fails, while the test passes for the other containers.
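
For readers unfamiliar with the failing check, here is a minimal sketch of what a marker wait with `timeout` and `polling_interval` generally looks like. It is not the actual sonic-mgmt or loganalyzer.py code: the standalone function, its `path` parameter, and the handling of a missing file are assumptions made for illustration only.

```python
import time


def wait_for_marker(path, marker, timeout=120, polling_interval=10):
    """Poll a log file until a line containing `marker` appears.

    Hypothetical standalone helper; the real method in the test
    framework takes `self` and may read the log differently.
    Returns True if the marker was seen before `timeout` seconds
    elapsed, otherwise False.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path, "r", errors="replace") as log:
                if any(marker in line for line in log):
                    return True
        except FileNotFoundError:
            # Assumption: the file can be briefly absent, for example
            # while it is being rotated, so treat that as "not yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(polling_interval)
```

With a loop of this shape, a larger `timeout` only helps if the marker line is eventually written to the file. If /var/log/syslog stops updating for longer than the timeout, the wait returns False regardless of the polling interval.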

Relevant log output

invocation = {'module_args': {'_raw_params': 'python /tmp/loganalyzer.py --action add_end_marker --run_id container_checker_teamd.2025-03-29-00:21:56', '_uses_shell': False, 'warn': False, 'stdin_add_newline': True, 'strip_empty_ends': True, 'argv': None, 'chdir': None, 'executable': None, 'creates': None, 'removes': None, 'stdin': None}}
_ansible_no_log = None
stdout =
stderr =
Traceback (most recent call last):
  File "/tmp/loganalyzer.py", line 877, in <module>
    main(sys.argv[1:])
  File "/tmp/loganalyzer.py", line 858, in main
    analyzer.place_marker(
  File "/tmp/loganalyzer.py", line 262, in place_marker
    raise RuntimeError(
RuntimeError: cannot find marker end-LogAnalyzer-container_checker_teamd.2025-03-29-00:21:56 in /var/log/syslog

Output of show version, show techsupport

Attach files (if any)

No response

@prabhataravind
Contributor

@nazariig could you please check this asap? It appears a few tests are failing because of this issue. @qiluo-msft @zbud-msft for viz.

@rameshraghupathy

rameshraghupathy commented Apr 11, 2025

> @nazariig could you please check this asap? It appears a few tests are failing because of this issue. @qiluo-msft @zbud-msft for viz.

@nazariig @qiluo-msft @zbud-msft, thanks for looking into it! This is a high-visibility pilot gating item for smartswitch, and multiple test cases in sonic-mgmt fail on the latest 202405.

@prgeor prgeor transferred this issue from sonic-net/sonic-buildimage Apr 23, 2025
@prgeor
Contributor

prgeor commented Apr 23, 2025

@yxieca this looks to be a test issue. @qiluo-msft for viz.

@rameshraghupathy

rameshraghupathy commented Apr 28, 2025

@prgeor This is seen in multiple tests, sometimes as different manifestations of the same issue. In some cases rsyslogd hogs "/var/log/syslog", especially during the logrotate corner case, and other containers cannot access "/var/log/syslog", which results in a container crash. That is a serious issue. Can you please take a second look?
