Skip to content

Add syslog forwarding rules validation and thread exception handling #17572

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 31, 2025

Conversation

ytzur1
Copy link
Contributor

@ytzur1 ytzur1 commented Mar 18, 2025

Add validation of rsyslog message from conf file like telemetry, bgp, gnmi, etc.
Implement thread exception handler for sub thread error capture

Description of PR

  1. Adding validation of rsyslog messages from configuration files (telemetry, bgp, gnmi, etc.)
  2. Implementing a thread exception handler for capturing subthread errors

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
  • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405
  • 202411

Approach

What is the motivation for this PR?

The current syslog testing framework lacks validation of rsyslog messages from various configuration files. Additionally, exception during subthreads were not being caught by the main thread.

How did you do it?

  1. Added rsyslog message validation:

    • Implemented verification of logger messages from telemetry, bgp, gnmi,
  2. Implemented thread-safe exception handling:

    • Created a global thread_exceptions list for capturing errors
    • Added ThreadWrapper class for safe exception capture
    • Implemented handle_thread_exceptions() for error reporting in main thread

How did you verify/test it?

Ran all test cases in test_syslog_source_ip.py

Any platform specific information?

No

Supported testbed topology if it's a new test case?

No

Documentation

Add validation of rsyslog message from conf file like telemetry, bgp, gnmi, etc.
Implement thread-safe exception handler for subthread error capture
Enhance logging message generation for comprehensive testing

Tested: Ran all test cases in test_syslog_source_ip.py
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@qiluo-msft qiluo-msft requested review from liuh-80 and zbud-msft March 18, 2025 19:25
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ytzur1 ytzur1 force-pushed the syslog_test_case branch from 537deea to ec5e1e8 Compare March 19, 2025 08:37
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bingwang-ms
Copy link
Collaborator

Related image change sonic-net/sonic-buildimage#21923

time.sleep(0.2)
for flag, msg in logging_data:
for i in range(syslogUtilsConst.PACKETS_NUM):
dut.shell(f"logger {default_priority} {flag} {msg} {i + 1}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This command run in host.

However, the config causes this issue are:
'''
if re_match($programname, "bgp[0-9]*#(frr|zebra|staticd|watchfrr)") then {
/var/log/frr/zebra.log
stop
}
'''

Which means when reproduce the issue, the log send inside bgp docker container, then the 'programname' part can match.

I warry about this test case can't prevent regression. can you test with an image have this issue to make sure this test can catch issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good suggestion: test with an image have this issue to make sure this test can catch issue

Copy link
Contributor Author

@ytzur1 ytzur1 Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested with an image that had the configuration issue, and the test failed and reporting that not all types of forwarded messages were successful.

@zbud-msft
Copy link
Contributor

Hi @ytzur1, I manually ran your test changes from our test server and seeing following issues when running the testcase. Can you please double check your logic in capture_syslog_packets() function in syslog/syslog_utils.py

25/03/2025 23:13:49 base._run                                L0108 DEBUG  | /var/src/sonic-mgmt-int/tests/syslog/syslog_utils.py::capture_syslog_packets#157: AnsibleModule::fetch Result => {"failed": true, "msg": "file not found: /tmp/test_syslog_tcpdump_Vrf-data.pcap", "invocation": {"module_args": {"src": "/tmp/test_syslog_tcpdump_Vrf-data.pcap"}}, "_ansible_no_log": false, "changed": false}
25/03/2025 23:13:49 base._run                                L0108 DEBUG  | /var/src/sonic-mgmt-int/tests/syslog/syslog_utils.py::capture_syslog_packets#157: AnsibleModule::fetch Result => {"failed": true, "msg": "file not found: /tmp/test_syslog_tcpdump_default.pcap", "invocation": {"module_args": {"src": "/tmp/test_syslog_tcpdump_default.pcap"}}, "_ansible_no_log": false, "changed": false}
25/03/2025 23:13:49 base._run                                L0108 DEBUG  | /var/src/sonic-mgmt-int/tests/syslog/syslog_utils.py::capture_syslog_packets#157: AnsibleModule::fetch Result => {"failed": true, "msg": "file not found: /tmp/test_syslog_tcpdump_mgmt.pcap", "invocation": {"module_args": {"src": "/tmp/test_syslog_tcpdump_mgmt.pcap"}}, "_ansible_no_log": false, "changed": false}

Copy link
Contributor

@zbud-msft zbud-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check logic for capture_syslog_packets, seems to run into "file not found: /tmp/X.pcap" issue

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ytzur1 ytzur1 force-pushed the syslog_test_case branch from 501acf4 to e1ed921 Compare March 26, 2025 10:58
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ytzur1
Copy link
Contributor Author

ytzur1 commented Mar 26, 2025

Hi, @zbud-msft
The error was caused by a missing tab in tcpdump cmd command after pre-commit formatting of long lines.
It's fixed and latest version was tested and it's working.

Hi @ytzur1, I manually ran your test changes from our test server and seeing following issues when running the testcase. Can you please double check your logic in capture_syslog_packets() function in syslog/syslog_utils.py

25/03/2025 23:13:49 base._run                                L0108 DEBUG  | /var/src/sonic-mgmt-int/tests/syslog/syslog_utils.py::capture_syslog_packets#157: AnsibleModule::fetch Result => {"failed": true, "msg": "file not found: /tmp/test_syslog_tcpdump_Vrf-data.pcap", "invocation": {"module_args": {"src": "/tmp/test_syslog_tcpdump_Vrf-data.pcap"}}, "_ansible_no_log": false, "changed": false} 25/03/2025 23:13:49 base._run                                L0108 DEBUG  | /var/src/sonic-mgmt-int/tests/syslog/syslog_utils.py::capture_syslog_packets#157: AnsibleModule::fetch Result => {"failed": true, "msg": "file not found: /tmp/test_syslog_tcpdump_default.pcap", "invocation": {"module_args": {"src": "/tmp/test_syslog_tcpdump_default.pcap"}}, "_ansible_no_log": false, "changed": false} 25/03/2025 23:13:49 base._run                                L0108 DEBUG  | /var/src/sonic-mgmt-int/tests/syslog/syslog_utils.py::capture_syslog_packets#157: AnsibleModule::fetch Result => {"failed": true, "msg": "file not found: /tmp/test_syslog_tcpdump_mgmt.pcap", "invocation": {"module_args": {"src": "/tmp/test_syslog_tcpdump_mgmt.pcap"}}, "_ansible_no_log": false, "changed": false}

@zbud-msft
Copy link
Contributor

@ytzur1 Would it be possible to post screenshot of testcases passing since kvm tests will skip?

default_priority = '--priority CRIT'
for flag, msg in logging_data:
for i in range(syslogUtilsConst.PACKETS_NUM):
dut.shell(f"logger {default_priority} {flag} {msg} {i + 1}")
Copy link
Contributor

@qiluo-msft qiluo-msft Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger

Could you also add testcase for this repro inside container:

admin@sonic:~$ docker exec -it bgp bash 
root@sonic:/# logger -p local0.notice -t bgpd TESTMESSAGE20

@qiluo-msft
Copy link
Contributor

import pytest

Could ensure it is run in KVM test? so it will protect each PR.


Refers to: tests/syslog/test_syslog_source_ip.py:2 in e1ed921. [](commit_id = e1ed921, deletion_comment = False)

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The active comments are non-blocking, susgest follow up with new PR. Merge to help enhance qualification pipeline to qualify recent image fix.

@qiluo-msft qiluo-msft merged commit ba794a1 into sonic-net:master Mar 31, 2025
13 checks passed
@ytzur1
Copy link
Contributor Author

ytzur1 commented Apr 1, 2025

@qiluo-msft
Screenshot 2025-04-01 at 9 16 32
It seems the test_syslog_source_ip is included in kvm test but its being skipped

@ytzur1
Copy link
Contributor Author

ytzur1 commented Apr 1, 2025

@ytzur1 Would it be possible to post screenshot of testcases passing since kvm tests will skip?

Screenshot 2025-04-01 at 9 59 54

OriTrabelsi pushed a commit to OriTrabelsi/sonic-mgmt that referenced this pull request Apr 1, 2025
…onic-net#17572)

Add validation of rsyslog message from conf file like telemetry, bgp, gnmi, etc.
Implement thread exception handler for sub thread error capture

Description of PR
Adding validation of rsyslog messages from configuration files (telemetry, bgp, gnmi, etc.)
Implementing a thread exception handler for capturing subthread errors

Approach
What is the motivation for this PR?
The current syslog testing framework lacks validation of rsyslog messages from various configuration files. Additionally, exception during subthreads were not being caught by the main thread.

How did you do it?
Added rsyslog message validation:

Implemented verification of logger messages from telemetry, bgp, gnmi,
Implemented thread-safe exception handling:

Created a global thread_exceptions list for capturing errors
Added ThreadWrapper class for safe exception capture
Implemented handle_thread_exceptions() for error reporting in main thread
How did you verify/test it?
Ran all test cases in test_syslog_source_ip.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants