Revert "Add watchdog mechanism to swss service and generate alert when swss have issue." #15390
Reverts #14686
process_monitoring/test_critical_process_monitoring.py failed because the file ‘/etc/supervisor/watchdog_processes’ does not exist in the bgp and other containers, so ‘supervisor-proc-exit-listener’ cannot start:
In the bgp container:
root@vlab-01:/# supervisorctl status
bgpcfgd RUNNING pid 52, uptime 0:12:13
bgpd RUNNING pid 50, uptime 0:12:13
bgpmon RUNNING pid 58, uptime 0:12:13
containercfgd RUNNING pid 36, uptime 0:12:14
dependent-startup EXITED Jun 08 10:22 AM
fpmsyncd RUNNING pid 59, uptime 0:12:13
rsyslogd RUNNING pid 31, uptime 0:12:15
staticd RUNNING pid 49, uptime 0:12:13
staticroutebfd RUNNING pid 60, uptime 0:12:13
supervisor-proc-exit-listener FATAL Exited too quickly (process log may have details)
zebra RUNNING pid 35, uptime 0:12:14
zsocket EXITED Jun 08 10:22 AM
root@vlab-01:/#
root@vlab-01:/# /usr/bin/supervisor-proc-exit-listener --container-name bgp
Traceback (most recent call last):
  File "/usr/bin/supervisor-proc-exit-listener", line 212, in <module>
    main(sys.argv[1:])
  File "/usr/bin/supervisor-proc-exit-listener", line 135, in main
    _, watch_process_list = get_group_and_process_list(WATCH_PROCESSES_FILE)
  File "/usr/bin/supervisor-proc-exit-listener", line 51, in get_group_and_process_list
    with open(process_file, 'r') as file:
FileNotFoundError: [Errno 2] No such file or directory: '/etc/supervisor/watchdog_processes'
root@vlab-01:/#
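The traceback shows the listener calling get_group_and_process_list(WATCH_PROCESSES_FILE), which opens the file unconditionally. A missing watchdog_processes therefore kills the listener at startup, and supervisord marks it FATAL. The sketch below illustrates one way such a loader could treat a missing file as "nothing to watch" instead of crashing. It is not what this PR does (this PR simply reverts #14686). The function names are hypothetical, and the `group:<name>` / `program:<name>` line format is an assumption modeled on SONiC's critical_processes files.

```python
from typing import Iterable, List, Tuple


def parse_process_lines(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical parser: split 'group:<name>' / 'program:<name>' lines
    into (groups, programs). The line format is an assumption."""
    groups: List[str] = []
    programs: List[str] = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # malformed line: no ':' separator
        key, value = key.strip(), value.strip()
        if key == "group" and value:
            groups.append(value)
        elif key == "program" and value:
            programs.append(value)
    return groups, programs


def load_process_file(path: str) -> Tuple[List[str], List[str]]:
    """Hypothetical tolerant loader: a missing file yields empty lists
    instead of an unhandled FileNotFoundError."""
    try:
        with open(path, "r") as f:
            return parse_process_lines(f)
    except FileNotFoundError:
        return [], []
```

With this shape, `load_process_file("/etc/supervisor/watchdog_processes")` returns `([], [])` in a container that lacks the file, so the listener could keep monitoring its other process lists. Whether silently ignoring the file is the right behavior for the watchdog feature is a separate design question.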
The watchdog_processes file was introduced by PR #14686, "Add watchdog mechanism to swss service and generate alert when swss have issue." by liuh-80 (sonic-net/sonic-buildimage).
Oddly, the test still failed on my trial revert PR, #15390, "Revert "Add watchdog mechanism to swss service and generate alert when swss have issue."" by yejianquan (sonic-net/sonic-buildimage).
While debugging the image built by the revert PR, I found that ‘supervisor-proc-exit-listener’ is still the old version, which tries to load ‘/etc/supervisor/watchdog_processes’.
I’m fairly confident the failure is caused by watchdog_processes: if I manually comment out the related code on the KVM testbed, ‘supervisor-proc-exit-listener’ starts as expected.
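Because the revert image still carried the old listener, a quick check of what a given image actually ships can save a rebuild cycle. The sketch below, run inside a container such as bgp, reports whether the installed listener still references watchdog_processes and whether that file is present. The two paths come from the traceback; the function names and the simple substring test are illustrative assumptions, not part of any SONiC tooling.

```python
from pathlib import Path

# Paths as they appear in the traceback; adjust for other images.
LISTENER_PATH = "/usr/bin/supervisor-proc-exit-listener"
WATCH_FILE = "/etc/supervisor/watchdog_processes"


def needs_watch_file(script_text: str) -> bool:
    """True if the listener source still mentions the watchdog process list."""
    return "watchdog_processes" in script_text


def check_listener(listener_path: str = LISTENER_PATH,
                   watch_file: str = WATCH_FILE) -> str:
    """Classify whether the installed listener will hit the missing-file crash."""
    text = Path(listener_path).read_text(errors="replace")
    if not needs_watch_file(text):
        return "ok: listener does not reference watchdog_processes"
    if Path(watch_file).exists():
        return "ok: listener references watchdog_processes and the file exists"
    return "stale: listener references watchdog_processes but the file is missing"
```

Running `print(check_listener())` in the bgp container of the revert image would be expected to report "stale", matching the behavior described above.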