#### Why I did it
To fix the errors `healthd` logs when it reads from or writes to its queue during shutdown:
```
Jun 5 23:04:41.798613 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun 5 23:04:41.798985 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun 5 23:04:41.799535 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun 5 23:04:41.806010 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
Jun 5 23:04:41.814075 r-leopard-56 ERR healthd: system_service[Errno 104] Connection reset by peer
Jun 5 23:04:41.824135 r-leopard-56 ERR healthd: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 484, in system_service
    msg = self.myQ.get(timeout=QUEUE_TIMEOUT)
  File "<string>", line 2, in get
  File "/usr/lib/python3.9/multiprocessing/managers.py", line 809, in _callmethod
    kind, result = conn.recv()
  File "/usr/lib/python3.9/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/usr/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.9/multiprocessing/connection.py", line 384, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Jun 5 23:04:41.826489 r-leopard-56 INFO healthd[8494]: ERROR:dbus.connection:Exception in handler for D-Bus signal:
Jun 5 23:04:41.826591 r-leopard-56 INFO healthd[8494]: Traceback (most recent call last):
Jun 5 23:04:41.826640 r-leopard-56 INFO healthd[8494]: File "/usr/lib/python3/dist-packages/dbus/connection.py", line 232, in maybe_handle_message
Jun 5 23:04:41.826686 r-leopard-56 INFO healthd[8494]: self._handler(*args, **kwargs)
Jun 5 23:04:41.826738 r-leopard-56 INFO healthd[8494]: File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 82, in on_job_removed
Jun 5 23:04:41.826785 r-leopard-56 INFO healthd[8494]: self.task_notify(msg)
Jun 5 23:04:41.826831 r-leopard-56 INFO healthd[8494]: File "/usr/local/lib/python3.9/dist-packages/health_checker/sysmonitor.py", line 110, in task_notify
Jun 5 23:04:41.826877 r-leopard-56 INFO healthd[8494]: self.task_queue.put(msg)
Jun 5 23:04:41.826923 r-leopard-56 INFO healthd[8494]: File "<string>", line 2, in put
Jun 5 23:04:41.826973 r-leopard-56 INFO healthd[8494]: File "/usr/lib/python3.9/multiprocessing/managers.py", line 808, in _callmethod
Jun 5 23:04:41.827018 r-leopard-56 INFO healthd[8494]: conn.send((self._id, methodname, args, kwds))
Jun 5 23:04:41.827065 r-leopard-56 INFO healthd[8494]: File "/usr/lib/python3.9/multiprocessing/connection.py", line 211, in send
Jun 5 23:04:41.827115 r-leopard-56 INFO healthd[8494]: self._send_bytes(_ForkingPickler.dumps(obj))
Jun 5 23:04:41.827158 r-leopard-56 INFO healthd[8494]: File "/usr/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
Jun 5 23:04:41.827199 r-leopard-56 INFO healthd[8494]: self._send(header + buf)
Jun 5 23:04:41.827254 r-leopard-56 INFO healthd[8494]: File "/usr/lib/python3.9/multiprocessing/connection.py", line 373, in _send
Jun 5 23:04:41.827322 r-leopard-56 INFO healthd[8494]: n = write(self._handle, buf)
Jun 5 23:04:41.827368 r-leopard-56 INFO healthd[8494]: BrokenPipeError: [Errno 32] Broken pipe
Jun 5 23:04:42.800216 r-leopard-56 NOTICE healthd: Caught SIGTERM - exiting...
```
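When the `multiprocessing.Manager` is shut down, its server process stops, so any later `get()` or `put()` on the queue proxy raises the errors above (`ConnectionResetError`, `BrokenPipeError`). This happens during shutdown, for example on fast-reboot and warm-reboot.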
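The failure mode can be reproduced outside `healthd` with a minimal sketch. It is not the `sysmonitor.py` code: the function name is hypothetical, and it assumes the `fork` start method (the Linux default in Python 3.9). After `Manager.shutdown()` stops the server process, the proxy's next `put()` fails with an `OSError` subclass such as `BrokenPipeError`, or with `EOFError` if the failure only shows up when reading the reply.

```python
import multiprocessing


def use_queue_after_shutdown():
    """Put to a Manager queue proxy after the Manager was shut down and
    return the exception raised (None if the call unexpectedly succeeds)."""
    # Assumption: fork start method, as used by Python 3.9 on Linux.
    ctx = multiprocessing.get_context("fork")
    manager = ctx.Manager()          # starts the manager's server process
    queue = manager.Queue()          # proxy; the real queue lives in the server

    queue.put("job removed")         # works while the server is running
    assert queue.get(timeout=1) == "job removed"

    manager.shutdown()               # stops the server process

    try:
        # Mirrors self.task_queue.put(msg) in task_notify() from the log above
        queue.put("job removed")
    except (OSError, EOFError) as exc:
        # BrokenPipeError / ConnectionResetError are OSError subclasses;
        # EOFError is raised if the send succeeds but the reply read fails.
        return exc
    return None


if __name__ == "__main__":
    print(repr(use_queue_after_shutdown()))
```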
With the fix, the system-health service no longer hangs on stop. In repeated start/stop cycles it stops within two seconds:
```
root@sonic:/home/admin# sudo systemctl start system-health ; sleep 10; echo "$(date): Stopping..."; sudo systemctl stop system-health; echo "$(date): Stopped"
Thu Oct 17 01:07:56 PM IDT 2024: Stopping...
Thu Oct 17 01:07:58 PM IDT 2024: Stopped
root@sonic:/home/admin# sudo systemctl start system-health ; sleep 10; echo "$(date): Stopping..."; sudo systemctl stop system-health; echo "$(date): Stopped"
Thu Oct 17 01:08:13 PM IDT 2024: Stopping...
Thu Oct 17 01:08:14 PM IDT 2024: Stopped
root@sonic:/home/admin# sudo systemctl start system-health ; sleep 10; echo "$(date): Stopping..."; sudo systemctl stop system-health; echo "$(date): Stopped"
Thu Oct 17 01:09:05 PM IDT 2024: Stopping...
Thu Oct 17 01:09:06 PM IDT 2024: Stopped
```
##### Work item tracking
- Microsoft ADO **(number only)**:
#### How I did it
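Removed the explicit `shutdown()` call on the `multiprocessing.Manager`. Cleanup still happens automatically: per the Python documentation (https://docs.python.org/3/library/multiprocessing.html), manager processes are shut down as soon as they are garbage collected or their parent process exits.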
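The cleanup the fix relies on can be illustrated with a small sketch. It assumes the `fork` start method, uses a hypothetical function name, and is not the `sysmonitor.py` code. It drops every reference to a manager without calling `shutdown()`, then uses `multiprocessing.active_children()` to check that the manager's server process has been reaped.

```python
import gc
import multiprocessing


def manager_released_without_shutdown():
    """Create a Manager and queue, drop all references without calling
    shutdown(), and report whether the manager's server process is gone."""
    ctx = multiprocessing.get_context("fork")   # assumption: fork, as on Linux
    existing = {p.pid for p in multiprocessing.active_children()}

    manager = ctx.Manager()
    queue = manager.Queue()
    queue.put("event")
    queue.get(timeout=1)

    # The new child process is the manager's server.
    server_pids = {p.pid for p in multiprocessing.active_children()} - existing

    # No manager.shutdown() here. The queue proxy holds a reference to its
    # manager, so both must be released before the manager's finalizer runs.
    del queue, manager
    gc.collect()

    still_running = {p.pid for p in multiprocessing.active_children()}
    return bool(server_pids) and server_pids.isdisjoint(still_running)


if __name__ == "__main__":
    print(manager_released_without_shutdown())
```

Because the proxy keeps its manager alive, the server process only goes away once the proxies are released too, or when the parent process exits.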
#### How to verify it
Run warm-reboot and fast-reboot multiple times and verify that no errors appear in the log.
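Checking the log after each reboot can be scripted. Here is a minimal sketch that assumes syslog is written to `/var/log/syslog`, which is a hypothetical path that should be adjusted for the image under test. The match strings are the two errors from the log above, and `find_queue_errors` is a hypothetical helper.

```python
import re

# Error signatures taken from the healthd log in this PR.
QUEUE_ERROR_PATTERNS = (
    re.compile(r"\[Errno 104\] Connection reset by peer"),
    re.compile(r"\[Errno 32\] Broken pipe"),
)


def find_queue_errors(lines):
    """Return the healthd log lines that match a known queue error."""
    return [
        line.rstrip("\n")
        for line in lines
        if "healthd" in line
        and any(p.search(line) for p in QUEUE_ERROR_PATTERNS)
    ]


if __name__ == "__main__":
    # Hypothetical log location; adjust to where syslog lands on the DUT.
    with open("/var/log/syslog", errors="replace") as log:
        errors = find_queue_errors(log)
    print(f"{len(errors)} queue error(s) found")
    for line in errors:
        print(line)
```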
#### Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [x] 202205
- [x] 202311
- [x] 202405
#### Tested branch (Please provide the tested image version)
- [ ] <!-- image version 1 -->
- [ ] <!-- image version 2 -->
#### Description for the changelog
#### Link to config_db schema for YANG module changes
#### A picture of a cute animal (not mandatory but encouraged)