-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Fix hostcfgd does not react to SIGINT issue. #10977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@stepanblyschak, could you please review this PR? |
|
||
# Initialize swsscommon signal helper, because python signal handler will be blocked by long running c++ code. | ||
# For more information please check: https://docs.python.org/3/library/signal.html | ||
SignalHandlerHelper.registerSignalHandler(SIGNAL_INT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this overrides signal.signal(signal.SIGINT, signal_handler)
, we should probably remove signal.SIGINT
processing from hostcfgd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, on reboot, by default systemd sends SIGTERM but not SIGINT. So, this will not handle delay on reboot. In addition to this, we should probably extend hostcfgd.service
with KillSignal=SIGINT
or ExecStop=
that pkill hostcfgd with SIGTERM
and then with SIGINT
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either
[Service]
Type=simple
ExecStart=/usr/local/bin/hostcfgd
KillSignal=SIGINT
or
[Service]
Type=simple
ExecStart=/usr/local/bin/hostcfgd
ExecStop=/bin/bash -c '/usr/bin/killall -15 hostcfgd && /usr/bin/killall -2 hostcfgd'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, currently swsscommon only handle SIGINT, so will add SIGTERM support with this PR: sonic-net/sonic-swss-common#626
I will update this PR after swsscommon updated.
@liuh-80 I agree with @akokhan, we need to handle SIGTERM and remove signal handlers defined in python since they are overwritten by swss-common. Although, I have some concerns on usability of such approach. IMO we have lots of python daemons that use ConfigDBConnector.listen() and usually these daemons use custom signal handlers in order to perform cleanup. We then need to review this python code and rewrite their cleanup procedures. I prepared the following change - sonic-net/sonic-swss-common#624 which is based on your original PR but does not use signal helpers. The idea is that we can pass interrupt_on_signal=True to select() which will then return immidiatelly once errno == EINTR. The ConfigDBConnector.listen() method will pass interrupt_on_signal=True to PubSub.listen_messages() which will pass it to select(). This means when signal occurs we will exit the C++ .so code and enter python world where python can run his signal handlers and then (if signal handler didn't exit) repeat PubSub.listen_messages() in the loop, so that we don't exit the ConfigDBConnector.listen() on signals that we don't want to exit on. With such solution we don't need to modify hostcfgd and potentially many other daemons |
@stepanblyschak , according to your PR any signal will break ConfigDBConnector.listen() method. that maybe not a expected behavior. |
@liuh-80 The idea is that PubSub.listen_message() will exit on signal, however ConfigDBConnect.liste() will call PubSub.listen_message() in a while loop, so listen() does not exit. In my testing I was sending SIGHUP to hostcfgd, I see that hostcfgd's signal handler prints a message about received signal but the daemon still runs. When I send SIGTERM I see hostcfgd's message in the syslog about received signal and hostcfgd exits. |
I'm working on a change in sonic-swss-common to support register python signal handler to swsscommon, and will update this PR after that:
|
Close this PR because Stepan has a better solution without change any code here: https://github.com/Azure/sonic-swss-common/pull/624/files |
Why I did it
How I did it
How to verify it
Which release branch to backport (provide reason below if selected)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)