-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Fix swsscommon psubscribe code break in frrcfgd #13836
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -1445,6 +1445,7 @@ class ExtConfigDBConnector(ConfigDBConnector): | |||
def __init__(self, ns_attrs = None): | |||
super(ExtConfigDBConnector, self).__init__() | |||
self.nosort_attrs = ns_attrs if ns_attrs is not None else {} | |||
self.sub_thread_running = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
if msg: | ||
self.sub_msg_handler(msg) | ||
|
||
self.pubsub.punsubscribe(sub_key_space) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found similar code in https://github.com/sonic-net/sonic-swss-common/blob/85f3776ae6da44a083c617ff4ce6e54e1e2d5dac/common/configdb.h#L78. Do we also need punsubscribe
there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After investigation the code of ConfigDBConnector, we don't need punsubscribe there because inside the listen() there is a infinity loop. as my understand the design is listen() will stop when process exit, so not necessary to unsub.
@zhaozhenhong Could you help review? |
Please also tag @mhampasagar and @rathnasabapathyv for review, thanks. |
Fix swsscommon psubscribe code break in frrcfgd #### Why I did it Fix frrcfgd psubscribe code break: sonic-net#13109 The code issue caused by API change when migrate from swsssdk to swsscommon #### How I did it Fix frrcfgd code to use swsscommon psubscribe API. #### How to verify it Pass all UT. Manually check fixed code work correctly.
@qiluo-msft Would you please merge this PR to 202211? Thanks! |
This change caused my frrcfgd to spike to 100% CPU usage on my devices. I changed the default timeout to 2 seconds instead of 1 ms and that seemed to solve things. diff --git a/src/sonic-frr-mgmt-framework/frrcfgd/frrcfgd.py b/src/sonic-frr-mgmt-framework/frrcfgd/frrcfgd.py
index bc53e38cf..6c18008f1 100755
--- a/src/sonic-frr-mgmt-framework/frrcfgd/frrcfgd.py
+++ b/src/sonic-frr-mgmt-framework/frrcfgd/frrcfgd.py
@@ -1471,12 +1471,12 @@ class ExtConfigDBConnector(ConfigDBConnector):
syslog.syslog(syslog.LOG_ERR, '[bgp cfgd] Failed handling config DB update with exception:' + str(e))
logging.exception(e)
- def listen_thread(self, timeout):
+ def listen_thread(self):
self.__listen_thread_running = True
sub_key_space = "__keyspace@{}__:*".format(self.get_dbid(self.db_name))
self.pubsub.psubscribe(sub_key_space)
while self.__listen_thread_running:
- msg = self.pubsub.get_message(timeout, True)
+ msg = self.pubsub.get_message(timeout=2.0, interrupt_on_signal=True)
if msg:
self.sub_msg_handler(msg)
@@ -1486,7 +1486,7 @@ class ExtConfigDBConnector(ConfigDBConnector):
"""Start listen Redis keyspace events and will trigger corresponding handlers when content of a table changes.
"""
self.pubsub = self.get_redis_client(self.db_name).pubsub()
- self.sub_thread = threading.Thread(target=self.listen_thread, args=(0.01,))
+ self.sub_thread = threading.Thread(target=self.listen_thread)
self.sub_thread.start()
def stop_listen(self): |
Hi @bluecmd, can you share how to reproduce this issue and what's the hardware SKU and OS version? |
Hello @liuh-80 Sure - I applied this PR on our 202211 branch and installed the resulting image on both a Celestica Questone 2 and a Celestica Silverstone-X. When running I looked at |
Hi @bluecmd , I check on following celestica hardware and not found this issue, I will create some test code and give update later. SONiC Software Version: SONiC.20230531.19 Platform: x86_64-cel_e1031-r0 ~$ top
917654 root 20 0 65316 41512 17584 R 33.1 2.1 0:01.03 show |
Interesting. Let me know if you need me to test anything. That said, if I may suggest, I believe that the 0.01s timeout is too low even on platforms where it does not show up as high CPU - since the subscribe is running in its own thread it only needs to wake up occasionally to check if the thread should terminate, and that does not need to happen often at all. |
I can reproduce this issue by a test script, fix PR is here: #18225 |
Fix swsscommon psubscribe code break in frrcfgd #### Why I did it Fix frrcfgd psubscribe code break: sonic-net#13109 The code issue caused by API change when migrate from swsssdk to swsscommon #### How I did it Fix frrcfgd code to use swsscommon psubscribe API. #### How to verify it Pass all UT. Manually check fixed code work correctly.
Fix swsscommon psubscribe code break in frrcfgd #### Why I did it Fix frrcfgd psubscribe code break: sonic-net#13109 The code issue caused by API change when migrate from swsssdk to swsscommon #### How I did it Fix frrcfgd code to use swsscommon psubscribe API. #### How to verify it Pass all UT. Manually check fixed code work correctly.
Fix swsscommon psubscribe code break in frrcfgd #### Why I did it Fix frrcfgd psubscribe code break: sonic-net#13109 The code issue caused by API change when migrate from swsssdk to swsscommon #### How I did it Fix frrcfgd code to use swsscommon psubscribe API. #### How to verify it Pass all UT. Manually check fixed code work correctly.
Fix swsscommon psubscribe code break in frrcfgd
Why I did it
Fix frrcfgd psubscribe code break: #13109
The code issue caused by API change when migrate from swsssdk to swsscommon
How I did it
Fix frrcfgd code to use swsscommon psubscribe API.
How to verify it
Pass all UT.
Manually check fixed code work correctly.
Which release branch to backport (provide reason below if selected)
Description for the changelog
Fix frrcfgd psubscribe code break: #13109
Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)