Skip to content

Fix SubscriberStateTable::hasCachedData formula for a timing risk #379

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 11, 2020

Conversation

qiluo-msft
Copy link
Contributor

@qiluo-msft qiluo-msft commented Sep 10, 2020

The issue statement:
If the use case of SubscriberStateTable meets below conditions:

  1. First Select::select(), then SubscriberStateTable::pop() instead of SubscriberStateTable::pops()
  2. The table under SubscriberStateTable is changing extreme high frequently

There is a rare timing risk inside the implementation of SubscriberStateTable, leading to a state that SubscriberStateTable holds two entries but Select::select() is stuck. If the table has new change (operation), SubscriberStateTable may recover.

The fix:
SubscriberStateTable::hasCachedData should return true if there are 2 or more entries internally, including entries in m_buffer which were read by previous pops(), or entries in m_keyspace_event_buffer which will be read by next pops().

Ideally, we need to cherry-pick to all the release branches since 201803.

Search the code, I found similar use cases, so the same issue is possible to happen at one of the places.

src/sonic-platform-daemons/sonic-xcvrd/scripts/xcvrd:943:        sst = swsscommon.SubscriberStateTable(appl_db, swsscommon.APP_PORT_TABLE_NAME)
src/sonic-platform-daemons/sonic-ledd/scripts/ledd:83:        sst = swsscommon.SubscriberStateTable(appl_db, swsscommon.APP_PORT_TABLE_NAME)
dockers/docker-lldp/lldpmgrd:207:        sst_confdb = swsscommon.SubscriberStateTable(self.config_db, swsscommon.CFG_PORT_TABLE_NAME)
dockers/docker-lldp/lldpmgrd:211:        sst_appdb = swsscommon.SubscriberStateTable(self.appl_db, swsscommon.APP_PORT_TABLE_NAME)
files/image_config/caclmgrd/caclmgrd:466:            subscribe_acl_table = swsscommon.SubscriberStateTable(acl_db_connector, swsscommon.CFG_ACL_TABLE_TABLE_NAME)
files/image_config/caclmgrd/caclmgrd:468:            subscribe_acl_rule_table = swsscommon.SubscriberStateTable(acl_db_connector, swsscommon.CFG_ACL_RULE_TABLE_NAME)

pavel-shirshov
pavel-shirshov previously approved these changes Sep 10, 2020
@lguohan lguohan added the Bug label Sep 10, 2020
@lguohan
Copy link
Contributor

lguohan commented Sep 10, 2020

it sounds like a critical bug? what releases we need to backport?

@qiluo-msft
Copy link
Contributor Author

Ideally, we need to cherry-pick to all the release branches since 201803.
Search the code, I found similar use cases, so the same issue is possible to happen at one of the places.

src/sonic-platform-daemons/sonic-xcvrd/scripts/xcvrd:943:        sst = swsscommon.SubscriberStateTable(appl_db, swsscommon.APP_PORT_TABLE_NAME)
src/sonic-platform-daemons/sonic-ledd/scripts/ledd:83:        sst = swsscommon.SubscriberStateTable(appl_db, swsscommon.APP_PORT_TABLE_NAME)
dockers/docker-lldp/lldpmgrd:207:        sst_confdb = swsscommon.SubscriberStateTable(self.config_db, swsscommon.CFG_PORT_TABLE_NAME)
dockers/docker-lldp/lldpmgrd:211:        sst_appdb = swsscommon.SubscriberStateTable(self.appl_db, swsscommon.APP_PORT_TABLE_NAME)
files/image_config/caclmgrd/caclmgrd:466:            subscribe_acl_table = swsscommon.SubscriberStateTable(acl_db_connector, swsscommon.CFG_ACL_TABLE_TABLE_NAME)
files/image_config/caclmgrd/caclmgrd:468:            subscribe_acl_rule_table = swsscommon.SubscriberStateTable(acl_db_connector, swsscommon.CFG_ACL_RULE_TABLE_NAME)

@qiluo-msft qiluo-msft changed the title Fix SubscriberStateTable::hasCachedData formula Fix SubscriberStateTable::hasCachedData formula for a timing risk Sep 10, 2020
@qiluo-msft qiluo-msft merged commit cc2f80b into sonic-net:master Sep 11, 2020
@qiluo-msft qiluo-msft deleted the qiluo/subscrfix branch September 11, 2020 15:58
qiluo-msft added a commit that referenced this pull request Sep 11, 2020
* Fix hasCachedData formula
* Add test case to verify the fix
* Remove dead code in test
qiluo-msft added a commit that referenced this pull request Sep 11, 2020
* Fix hasCachedData formula
* Add test case to verify the fix
* Remove dead code in test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants