
Ensure redis client socket is closed and prevent race condition #11001


Open · wants to merge 2 commits into master

Conversation

prgeor
Contributor

@prgeor prgeor commented Jun 1, 2022

Why I did it

  1. Prevent a race condition inside get_all_namespaces(), which is not a reentrant function
  2. Ensure no FD leak; this fix reverts PR9636

How I did it

  1. Store the redis client connection for CONFIG_DB in a local variable
  2. Forcefully close the redis client connection (not strictly required, since the local connection object gets destroyed once it goes out of scope, but done just in case); a sketch of this flow follows below
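
A minimal sketch of the approach described above (not the verbatim PR diff), assuming swsssdk's ConfigDBConnector API and the front_ns/back_ns names from the snippet quoted later in this thread:

    from swsssdk import ConfigDBConnector

    # Values as defined in sonic_py_common.multi_asic
    FRONTEND_ASIC_SUB_ROLE = 'FrontEnd'
    BACKEND_ASIC_SUB_ROLE = 'BackEnd'

    def get_all_namespaces_sketch(namespaces):
        front_ns, back_ns = [], []
        for namespace in namespaces:
            # Local variable: no shared global client, so concurrent
            # callers (e.g. multiple Xcvrd threads) cannot race on it.
            config_db = ConfigDBConnector(use_unix_socket_path=True, namespace=namespace)
            config_db.connect()
            try:
                metadata = config_db.get_table('DEVICE_METADATA')
                sub_role = metadata['localhost']['sub_role']
                if sub_role == FRONTEND_ASIC_SUB_ROLE:
                    front_ns.append(namespace)
                elif sub_role == BACKEND_ASIC_SUB_ROLE:
                    back_ns.append(namespace)
            finally:
                # Explicit close releases the client socket fd immediately
                # instead of waiting for garbage collection.
                config_db.close('CONFIG_DB')
        return front_ns, back_ns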

How to verify it

Ran the following multiple times:
1. show interface counters
2. portstat -a

Ensure the redis client socket connection gets closed after get_all_namespaces() returns
Ensure no fd leak (a quick fd-count check is sketched below)
Ensure no race condition, as observed on Cisco chassis when multiple Xcvrd threads call get_all_namespaces()
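
One quick way to spot an fd leak while re-running the commands above (the helper name is illustrative; it assumes a Linux /proc filesystem):

    import os

    def open_fd_count(pid):
        """Count file descriptors currently open by a process (Linux /proc)."""
        return len(os.listdir('/proc/{}/fd'.format(pid)))

Running portstat -a or show interface counters in a loop, the fd count of the involved process should stay flat rather than grow with every call.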

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@prgeor prgeor requested a review from a team as a code owner June 1, 2022 23:36
metadata = config_db.get_table('DEVICE_METADATA')
if metadata['localhost']['sub_role'] == FRONTEND_ASIC_SUB_ROLE:
    front_ns.append(namespace)
elif metadata['localhost']['sub_role'] == BACKEND_ASIC_SUB_ROLE:
    back_ns.append(namespace)
if config_db:
    # Ensure the client socket is closed
    config_db.close('CONFIG_DB')
Contributor

I see a build failure at this close call:
File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/multi_asic.py", line 218, in get_all_namespaces
config_db.close('CONFIG_DB')
File "/usr/local/lib/python3.9/dist-packages/swsssdk/dbconnector.py", line 271, in close
self.dbintf.close(db_name)
File "/usr/local/lib/python3.9/dist-packages/swsssdk/interface.py", line 216, in close
self.redis_clients[db_name].connection_pool.disconnect()
AttributeError: 'SwssSyncClient' object has no attribute 'connection_pool'
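
The failing frame shows that close() unconditionally reaches for redis's connection_pool, which the SwssSyncClient backend does not have. A hypothetical guard (not the library's actual code) would check for the pool first:

    # Illustrative only: skip the disconnect for clients that do not
    # expose a redis connection pool (e.g. SwssSyncClient).
    def safe_close(dbintf, db_name):
        client = dbintf.redis_clients[db_name]
        pool = getattr(client, 'connection_pool', None)
        if pool is not None:
            pool.disconnect()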

if is_multi_asic():
    for asic in range(num_asics):
        namespace = "{}{}".format(ASIC_NAME_PREFIX, asic)
        command = 'sonic-db-cli -n {} CONFIG_DB HGET "DEVICE_METADATA|localhost" "sub_role"'.format(namespace)
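
For context, a minimal sketch of how such a command string might be executed and its output consumed, assuming Python's standard subprocess module (the helper name and error handling are illustrative, not the PR's exact code):

    import subprocess

    def get_sub_role(namespace):
        command = ('sonic-db-cli -n {} CONFIG_DB HGET '
                   '"DEVICE_METADATA|localhost" "sub_role"').format(namespace)
        proc = subprocess.run(command, shell=True, capture_output=True, text=True)
        if proc.returncode != 0:
            raise RuntimeError('Command {} failed with stderr {}'.format(command, proc.stderr))
        return proc.stdout.strip()  # e.g. 'FrontEnd' or 'BackEnd'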
Contributor

With this change, the database container fails to boot, with the error pointing at
'sonic-db-cli -n {} CONFIG_DB HGET "DEVICE_METADATA|localhost" "sub_role"':

/usr/local/lib/python3.9/dist-packages/supervisor/options.py:473: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (inc.
  self.warnings.warn(
Traceback (most recent call last):
  File "/usr/local/bin/sonic-cfggen", line 32, in <module>
    from minigraph import minigraph_encoder, parse_xml, parse_device_desc_xml, parse_asic_sub_role, parse_asic_switch_type
  File "/usr/local/lib/python3.9/dist-packages/minigraph.py", line 14, in <module>
    from portconfig import get_port_config
  File "/usr/local/lib/python3.9/dist-packages/portconfig.py", line 10, in <module>
    from sonic_py_common.multi_asic import get_asic_id_from_name
  File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/multi_asic.py", line 447, in <module>
    raise RuntimeError(
RuntimeError: Command sonic-db-cli -n asic0 CONFIG_DB HGET "DEVICE_METADATA|localhost" "sub_role" failed with stderr None
/usr/local/lib/python3.9/dist-packages/supervisor/options.py:473: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (inc.
  self.warnings.warn(
Unlinking stale socket /var/run/supervisor.sock

@abdosi
Contributor

abdosi commented Jun 15, 2022

@prgeor is there an fd leak here, or is it just too many open fds causing the issue?

@prgeor
Contributor Author

prgeor commented Jun 16, 2022

@prgeor is there an fd leak here, or is it just too many open fds causing the issue?

@abdosi it is a "too many open files" issue.

@abdosi
Contributor

abdosi commented May 9, 2023

@anamehra do we still have this issue?
