Skip to content

Build failure due to TestXcvrdScript.test_SfpStateUpdateTask_task_run_stop #382

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dgsudharsan opened this issue Jun 21, 2023 · 1 comment · Fixed by #461
Closed

Build failure due to TestXcvrdScript.test_SfpStateUpdateTask_task_run_stop #382

dgsudharsan opened this issue Jun 21, 2023 · 1 comment · Fixed by #461

Comments

@dgsudharsan
Copy link
Collaborator

We are noticing build failure due to TestXcvrdScript.test_SfpStateUpdateTask_task_run_stop . Below is the trace

[2023-06-21T12:27:31.213Z] =================================== FAILURES ===================================
[2023-06-21T12:27:31.213Z] ____________ TestXcvrdScript.test_SfpStateUpdateTask_task_run_stop _____________
[2023-06-21T12:27:31.213Z]
[2023-06-21T12:27:31.213Z] self = <tests.test_xcvrd.TestXcvrdScript object at 0x7ff1f404b610>
[2023-06-21T12:27:31.213Z]
[2023-06-21T12:27:31.213Z] @patch('xcvrd.xcvrd_utilities.port_mapping.subscribe_port_config_change', MagicMock(return_value=(None, None)))
[2023-06-21T12:27:31.213Z] def test_SfpStateUpdateTask_task_run_stop(self):
[2023-06-21T12:27:31.213Z] port_mapping = PortMapping()
[2023-06-21T12:27:31.213Z] stop_event = threading.Event()
[2023-06-21T12:27:31.213Z] sfp_error_event = threading.Event()
[2023-06-21T12:27:31.213Z] task = SfpStateUpdateTask(DEFAULT_NAMESPACE, port_mapping, stop_event, sfp_error_event)
[2023-06-21T12:27:31.213Z] task.start()
[2023-06-21T12:27:31.213Z] assert wait_until(5, 1, task.is_alive)
[2023-06-21T12:27:31.213Z] task.raise_exception()
[2023-06-21T12:27:31.213Z] > task.join()
[2023-06-21T12:27:31.213Z]
[2023-06-21T12:27:31.213Z] tests/test_xcvrd.py:1041:
[2023-06-21T12:27:31.214Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2023-06-21T12:27:31.214Z] xcvrd/xcvrd.py:2200: in join
[2023-06-21T12:27:31.214Z] raise self.exc
[2023-06-21T12:27:31.214Z] xcvrd/xcvrd.py:2175: in run
[2023-06-21T12:27:31.214Z] self.task_worker(self.task_stopping_event, self.sfp_error_event)
[2023-06-21T12:27:31.214Z] xcvrd/xcvrd.py:1987: in task_worker
[2023-06-21T12:27:31.214Z] self.init()
[2023-06-21T12:27:31.214Z] xcvrd/xcvrd.py:1905: in init
[2023-06-21T12:27:31.214Z] self.retry_eeprom_set = self._post_port_sfp_info_and_dom_thr_to_db_once(port_mapping_data, self.xcvr_table_helper, self.main_thread_stop_event)
[2023-06-21T12:27:31.214Z] xcvrd/xcvrd.py:1845: in _post_port_sfp_info_and_dom_thr_to_db_once
[2023-06-21T12:27:31.214Z] warmstart.initialize("xcvrd", "pmon")
[2023-06-21T12:27:31.214Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2023-06-21T12:27:31.214Z]
[2023-06-21T12:27:31.214Z] app_name = 'xcvrd', docker_name = 'pmon', db_timeout = 0, isTcpConn = False
[2023-06-21T12:27:31.214Z]
[2023-06-21T12:27:31.214Z] @staticmethod
[2023-06-21T12:27:31.214Z] def initialize(app_name, docker_name, db_timeout=0, isTcpConn=False):
[2023-06-21T12:27:31.214Z] > return _swsscommon.WarmStart_initialize(app_name, docker_name, db_timeout, isTcpConn)
[2023-06-21T12:27:31.214Z] E RuntimeError: Unable to connect to redis (unix-socket): Cannot assign requested address
[2023-06-21T12:27:31.214Z]
[2023-06-21T12:27:31.214Z] /usr/lib/python3/dist-packages/swsscommon/swsscommon.py:3369: RuntimeError
[2023-06-21T12:27:31.214Z] =============================== warnings summary ===============================
[2023-06-21T12:27:31.214Z] /usr/lib/python3/dist-packages/_pytest/junitxml.py:446
[2023-06-21T12:27:31.214Z] /usr/lib/python3/dist-packages/_pytest/junitxml.py:446: PytestDeprecationWarning: The 'junit_family' default value will change to 'xunit2' in pytest 6.0. See:
[2023-06-21T12:27:31.214Z] https://docs.pytest.org/en/stable/deprecations.html#junit-family-default-value-change-to-xunit2
[2023-06-21T12:27:31.214Z] for more information.
[2023-06-21T12:27:31.214Z] _issue_warning_captured(deprecated.JUNIT_XML_DEFAULT_FAMILY, config.hook, 2)
[2023-06-21T12:27:31.214Z]
[2023-06-21T12:27:31.214Z] -- Docs: https://docs.pytest.org/en/stable/warnings.html
[2023-06-21T12:27:31.215Z] - generated xml file: /sonic/src/sonic-platform-daemons/sonic-xcvrd/test-results.xml -
[2023-06-21T12:27:31.215Z]
[2023-06-21T12:27:31.215Z] ----------- coverage: platform linux, python 3.9.2-final-0 -----------
[2023-06-21T12:27:31.215Z] Name Stmts Miss Cover
[2023-06-21T12:27:31.215Z] ----------------------------------------------------------------
[2023-06-21T12:27:31.215Z] xcvrd/init.py 0 0 100%
[2023-06-21T12:27:31.215Z] xcvrd/xcvrd.py 1548 358 77%
[2023-06-21T12:27:31.215Z] xcvrd/xcvrd_utilities/init.py 0 0 100%
[2023-06-21T12:27:31.215Z] xcvrd/xcvrd_utilities/port_mapping.py 191 26 86%
[2023-06-21T12:27:31.215Z] xcvrd/xcvrd_utilities/sfp_status_helper.py 27 2 93%
[2023-06-21T12:27:31.215Z] ----------------------------------------------------------------
[2023-06-21T12:27:31.215Z] TOTAL 1766 386 78%
[2023-06-21T12:27:31.215Z] Coverage HTML written to dir htmlcov
[2023-06-21T12:27:31.215Z] Coverage XML written to file coverage.xml
[2023-06-21T12:27:31.215Z]
[2023-06-21T12:27:31.215Z] =========================== short test summary info ============================
[2023-06-21T12:27:31.215Z] FAILED tests/test_xcvrd.py::TestXcvrdScript::test_SfpStateUpdateTask_task_run_stop
[2023-06-21T12:27:31.215Z] =================== 1 failed, 69 passed, 1 warning in 9.18s ====================
[2023-06-21T12:27:31.215Z] [ FAIL LOG END ] [ target/python-wheels/bullseye/sonic_xcvrd-1.0-py3-none-any.whl ]

@dgsudharsan
Copy link
Collaborator Author

Enc ountered this again @prgeor @mihirpat1

[2024-04-04T00:21:36.795Z] =================================== FAILURES ===================================
[2024-04-04T00:21:36.795Z] ____________ TestXcvrdScript.test_SfpStateUpdateTask_task_run_stop _____________
[2024-04-04T00:21:36.795Z] 
[2024-04-04T00:21:36.795Z] self = <tests.test_xcvrd.TestXcvrdScript object at 0x7f4acf0acb20>
[2024-04-04T00:21:36.795Z] 
[2024-04-04T00:21:36.795Z]     @patch('xcvrd.xcvrd_utilities.port_event_helper.subscribe_port_config_change', MagicMock(return_value=(None, None)))
[2024-04-04T00:21:36.795Z]     def test_SfpStateUpdateTask_task_run_stop(self):
[2024-04-04T00:21:36.796Z]         port_mapping = PortMapping()
[2024-04-04T00:21:36.796Z]         stop_event = threading.Event()
[2024-04-04T00:21:36.796Z]         sfp_error_event = threading.Event()
[2024-04-04T00:21:36.796Z]         task = SfpStateUpdateTask(DEFAULT_NAMESPACE, port_mapping, stop_event, sfp_error_event)
[2024-04-04T00:21:36.796Z]         task.start()
[2024-04-04T00:21:36.796Z] >       assert wait_until(5, 1, task.is_alive)
[2024-04-04T00:21:36.796Z] E       assert False
[2024-04-04T00:21:36.796Z] E        +  where False = wait_until(5, 1, <bound method Thread.is_alive of <SfpStateUpdateTask(SfpStateUpdateTask, stopped 139959240861440)>>)
[2024-04-04T00:21:36.796Z] E        +    where <bound method Thread.is_alive of <SfpStateUpdateTask(SfpStateUpdateTask, stopped 139959240861440)>> = <SfpStateUpdateTask(SfpStateUpdateTask, stopped 139959240861440)>.is_alive
[2024-04-04T00:21:36.796Z] 
[2024-04-04T00:21:36.796Z] tests/test_xcvrd.py:1967: AssertionError
[2024-04-04T00:21:36.796Z] =============================== warnings summary ===============================
[2024-04-04T00:21:36.796Z] /usr/lib/python3/dist-packages/_pytest/junitxml.py:446
[2024-04-04T00:21:36.796Z]   /usr/lib/python3/dist-packages/_pytest/junitxml.py:446: PytestDeprecationWarning: The 'junit_family' default value will change to 'xunit2' in pytest 6.0. See:
[2024-04-04T00:21:36.796Z]     https://docs.pytest.org/en/stable/deprecations.html#junit-family-default-value-change-to-xunit2
[2024-04-04T00:21:36.796Z]   for more information.
[2024-04-04T00:21:36.796Z]     _issue_warning_captured(deprecated.JUNIT_XML_DEFAULT_FAMILY, config.hook, 2)
[2024-04-04T00:21:36.796Z] 
[2024-04-04T00:21:36.796Z] -- Docs: https://docs.pytest.org/en/stable/warnings.html
[2024-04-04T00:21:36.796Z] - generated xml file: /sonic/src/sonic-platform-daemons/sonic-xcvrd/test-results.xml -
[2024-04-04T00:21:36.796Z] 
[2024-04-04T00:21:36.796Z] ----------- coverage: platform linux, python 3.9.2-final-0 -----------
[2024-04-04T00:21:36.796Z] Name                                             Stmts   Miss  Cover
[2024-04-04T00:21:36.796Z] --------------------------------------------------------------------
[2024-04-04T00:21:36.796Z] xcvrd/__init__.py                                    0      0   100%
[2024-04-04T00:21:36.796Z] xcvrd/sff_mgr.py                                   217     26    88%
[2024-04-04T00:21:36.797Z] xcvrd/xcvrd.py                                    1495    341    77%
[2024-04-04T00:21:36.797Z] xcvrd/xcvrd_utilities/__init__.py                    0      0   100%
[2024-04-04T00:21:36.797Z] xcvrd/xcvrd_utilities/media_settings_parser.py     236     43    82%
[2024-04-04T00:21:36.797Z] xcvrd/xcvrd_utilities/optics_si_parser.py          106     38    64%
[2024-04-04T00:21:36.797Z] xcvrd/xcvrd_utilities/port_event_helper.py         205     17    92%
[2024-04-04T00:21:36.797Z] xcvrd/xcvrd_utilities/sfp_status_helper.py          27      2    93%
[2024-04-04T00:21:36.797Z] xcvrd/xcvrd_utilities/xcvr_table_helper.py          51      4    92%
[2024-04-04T00:21:36.797Z] --------------------------------------------------------------------
[2024-04-04T00:21:36.797Z] TOTAL                                             2337    471    80%
[2024-04-04T00:21:36.797Z] Coverage HTML written to dir htmlcov
[2024-04-04T00:21:36.797Z] Coverage XML written to file coverage.xml
[2024-04-04T00:21:36.797Z] 
[2024-04-04T00:21:36.797Z] =========================== short test summary info ============================
[2024-04-04T00:21:36.797Z] FAILED tests/test_xcvrd.py::TestXcvrdScript::test_SfpStateUpdateTask_task_run_stop
[2024-04-04T00:21:36.797Z] ================== 1 failed, 152 passed, 1 warning in 14.93s ===================
[2024-04-04T00:21:36.797Z] [  FAIL LOG END  ] [ target/python-wheels/bullseye/sonic_xcvrd-1.0-py3-none-any.whl ]
[2024-04-04T00:21:36.797Z] make: *** [slave.mk:925: target/python-wheels/bullseye/sonic_xcvrd-1.0-py3-none-any.whl] Error 1
[2024-04-04T00:21:36.797Z] make: *** Waiting for unfinished jobs....
[2024-04-04T00:21:38.791Z] [ finished ] [ target/python-wheels/bullseye/mlnx_platform_api-1.0-py3-none-any.whl ] 
[2024-04-04T00:21:39.071Z] [ finished ] [ target/python-wheels/bullseye/asyncsnmp-2.1.0-py3-none-any.whl ] 
[2024-04-04T00:25:31.373Z] Waiting dpkg lock for 10, 1/60, info: Locked by command: /usr/bin/dpkg -P mlnx-sai
[2024-04-04T00:25:31.373Z] [ finished ] [ target/debs/bullseye/syncd_1.0.0_amd64.deb ] 
[2024-04-04T00:25:39.649Z] dpkg-query: package 'mlnx-sai' is not installed and no information is available
[2024-04-04T00:25:39.649Z] Use dpkg --info (= dpkg-deb --info) to examine archive files.
[2024-04-04T00:25:39.649Z] [ finished ] [ target/debs/bullseye/libsaivs-dev_1.0.0_amd64.deb-install ] 
[2024-04-04T00:33:31.691Z] [ finished ] [ target/python-wheels/bullseye/sonic_utilities-1.2-py3-none-any.whl ] 
[2024-04-04T00:34:04.029Z] [ finished ] [ target/debs/bullseye/symcrypt-openssl_0.10_amd64.deb ] 
[2024-04-04T00:34:05.995Z] Makefile.work:627: recipe for target 'bullseye' failed
[2024-04-04T00:34:05.995Z] make[1]: *** [bullseye] Error 2
[2024-04-04T00:34:05.995Z] make[1]: Leaving directory '/builds2/sw-r2d2-bot/workspace/sonic_main/sonic'
[2024-04-04T00:34:05.995Z] Makefile:44: recipe for target 'target/sonic-mellanox.bin' failed

StormLiangMS pushed a commit that referenced this issue Apr 11, 2024
…ask_run_stop (#461) (#463)

Cherry-pick for #461
MSFT ADO - 24271303

Description
An intermittent build failure is seen for the test_SfpStateUpdateTask_task_run_stop unit-test with the below backtrace. We have observed more than one type of backtrace so far (details are mentioned in the issues mentioned below).

[2023-06-21T12:27:31.213Z] =================================== FAILURES ===================================
[2023-06-21T12:27:31.213Z] ____________ TestXcvrdScript.test_SfpStateUpdateTask_task_run_stop _____________
[2023-06-21T12:27:31.213Z]
[2023-06-21T12:27:31.213Z] self = <tests.test_xcvrd.TestXcvrdScript object at 0x7ff1f404b610>
[2023-06-21T12:27:31.213Z]
[2023-06-21T12:27:31.213Z] @patch('xcvrd.xcvrd_utilities.port_mapping.subscribe_port_config_change', MagicMock(return_value=(None, None)))
[2023-06-21T12:27:31.213Z] def test_SfpStateUpdateTask_task_run_stop(self):
[2023-06-21T12:27:31.213Z] port_mapping = PortMapping()
[2023-06-21T12:27:31.213Z] stop_event = threading.Event()
[2023-06-21T12:27:31.213Z] sfp_error_event = threading.Event()
[2023-06-21T12:27:31.213Z] task = SfpStateUpdateTask(DEFAULT_NAMESPACE, port_mapping, stop_event, sfp_error_event)
[2023-06-21T12:27:31.213Z] task.start()
[2023-06-21T12:27:31.213Z] assert wait_until(5, 1, task.is_alive)
[2023-06-21T12:27:31.213Z] task.raise_exception()
[2023-06-21T12:27:31.213Z] > task.join()
[2023-06-21T12:27:31.213Z]
[2023-06-21T12:27:31.213Z] tests/test_xcvrd.py:1041:
[2023-06-21T12:27:31.214Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2023-06-21T12:27:31.214Z] xcvrd/xcvrd.py:2200: in join
[2023-06-21T12:27:31.214Z] raise self.exc
[2023-06-21T12:27:31.214Z] xcvrd/xcvrd.py:2175: in run
[2023-06-21T12:27:31.214Z] self.task_worker(self.task_stopping_event, self.sfp_error_event)
[2023-06-21T12:27:31.214Z] xcvrd/xcvrd.py:1987: in task_worker
[2023-06-21T12:27:31.214Z] self.init()
[2023-06-21T12:27:31.214Z] xcvrd/xcvrd.py:1905: in init
[2023-06-21T12:27:31.214Z] self.retry_eeprom_set = self._post_port_sfp_info_and_dom_thr_to_db_once(port_mapping_data, self.xcvr_table_helper, self.main_thread_stop_event)
[2023-06-21T12:27:31.214Z] xcvrd/xcvrd.py:1845: in _post_port_sfp_info_and_dom_thr_to_db_once
[2023-06-21T12:27:31.214Z] warmstart.initialize("xcvrd", "pmon")
[2023-06-21T12:27:31.214Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2023-06-21T12:27:31.214Z]
[2023-06-21T12:27:31.214Z] app_name = 'xcvrd', docker_name = 'pmon', db_timeout = 0, isTcpConn = False
[2023-06-21T12:27:31.214Z]
[2023-06-21T12:27:31.214Z] @staticmethod
[2023-06-21T12:27:31.214Z] def initialize(app_name, docker_name, db_timeout=0, isTcpConn=False):
[2023-06-21T12:27:31.214Z] > return _swsscommon.WarmStart_initialize(app_name, docker_name, db_timeout, isTcpConn)
[2023-06-21T12:27:31.214Z] E RuntimeError: Unable to connect to redis (unix-socket): Cannot assign requested address
[2023-06-21T12:27:31.214Z]
[2023-06-21T12:27:31.214Z] /usr/lib/python3/dist-packages/swsscommon/swsscommon.py:3369: RuntimeError
[2023-06-21T12:27:31.214Z] =============================== warnings summary ===============================
Consistent way to fail the test
Add at least 1s sleep in between the below lines
sonic-platform-daemons/sonic-xcvrd/tests/test_xcvrd.py

Lines 1966 to 1967 in d0fd1e1

 task.start() 
 assert wait_until(5, 1, task.is_alive) 
Motivation and Context
fixes #369, fixes #370, fixes #382

Intention of the testcase
The test case tests a scenario wherein while SfpStateUpdateTask thread is busy polling for few seconds, its parent process (xcvrd in an ideal case) generates an exception to stop the SfpStateUpdateTask thread in the middle of polling.

RCA for current failure
Since test_SfpStateUpdateTask_task_run_stop does not have all the underlying functions of SfpStateUpdateTask.task_worker mocked, there is a probability of SfpStateUpdateTask thread exiting early which will thereby cause assert wait_until(5, 1, task.is_alive) to fail based on the place at which SfpStateUpdateTask thread crashes.

Fix details
Modified the implementation of XcvrTableHelper function to have a busy polling to ensure that SfpStateUpdateTask thread does not quit before the test_SfpStateUpdateTask_task_run_stop testcase raises an exception.
Also, this approach will ensure that if any new function calls are being added in SfpStateUpdateTask.task_worker, they will not needed to be mocked in SfpStateUpdateTask.task_worker since XcvrTableHelper is the first function called in SfpStateUpdateTask.task_worker.

How Has This Been Tested?
Ran the test multiple times to ensure that the testcases passes.
Also, added a 3s sleep in between the below lines and ensured that the testcases was passing consistently.
sonic-platform-daemons/sonic-xcvrd/tests/test_xcvrd.py

Lines 1966 to 1967 in d0fd1e1

 task.start() 
 assert wait_until(5, 1, task.is_alive)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant