
Commit 9d3251c

Junchao-Mellanox authored and mssonicbld committed
[Mellanox] set select timeout to no more than 1 sec to make sure fast shutdown (sonic-net#13611)
- Why I did it
Commit sonic-net/sonic-platform-daemons@153ea47 changed SfpStateUpdateTask from a Process to a Thread. That commit raises an exception in SfpStateUpdateTask to make the shutdown flow fast. This does not work on the Nvidia platform, because the Nvidia platform passes the timeout parameter of get_change_event to select, and the Linux select function cannot be interrupted by a Python exception. There was no such issue on the Nvidia platform before that commit. However, to comply with that commit and keep the shutdown flow fast, we decided to change the Nvidia platform API implementation. This fixes issue sonic-net#13591.

- How I did it
The select call in get_change_event now uses a timeout of no more than 1 second. A while loop around the select call makes sure the timeout parameter of get_change_event still works as expected.

- How to verify it
Manual test
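
The idea described under "How I did it" can be sketched in isolation. This is a minimal illustration and not the platform code: `wait_readable`, `MAX_SLICE_SEC`, and the `threading.Event` stop flag are hypothetical names introduced for the sketch, and the stop flag is only a stand-in for whatever shutdown signal the daemon actually uses. The point is that no single select call blocks longer than one slice, so the caller regains control at least once per slice.

```python
import select
import socket
import threading
import time

MAX_SLICE_SEC = 1.0  # cap for one select call, mirroring the 1 second limit in the commit message


def wait_readable(sock: socket.socket, timeout_sec, stop_event: threading.Event,
                  slice_sec: float = MAX_SLICE_SEC) -> str:
    """Wait until sock is readable, the timeout expires, or stop_event is set.

    timeout_sec=None means wait forever. Returns "ready", "timeout" or "stopped".
    """
    deadline = None if timeout_sec is None else time.monotonic() + timeout_sec
    while True:
        # Python code runs between select calls, so a shutdown request is seen here.
        if stop_event.is_set():
            return "stopped"
        if deadline is None:
            wait = slice_sec
        else:
            # Never block longer than one slice or than the time that is left.
            wait = max(0.0, min(slice_sec, deadline - time.monotonic()))
        readable, _, _ = select.select([sock], [], [], wait)
        if readable:
            return "ready"
        if deadline is not None and time.monotonic() >= deadline:
            return "timeout"
```

With a one second slice, a waiter that is asked to stop returns after at most roughly one second, regardless of how long the overall timeout is.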
1 parent 67ea31e commit 9d3251c

1 file changed: +14 -10 lines changed

platform/mellanox/mlnx-platform-api/sonic_platform/chassis.py

@@ -31,10 +31,10 @@
     from . import utils
     from .device_data import DeviceDataManager
     import re
+    import time
 except ImportError as e:
     raise ImportError (str(e) + "- required module not found")
 
-MAX_SELECT_DELAY = 3600
 
 RJ45_TYPE = "RJ45"
 
@@ -387,26 +387,30 @@ def get_change_event(self, timeout=0):
             self.sfp_event.initialize()
 
         wait_for_ever = (timeout == 0)
+        # select timeout should be no more than 1000ms to ensure fast shutdown flow
+        select_timeout = 1000.0 if timeout >= 1000 else float(timeout)
         port_dict = {}
         error_dict = {}
-        if wait_for_ever:
-            timeout = MAX_SELECT_DELAY
-            while True:
-                status = self.sfp_event.check_sfp_status(port_dict, error_dict, timeout)
-                if bool(port_dict):
+        begin = time.time()
+        while True:
+            status = self.sfp_event.check_sfp_status(port_dict, error_dict, select_timeout)
+            if bool(port_dict):
+                break
+
+            if not wait_for_ever:
+                elapse = time.time() - begin
+                if elapse * 1000 > timeout:
                     break
-        else:
-            status = self.sfp_event.check_sfp_status(port_dict, error_dict, timeout)
 
         if status:
             if port_dict:
                 self.reinit_sfps(port_dict)
-            result_dict = {'sfp':port_dict}
+            result_dict = {'sfp': port_dict}
             if error_dict:
                 result_dict['sfp_error'] = error_dict
             return True, result_dict
         else:
-            return True, {'sfp':{}}
+            return True, {'sfp': {}}
 
     def reinit_sfps(self, port_dict):
         """

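To see how the new loop in get_change_event behaves, it can be pulled out into a standalone function and driven with a fake clock. This is a sketch under assumptions: `wait_for_change`, `FakeSfpEvent`, and the injectable `clock` parameter are hypothetical and not part of the platform API, and the fake `check_status` simply pretends that each call blocks for the full select timeout. How the real check_sfp_status treats its timeout argument is not shown in this commit.

```python
import time

MAX_SELECT_MS = 1000.0  # per-call cap, same value as in the diff


def wait_for_change(check_status, timeout_ms=0, clock=time.time):
    """Standalone copy of the loop; check_status stands in for check_sfp_status.

    check_status(port_dict, error_dict, select_timeout_ms) -> bool
    Returns (status, port_dict, number_of_calls).
    """
    wait_for_ever = (timeout_ms == 0)
    # Same expression as the diff; note that timeout_ms == 0 yields 0.0 here.
    select_timeout = MAX_SELECT_MS if timeout_ms >= MAX_SELECT_MS else float(timeout_ms)
    port_dict, error_dict = {}, {}
    calls = 0
    begin = clock()
    while True:
        status = check_status(port_dict, error_dict, select_timeout)
        calls += 1
        if port_dict:
            break  # a port change was reported
        if not wait_for_ever and (clock() - begin) * 1000 > timeout_ms:
            break  # overall timeout exceeded
    return status, port_dict, calls


class FakeSfpEvent:
    """Hypothetical stand-in that never reports a change and advances a fake clock."""

    def __init__(self):
        self.now = 0.0
        self.seen_timeouts = []

    def clock(self):
        return self.now

    def check_status(self, port_dict, error_dict, select_timeout_ms):
        self.seen_timeouts.append(select_timeout_ms)
        self.now += select_timeout_ms / 1000.0  # pretend the call blocked for the full slice
        return False


fake = FakeSfpEvent()
status, ports, calls = wait_for_change(fake.check_status, 2500, clock=fake.clock)
print(status, ports, calls, fake.seen_timeouts)
```

In this idealized run a 2500 ms request is served by three 1000 ms slices, because the slice length is not shortened to the remaining time. Under these assumptions the overall wait can exceed the requested timeout by up to one slice.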