Skip to content

cisco platform-specific timeout update #16762

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

nnelluri-cisco
Copy link
Contributor

@nnelluri-cisco nnelluri-cisco commented Feb 3, 2025

Description of PR

New cisco platform x86_64-8102_28fh_dpu_o-r0' reboot timeout fix.

Summary: New cisco platform x86_64-8102_28fh_dpu_o-r0' reboot timeout fix.
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • [ X] Test case improvement

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • [X ] 202405
  • [ X] 202411

Approach

added new code under b/tests/common/reboot.py , the code only will execute for cisco platform x86_64-8102_28fh_dpu_o-r0.

What is the motivation for this PR?

Need reboot fix to execute sonic-mgmt runs on new cisco platform "x86_64-8102_28fh_dpu_o-r0"

How did you do it?

added new code under b/tests/common/reboot.py

How did you verify/test it?

verified using below reboot test case and sure new new code got executed.
test_pretest.py::test_features_state

Any platform specific information?

cisco platform "x86_64-8102_28fh_dpu_o-r0"

Supported testbed topology if it's a new test case?

t1-28-lag

Documentation

verification Logs:
========================================================================================= warnings summary ==========================================================================================
../../usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236
/usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
"class": algorithms.Blowfish,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
------------------------------------------------- generated xml file: /run_logs/sample/test_pretest.py::test_features_state_2025-02-03-04-25-24.xml -------------------------------------------------
INFO:root:Can not get Allure report URL. Please check logs
-------------------------------------------------------------------------------------- live log sessionfinish ---------------------------------------------------------------------------------------
05:09:54 init.pytest_terminal_summary L0067 INFO | Can not get Allure report URL. Please check logs
============================================================================= 1 passed, 1 warning in 2668.22s (0:44:28) =============================================================================
root@sonic-ucs-m6-15:/data/tests#

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@nnelluri-cisco
Copy link
Contributor Author

Notify
@prabhataravind and @vvolam

@@ -245,6 +245,14 @@ def reboot(duthost, localhost, reboot_type='cold', delay=10,
"""
pool = ThreadPool()
hostname = duthost.hostname
# This is how you get platform info from the DUT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nnelluri-cisco Could you move these new changes to a function? We could add all platform specific changes into that function and call here. Otherwise, looks good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vvolam
added platform specific changes to new function "smartswitch_update_reboot_timeout".

@vvolam
Copy link
Contributor

vvolam commented Feb 3, 2025

Why not 202411?

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@nnelluri-cisco
Copy link
Contributor Author

Why not 202411?

Why not 202411?

added back port request to 202411.

Copy link
Contributor

@vvolam vvolam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comments, otherwise, looks good to me.

@oleksandrivantsiv @gpunathilell for viz.

@@ -576,3 +574,13 @@ def collect_mgmt_config_by_console(duthost, localhost):
logger.info('End: collect mgmt config by console ...')
else:
logger.warning("dut console is not ready, we can get mgmt config by console")

def smartswitch_update_reboot_timeout(duthost):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a summary to the function here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added summary to the function

@@ -38,6 +38,9 @@
REBOOT_TYPE_KERNEL_PANIC = "Kernel Panic"
REBOOT_TYPE_SUPERVISOR = "Reboot from Supervisor"
REBOOT_TYPE_SUPERVISOR_HEARTBEAT_LOSS = "Heartbeat with the Supervisor card lost"
SMARTSWITCH_REBOOT_TIMEOUT = 700
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment before these timeouts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comment for the timeouts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might not be a satisfactory solution to modify a generic function to handle the timeout for smartswitch. Can we do that in a more generic way, like updating the reboot_ctrl_dict?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had an offline sync with Bing. There are some concerns in making this platform-specific change in reboot function. @nnelluri-cisco, @rameshraghupathy Let's instead try to do this via plt_reboot_dict which is processed by reboot function using get_plt_reboot_ctrl().

for reboot_type in reboot_ctrl_dict:
reboot_ctrl_dict[reboot_type]["timeout"] = 700
reboot_ctrl_dict[reboot_type]["wait"] = 300
#smartswicth specific reboot and waittime update
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nit: "wait time" instead of "waittime"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also typo for smartswitch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed type changes as per comment.

if platform == 'x86_64-8102_28fh_dpu_o-r0':
for reboot_type in reboot_ctrl_dict:
reboot_ctrl_dict[reboot_type]["timeout"] = SMARTSWITCH_REBOOT_TIMEOUT
reboot_ctrl_dict[reboot_type]["wait"] = SMARTSWITCH_REBOT_WAIT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix nit SMARTSWITCH_REBOT_WAIT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed SMARTSWITCH_REBOT_WAIT to SMARTSWITCH_REBO0T_WAIT

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

if platform == 'x86_64-8102_28fh_dpu_o-r0':
for reboot_type in reboot_ctrl_dict:
reboot_ctrl_dict[reboot_type]["timeout"] = SMARTSWITCH_REBOOT_TIMEOUT
reboot_ctrl_dict[reboot_type]["wait"] = SMARTSWITCH_REBO0T_WAIT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SMARTSWITCH_REBOOT_WAIT is having one zero instead of "O"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected the REBOOT typo

@@ -38,6 +38,9 @@
REBOOT_TYPE_KERNEL_PANIC = "Kernel Panic"
REBOOT_TYPE_SUPERVISOR = "Reboot from Supervisor"
REBOOT_TYPE_SUPERVISOR_HEARTBEAT_LOSS = "Heartbeat with the Supervisor card lost"
SMARTSWITCH_REBOOT_TIMEOUT = 700
SMARTSWITCH_REBO0T_WAIT = 300
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SMARTSWITCH_REBOOT_WAIT is having one zero instead of "O"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected the REBOOT typo

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@vvolam vvolam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Contributor

@prabhataravind prabhataravind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@nnelluri-cisco
Copy link
Contributor Author

@bingwang-ms
could you please review and merge this PR

@prabhataravind
Copy link
Contributor

prabhataravind commented Mar 14, 2025

/azpw run Azure.sonic-mgmt

@prabhataravind
Copy link
Contributor

@nnelluri-cisco please update your branch from master.

Copy link

Commenter does not have sufficient privileges for PR 16762 in repo sonic-net/sonic-mgmt

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@nnelluri-cisco
Copy link
Contributor Author

nnelluri-cisco commented Mar 18, 2025

@mssonicbld and @prabhataravind
I rebased with upstream , can you please merge this PR.

@prabhataravind
Copy link
Contributor

@mssonicbld and @prabhataravind I rebased with upstream , can you please merge this PR.

@nnelluri-cisco there are still some PR checks failing...I just restarted the tests.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@nnelluri-cisco
Copy link
Contributor Author

/azp run

Copy link

Commenter does not have sufficient privileges for PR 16762 in repo sonic-net/sonic-mgmt

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@prabhataravind
Copy link
Contributor

@bingwang-ms could you please help merge this change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants