[advance-reboot] Add timeout to reboot cases #4532

Merged (2 commits) on Oct 22, 2021


vaibhavhd
Contributor

@vaibhavhd vaibhavhd commented Oct 22, 2021

Description of PR

Summary: Advanced-reboot test cases sometimes hang, which in turn causes run_tests.sh to hang. This PR adds a timeout to the advanced-reboot cases.

Fixes # (issue)

Type of change

  • [] Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 201911

Approach

What is the motivation for this PR?

How did you do it?

  • Added a timeout to ptf_runner.
  • Handled the resulting exception so that the next cases still run (for the sad-path cases).
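A minimal sketch of the two steps above, assuming each case is launched as a separate process. It does not reproduce the actual ptf_runner change: `run_case_with_timeout` and `run_all` are hypothetical names, and `subprocess.run` stands in for the real PTF invocation. The point it illustrates is that a timeout becomes a logged failure instead of an exception that aborts the whole loop.

```python
import logging
import subprocess
import sys

logger = logging.getLogger("advanced_reboot_sketch")


def run_case_with_timeout(case_name, cmd, timeout):
    """Run one case as a subprocess, bounded by `timeout` seconds.

    Hypothetical helper: returns True on success and False on failure or
    timeout, so the caller can move on instead of hanging.
    """
    try:
        subprocess.run(cmd, check=True, timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run() kills and reaps the child before re-raising.
        logger.error("Timed out after %ss of executing case: %s", timeout, case_name)
        return False
    except subprocess.CalledProcessError as err:
        logger.error("Case %s failed with exit code %s", case_name, err.returncode)
        return False


def run_all(cases, timeout):
    """Run every (name, cmd) case in order; a hung case does not stop the loop."""
    results = {}
    for name, cmd in cases:
        results[name] = run_case_with_timeout(name, cmd, timeout)
    return results


if __name__ == "__main__":
    cases = [
        ("hangs", [sys.executable, "-c", "import time; time.sleep(30)"]),
        ("passes", [sys.executable, "-c", "pass"]),
    ]
    print(run_all(cases, timeout=2.0))  # {'hangs': False, 'passes': True}
```

Returning False rather than re-raising is a design choice: the caller can record the failure and continue with the next case, which is the behaviour the PR describes for the timed-out sad-path cases.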

How did you verify/test it?

Tested on a physical testbed with the timeout set to 20s: every sad-path case timed out, and the run still continued with the next case.

03:41:36 advanced_reboot.__runPtfRunner           L0542 INFO   | Run advanced-reboot ReloadTest on the PTF host
03:42:00 advanced_reboot.__runPtfRunner           L0558 ERROR  | Timed out after 20s of executing advance reboot case: neigh_lag_member_down:3:6.. Error message: run module shell failed
03:42:00 advanced_reboot.__fetchTestLogs          L0374 INFO   | Extract log files on dut host
03:42:04 advanced_reboot.__fetchTestLogs          L0383 INFO   | Fetching log files from ptf and dut hosts
03:42:21 advanced_reboot.__clearArpAndFdbTables   L0347 INFO   | Clearing arp entries on DUT  str-7260cx3-acs-1
03:42:22 advanced_reboot.__clearArpAndFdbTables   L0350 INFO   | Clearing all fdb entries on DUT  str-7260cx3-acs-1
03:42:23 advanced_reboot.__revertRebootOper       L0491 INFO   | Running revert handler for reboot operation neigh_lag_member_down:3:6

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@vaibhavhd vaibhavhd requested a review from yxieca October 22, 2021 03:37
@vaibhavhd vaibhavhd requested a review from a team as a code owner October 22, 2021 03:37
@vaibhavhd vaibhavhd merged commit 622253e into sonic-net:master Oct 22, 2021
@vaibhavhd vaibhavhd deleted the timeout-advance-reboot branch October 22, 2021 16:13
vaibhavhd added a commit that referenced this pull request Oct 29, 2021
The failures seen in PTF (tests started by ptf_runner) are difficult to examine. Make them easier to debug by surfacing the exceptions and printing the traceback message.

Also, make the exception handling in ptf_runner generic. Earlier, in PR #4532, a similar try/except was added only for the advanced_reboot cases.
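A sketch of what generic exception handling with traceback output could look like, assuming a wrapper placed around any test invocation. `run_with_traceback` is a hypothetical helper, not the actual ptf_runner code; it logs the full traceback using the standard `traceback` module and then re-raises, so the test still fails.

```python
import logging
import traceback

logger = logging.getLogger("ptf_runner_sketch")


def run_with_traceback(func, *args, **kwargs):
    """Call func(*args, **kwargs); on any exception, log the traceback and re-raise.

    Hypothetical wrapper: the handling is generic, not specific to advanced_reboot.
    """
    try:
        return func(*args, **kwargs)
    except Exception:
        # format_exc() renders the exception currently being handled, with its stack.
        logger.error(
            "Exception raised by %s:\n%s",
            getattr(func, "__name__", repr(func)),
            traceback.format_exc(),
        )
        raise  # keep the failure visible to the caller (e.g. pytest)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    def flaky_case():
        raise RuntimeError("simulated PTF failure")

    try:
        run_with_traceback(flaky_case)
    except RuntimeError:
        pass  # the traceback has already been logged
```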
selldinesh pushed a commit to selldinesh/sonic-mgmt that referenced this pull request Mar 22, 2022
…onic-net#5365)

Advanced-reboot test cases sometimes hang inside PTF.

This was earlier handled as part of sonic-net#4532.
PR 4532 sets a timeout for the PTF test, and the built-in --test-case-timeout interrupts the advanced-reboot test as expected, but sometimes that is not enough.
After the timeout interrupt, the test still proceeds to collect logs and analyze the pcap files.
In the unlikely event that the analysis logic hangs, the whole test case stays hung indefinitely.
This commit fixes that hang.
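One way to keep a hung post-processing step (log collection or pcap analysis) from blocking the test forever is to run it in a child process with a deadline. This is a sketch under stated assumptions, not the fix from #5365: `run_with_deadline` and `_analyze_pcap` are hypothetical, and the explicit `fork` start method makes it POSIX-only. A process is used rather than a thread because a Python thread cannot be forcibly stopped.

```python
import multiprocessing
import time

# Assumption: POSIX host. "fork" avoids re-importing this module in the child.
_CTX = multiprocessing.get_context("fork")


def _analyze_pcap(path):
    """Hypothetical stand-in for post-test analysis logic that never returns."""
    time.sleep(60)


def run_with_deadline(target, args, timeout):
    """Run target(*args) in a child process, giving it at most `timeout` seconds.

    Returns True if the child exited cleanly in time, False if it failed
    or had to be terminated.
    """
    proc = _CTX.Process(target=target, args=args)
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        # Unlike a thread, a child process can be forcibly stopped (SIGTERM).
        proc.terminate()
        proc.join()
        return False
    return proc.exitcode == 0


if __name__ == "__main__":
    ok = run_with_deadline(_analyze_pcap, ("capture.pcap",), timeout=1.0)
    print("analysis finished" if ok else "analysis timed out; continuing")
```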
wangxin pushed a commit that referenced this pull request Mar 25, 2022
…5365)

xwjiang-ms pushed a commit to xwjiang-ms/sonic-mgmt that referenced this pull request Apr 13, 2022
…onic-net#5365)
