Skip to content

Improve ptf_runner exception and timeout handling #4586

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 29, 2021

Conversation

vaibhavhd
Copy link
Contributor

Description of PR

Summary: The failures seen in ptf (tests started by ptf_runner) are difficult to examine. Make it easier by bringing out the exceptions and print the traceback message.

Also, make the exeception handling generic in the ptf_runner. Earlier in the PR #4532 similar try-catch was added only for advanced_reboot cases.
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 201911

Approach

What is the motivation for this PR?

How did you do it?

Make a generic change to catch timeout and other exceptions in the ptf_runner.

This would help bring out the error message in case of exceptions for every case that uses ptf_runner.

How did you verify/test it?

Tested on physical testbed. The calltrace of exception is now printed in the test error logs.
Example:

20:04:32 advanced_reboot.__runPtfRunner           L0545 INFO   | Run advanced-reboot ReloadTest on the PTF host
20:04:34 ptf_runner.ptf_runner                    L0057 ERROR  | Traceback (most recent call last):
  File "/var/src/sonic-mgmt-int/tests/ptf_runner.py", line 47, in ptf_runner
    result = host.shell(cmd, chdir="/root", module_ignore_errors=module_ignore_errors)
  File "/var/src/sonic-mgmt-int/tests/common/devices/base.py", line 89, in _run
    raise RunAnsibleModuleFail("run module {} failed".format(self.module_name), res)
RunAnsibleModuleFail: run module shell failed, Ansible Results =>
{
    "changed": true,
    "cmd": "ptf --test-dir ptftests advanced-reboot.ReloadTest --platform-dir ptftests --qlen=1000 --platform remote -t 'vlan_ip_range='\"'\"'{\"Vlan1000\": \"192.168.0.0/21\"}'\"'\"';preboot_oper=None;sniff_time_incr=60;ports_file='\"'\"'/tmp/ports.json'\"'\"';allow_vlan_flooding=False;nexthop_ips=['\"'\"'10.10.246.254'\"'\"', '\"'\"'FC0A::FF'\"'\"'];setup_fdb_before_test=True;arista_vms=[u'\"'\"'172.16.137.119'\"'\"', u'\"'\"'172.16.137.118'\"'\"', u'\"'\"'172.16.137.117'\"'\"', u'\"'\"'172.16.137.116'\"'\"'];reboot_limit_in_seconds=0;dut_username=u'\"'\"'admin'\"'\"';vnet_pkts='\"'\"''\"'\"';portchannel_ports_file='\"'\"'/tmp/portchannel_interfaces.json'\"'\"';dut_mac=u'\"'\"'94:8e:d3:04:bf:d8'\"'\"';lo_v6_prefix='\"'\"'fc00:1::/64'\"'\"';inboot_oper='\"'\"'routing_add:50'\"'\"';dut_hostname=u'\"'\"'10.3.146.187'\"'\"';preboot_files='\"'\"'peer_dev_info,neigh_port_info'\"'\"';vnet=False;asic_type=u'\"'\"'broadcom'\"'\"';vlan_ports_file='\"'\"'/tmp/vlan_interfaces.json'\"'\"';default_ip_range='\"'\"'192.168.64.0/18'\"'\"';reboot_type='\"'\"'warm-reboot'\"'\"';bgp_v4_v6_time_diff=40;dut_password=u'\"'\"'password'\"'\"'' --relax --debug info --log-file /tmp/advanced-reboot.ReloadTest.log --test-case-timeout 1800",
    "delta": "0:00:01.516284",
    "end": "2021-10-28 20:04:35.249273",
    "failed": true,
    "invocation": {
        "module_args": {
            "_raw_params": "ptf --test-dir ptftests advanced-reboot.ReloadTest --platform-dir ptftests --qlen=1000 --platform remote -t 'vlan_ip_range='\"'\"'{\"Vlan1000\": \"192.168.0.0/21\"}'\"'\"';preboot_oper=None;sniff_time_incr=60;ports_file='\"'\"'/tmp/ports.json'\"'\"';allow_vlan_flooding=False;nexthop_ips=['\"'\"'10.10.246.254'\"'\"', '\"'\"'FC0A::FF'\"'\"'];setup_fdb_before_test=True;arista_vms=[u'\"'\"'172.16.137.119'\"'\"', u'\"'\"'172.16.137.118'\"'\"', u'\"'\"'172.16.137.117'\"'\"', u'\"'\"'172.16.137.116'\"'\"'];reboot_limit_in_seconds=0;dut_username=u'\"'\"'admin'\"'\"';vnet_pkts='\"'\"''\"'\"';portchannel_ports_file='\"'\"'/tmp/portchannel_interfaces.json'\"'\"';dut_mac=u'\"'\"'94:8e:d3:04:bf:d8'\"'\"';lo_v6_prefix='\"'\"'fc00:1::/64'\"'\"';inboot_oper='\"'\"'routing_add:50'\"'\"';dut_hostname=u'\"'\"'10.3.146.187'\"'\"';preboot_files='\"'\"'peer_dev_info,neigh_port_info'\"'\"';vnet=False;asic_type=u'\"'\"'broadcom'\"'\"';vlan_ports_file='\"'\"'/tmp/vlan_interfaces.json'\"'\"';default_ip_range='\"'\"'192.168.64.0/18'\"'\"';reboot_type='\"'\"'warm-reboot'\"'\"';bgp_v4_v6_time_diff=40;dut_password=u'\"'\"'password'\"'\"'' --relax --debug info --log-file /tmp/advanced-reboot.ReloadTest.log --test-case-timeout 1800",
            "_uses_shell": true,
            "argv": null,
            "chdir": "/root",
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "stdin_add_newline": true,
            "strip_empty_ends": true,
            "warn": true
        }
    },
    "msg": "non-zero return code",
    "rc": 1,
    "start": "2021-10-28 20:04:33.732989",
    "stderr": "WARNING: No route found for IPv6 destination :: (no default route?)\n/usr/local/lib/python2.7/dist-packages/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.\n  from cryptography.hazmat.backends import default_backend\nadvanced-reboot.ReloadTest ... ERROR\n\n======================================================================\nERROR: advanced-reboot.ReloadTest\n----------------------------------------------------------------------\nTraceback (most recent call last):\n  File \"ptftests/advanced-reboot.py\", line 566, in setUp\n    self.init_sad_oper()\n  File \"ptftests/advanced-reboot.py\", line 478, in init_sad_oper\n    (self.ssh_targets, self.portchannel_ports, self.neigh_vm, self.vlan_ports, self.ports_per_vlan), (log_info, fails) = self.sad_handle.setup()\n  File \"ptftests/sad_path.py\", line 25, in setup\n    self.shandle.sad_setup(is_up=False)\n  File \"ptftests/sad_path.py\", line 235, in sad_setup\n    self.setup()\n  File \"ptftests/sad_path.py\", line 170, in setup\n    self.vm_connect()\n  File \"ptftests/sad_path.py\", line 132, in vm_connect\n    self.vm_handles[neigh_vm] = Arista(neigh_vm, None, self.test_args)\n  File \"ptftests/arista.py\", line 57, in __init__\n    self.show_ip_bgp_command = self.parse_supported_bgp_neighbor_command()\n  File \"ptftests/arista.py\", line 466, in parse_supported_bgp_neighbor_command\n    ip_bgp_help = self.do_cmd(help_cmd)\n  File \"ptftests/arista.py\", line 103, in do_cmd\n    self.shell.send(cmd)\nAttributeError: 'Arista' object has no attribute 'shell'\n\n----------------------------------------------------------------------\nRan 1 test in 0.001s\n\nFAILED (errors=1)",
    "stderr_lines": [
        "WARNING: No route found for IPv6 destination :: (no default route?)",
        "/usr/local/lib/python2.7/dist-packages/paramiko/transport.py:33: CryptographyDeprecationWarning: Python 2 is no longer supported by the Python core team. Support for it is now deprecated in cryptography, and will be removed in the next release.",
        "  from cryptography.hazmat.backends import default_backend",
        "advanced-reboot.ReloadTest ... ERROR",
        "",
        "======================================================================",
        "ERROR: advanced-reboot.ReloadTest",
        "----------------------------------------------------------------------",
        "Traceback (most recent call last):",
        "  File \"ptftests/advanced-reboot.py\", line 566, in setUp",
        "    self.init_sad_oper()",
        "  File \"ptftests/advanced-reboot.py\", line 478, in init_sad_oper",
        "    (self.ssh_targets, self.portchannel_ports, self.neigh_vm, self.vlan_ports, self.ports_per_vlan), (log_info, fails) = self.sad_handle.setup()",
        "  File \"ptftests/sad_path.py\", line 25, in setup",
        "    self.shandle.sad_setup(is_up=False)",
        "  File \"ptftests/sad_path.py\", line 235, in sad_setup",
        "    self.setup()",
        "  File \"ptftests/sad_path.py\", line 170, in setup",
        "    self.vm_connect()",
        "  File \"ptftests/sad_path.py\", line 132, in vm_connect",
        "    self.vm_handles[neigh_vm] = Arista(neigh_vm, None, self.test_args)",
        "  File \"ptftests/arista.py\", line 57, in __init__",
        "    self.show_ip_bgp_command = self.parse_supported_bgp_neighbor_command()",
        "  File \"ptftests/arista.py\", line 466, in parse_supported_bgp_neighbor_command",
        "    ip_bgp_help = self.do_cmd(help_cmd)",
        "  File \"ptftests/arista.py\", line 103, in do_cmd",
        "    self.shell.send(cmd)",
        "AttributeError: 'Arista' object has no attribute 'shell'",
        "",
        "----------------------------------------------------------------------",
        "Ran 1 test in 0.001s",
        "",
        "FAILED (errors=1)"
    ],
    "stdout": "",
    "stdout_lines": []
}

Timeout errors are also handled in the same catch:

        "advanced-reboot.ReloadTest ... ERROR",
        "",
        "======================================================================",
        "ERROR: advanced-reboot.ReloadTest",
        "----------------------------------------------------------------------",
        "Traceback (most recent call last):",
        "  File \"ptftests/advanced-reboot.py\", line 566, in setUp",
        "    self.init_sad_oper()",
        "  File \"ptftests/advanced-reboot.py\", line 478, in init_sad_oper",
        "    (self.ssh_targets, self.portchannel_ports, self.neigh_vm, self.vlan_ports, self.ports_per_vlan), (log_info, fails) = self.sad_handle.setup()",
        "  File \"ptftests/sad_path.py\", line 25, in setup",
        "    self.shandle.sad_setup(is_up=False)",
        "  File \"ptftests/sad_path.py\", line 256, in sad_setup",
        "    self.vm_handles[vm].change_bgp_route(self.route_cfg)",
        "  File \"ptftests/arista.py\", line 513, in change_bgp_route",
        "    self.do_cmd(item)",
        "  File \"ptftests/arista.py\", line 106, in do_cmd",
        "    time.sleep(0.2)",
        "  File \"/usr/lib/python2.7/dist-packages/ptf/ptfutils.py\", line 102, in raise_timeout",
        "    raise Timeout.TimeoutError()",
        "TimeoutError",
        "",
        "----------------------------------------------------------------------",
        "Ran 1 test in 30.001s",
        "",
        "FAILED (errors=1)"

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@vaibhavhd vaibhavhd requested a review from a team as a code owner October 28, 2021 21:23
@vaibhavhd vaibhavhd requested a review from wangxin October 28, 2021 21:24
@vaibhavhd vaibhavhd merged commit 6a409f3 into sonic-net:master Oct 29, 2021
@vaibhavhd vaibhavhd deleted the ptf-runner-exception branch October 29, 2021 05:50
@Vickyni2
Copy link

Vickyni2 commented Jun 7, 2023

Hello @wangxin ,

We are observing the same issue with latest updated script , the case got failed in T0 physical topology . Kindly help us on this.

Build Details:
SONiC Software Version: SONiC.202211.269499-59c7d39ef
SONiC OS Version: 11
Distribution: Debian 11.6
Kernel: 5.10.0-18-2-amd64
Build commit: 59c7d39ef
Build date: Tue May 9 17:58:15 UTC 2023
Built by: AzDevOps@vmss-soni00118K

Platform: x86_64-accton_as7716_32x-r0
HwSKU: Accton-AS7716-32X
ASIC: broadcom
ASIC Count: 1
Serial Number: N/A
Model Number: N/A
Hardware Revision: N/A
Uptime: 20:09:00 up 7 min, 1 user, load average: 0.85, 0.48, 0.22
Date: Sun 07 Aug 2022 20:09:00

Error :
try:
result = host.shell(cmd, chdir="/root", module_ignore_errors=module_ignore_errors)
if log_file:
ptf_collect(host, log_file)
if result:
allure.attach(json.dumps(result, indent=4), 'ptf_console_result', allure.attachment_type.TEXT)
if module_ignore_errors:
if result["rc"] != 0:
return result
except Exception:
if log_file:
ptf_collect(host, log_file)
traceback_msg = traceback.format_exc()
allure.attach(traceback_msg, 'ptf_runner_exception_traceback', allure.attachment_type.TEXT)
logger.error("Exception caught while executing case: {}. Error message: {}".format(testname, traceback_msg))

      raise Exception

E Exception

cmd = 'ptf --test-dir /tmp/acstests/ lag_test.LagMemberTrafficTest --platform-dir /root/ptftests --platform remote -t 'port...'"', '"'"'eth22'"'"', '"'"'eth24'"'"']};dut_mac=u'"'"'a8:2b:b5:17:d3:e1'"'"'' --relax --debug info'
custom_options = ''
debug_level = 'info'
device_sockets = []

=========================== short test summary info ============================
FAILED pc/test_lag_member.py::test_lag_member_traffic - Exception
===================== 1 failed, 1 passed in 678.20 seconds =====================

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants