Skip to content

Fix PFC_WD test #479

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 2, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion ansible/roles/test/tasks/pfc_wd/choose_test_port.yml
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
- set_fact:
pfc_wd_available_port_ids: "{{range(16) | list | shuffle}}"
pfc_wd_available_port_ids: "{{range(16,32) | list | shuffle}}"
when: testbed_type == "t1-lag"

- set_fact:
Expand Down
2 changes: 2 additions & 0 deletions ansible/roles/test/tasks/pfc_wd/config_test/config_test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,7 @@
- name: Config tests - Check forward action configuration.
vars:
command_to_run: "sonic-cfggen -j {{ run_dir }}/pfc_wd_fwd_action.json --write-to-db"
test_ignore_file: config_test_ignore_messages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use set fact to set global variable of test_ignore_file as it will be used across the test, including those wrong config tests to be expected having errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need this only for particular tests cases
setting global variable can mask some potential errors

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand what cases don't need this within config_test.yml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the exceptions I've added to config_test_ignore_messages are basically related to counters discovery errors. When syncd(orchagent?) is trying to find out which counters are available it queries all one by one and SAI reports whether particular counter is available. Unfortunately when counter is not supported/implemented/etc SAI also prints an error to the log and LogAnalyzer treats this as test failure.
It is possible that SAI implementation on other platforms does not print such messages (or maybe all counters are available/implemented?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you as these are unrelated errors. From my understanding, these are unrelated errors in all cases in PFCWD tests and we might need to ignore them whenever they happen.

errors_expected: false
include: roles/test/tasks/run_command_with_log_analyzer.yml

Expand Down Expand Up @@ -174,6 +175,7 @@
- name: Clean up config
vars:
command_to_run: "pfcwd stop"
test_ignore_file: config_test_ignore_messages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

errors_expected: false
include: roles/test/tasks/run_command_with_log_analyzer.yml

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
r, ".* Port counter .* not implemented"
r, ".* Port counter .* not supported"
r, ".* Invalid port counter .*"
r, ".* Trying to remove nonexisting queue from flex counter .*"
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@
set_fact:
pfc_wd_detect_time: 200
pfc_wd_restore_time: 200
pfc_wd_restore_time_large: 30000
pfc_wd_restore_time_large: 50000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current test only run for one port and 50s is fine. Since we are going to iterate all the ports, waiting for such long time will make test run for hours as each port will wait around 2mins simply for this every time. We will have to make this parameter small in future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the value of 50s was approved by @marian-pritsak :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

50s is too large for iterate all the ports as above reason. We could set it to 50s for now. Please be aware that we need change it to around 3-5s once we enable the test over all the ports @marian-pritsak

pfc_wd_poll_time: 100

- set_fact:
Expand Down Expand Up @@ -258,6 +258,7 @@
- name: Apply drop config to {{ pfc_wd_test_port }}.
vars:
command_to_run: "pfcwd start --action drop --restoration-time {{ pfc_wd_restore_time_large }} {{ ports }} {{ pfc_wd_detect_time }}"
test_ignore_file: ignore_pfc_wd_messages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider to set to global variable?

errors_expected: false
include: roles/test/tasks/run_command_with_log_analyzer.yml

Expand Down Expand Up @@ -433,6 +434,7 @@
- name: Clean up config
vars:
command_to_run: "pfcwd stop"
test_ignore_file: ignore_pfc_wd_messages
errors_expected: false
include: roles/test/tasks/run_command_with_log_analyzer.yml

Expand Down Expand Up @@ -602,6 +604,7 @@
- name: Apply config with proper timers to {{ pfc_wd_test_port }}.
vars:
command_to_run: "pfcwd start --action drop --restoration-time {{ pfc_wd_restore_time }} {{ ports }} {{ pfc_wd_detect_time }}"
test_ignore_file: ignore_pfc_wd_messages
errors_expected: false
include: roles/test/tasks/run_command_with_log_analyzer.yml

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
r, ".* Port counter .* not implemented"
r, ".* Port counter .* not supported"
r, ".* Invalid port counter .*"
r, ".* Trying to remove nonexisting queue from flex counter .*"
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@
register: storm_end_millis

- name: Find PFC storm restore message
shell: grep "restored from PFC storm" /var/log/syslog
shell: grep "PFC Watchdog storm restored" /var/log/syslog
register: storm_restore

- name: Convert restore message time to milliseconds
Expand Down