Commit 826311c
[techsupport] Removed interactive option for docker commands and Improved Error Reporting (#1723)
#### Why I did
Recently, a bug related to saisdkdump showed up when `show techsupport` was invoked. Although it was fixed, the sonic-mgmt test failed to catch it beforehand.
This highlighted a few shortcomings of the `generate_dump` script. This PR addresses those, along with a few additional issues; each is explained in the next section.
#### What I did
**1) Remove the "Interactive option (-i) for the docker invocation commands"**
This is why the bug was not caught previously. When techsupport was invoked remotely (e.g. using sshpass), the `docker exec -it <docker> <cmd>` command would fail with `the input device is not a TTY`. Hence the (-i) option was removed.
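Docker raises this error when a TTY is requested but stdin is not a terminal. As a rough illustration (not part of `generate_dump`), the condition can be checked with `test -t 0`; the helper name `stdin_is_tty` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of generate_dump): reports whether stdin
# is attached to a terminal, which is what a TTY allocation depends on.
stdin_is_tty() {
    if [ -t 0 ]; then
        echo "tty"
    else
        echo "no-tty"
    fi
}

# With stdin redirected, as in a non-interactive remote invocation,
# no terminal is available.
stdin_is_tty < /dev/null
```

Run from an interactive shell without the redirection, the same helper would be expected to report a terminal instead.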
**2) Change the Return Code**
Currently, the script doesn't return any non-zero error codes for most of the intermediate steps (even though they fail), which makes validation hard.
To handle this, a helper function and a `trap` command are used:
```
handle_error() {
    if [ "$1" != "0" ]; then
        echo "ERR: RC:-$1 observed on line $2" >&2
        RETURN_CODE=1
    fi
}
trap 'handle_error $? $LINENO' ERR  # This would trap any executions with non-zero return codes
```
The global variable RETURN_CODE is set when this is called, and the same RETURN_CODE is returned when the `generate_dump` process exits.
You may notice that the trap is set in multiple functions instead of once at the top of the script. This is because bash does not pass the ERR trap on to shell functions (unless `set -o errtrace` is enabled), so each function requires an explicit trap command.
When a command fails with an error, this logic appends a corresponding log to stderr:
`ERR: RC:-1 observed on line 729`
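The need for a per-function trap can be seen in a minimal sketch, assuming bash with `errtrace` left at its default (off). `handle_error` is copied from above; the `step_without_trap` and `step_with_trap` functions are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
RETURN_CODE=0

handle_error() {
    if [ "$1" != "0" ]; then
        echo "ERR: RC:-$1 observed on line $2" >&2
        RETURN_CODE=1
    fi
}

# Top-level trap: without `set -o errtrace`, functions do not inherit it.
trap 'handle_error $? $LINENO' ERR

# Hypothetical step relying on the top-level trap only.
step_without_trap() {
    false       # fails, but the ERR trap is not active inside this function
    return 0
}

# Hypothetical step that installs the trap itself.
step_with_trap() {
    trap 'handle_error $? $LINENO' ERR
    false       # fails, and the function's own trap records it
    return 0
}

step_without_trap
echo "after step_without_trap: RETURN_CODE=$RETURN_CODE"
step_with_trap
echo "after step_with_trap: RETURN_CODE=$RETURN_CODE"
```

Under these assumptions, the first call leaves RETURN_CODE at 0 and the second sets it to 1. Enabling `set -o errtrace` once at the top of a script would be an alternative to repeating the trap, though that is not the approach this PR takes.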
**3) Improve Error Reporting for save_cmd function**
Currently, any error written to stderr by the intermediate calls is redirected to the same location as stdout, which is usually the file we see under the dump/ dir. This is perfectly fine, but the sonic-mgmt test only parses the text seen in stdout.
So, a new option (-r) is added to the `generate_dump` script, and subsequently to `show techsupport`, to redirect any intermediate errors to generate_dump's stderr.
With this option enabled, these sorts of errors can be captured on stderr:
```
root@sonic:/home/admin# show techsupport -r
..........
timeout --foreground 5m show queue counters > /var/dump/sonic_dump_r-tigon-04_20210714_062239/dump/queue.counters_1
Traceback (most recent call last):
  File "/usr/local/bin/queuestat", line 373, in <module>
    main()
  File "/usr/local/bin/queuestat", line 368, in main
    queuestat.get_print_all_stat(json_opt)
  File "/usr/local/bin/queuestat", line 239, in get_print_all_stat
    cnstat_dict = self.get_cnstat(self.port_queues_map[port])
  File "/usr/local/bin/queuestat", line 168, in get_cnstat
    cnstat_dict[queue] = get_counters(queue_map[queue])
  File "/usr/local/bin/queuestat", line 158, in get_counters
    fields[pos] = str(int(counter_data))
ValueError: invalid literal for int() with base 10: ''
handle_error $? $LINENO
ERR: RC:-1 observed on line 199
Command: show queue counters timedout after 5 minutes.
.............
```
Without that option, this is the output seen:
```
root@sonic:/home/admin# show techsupport
..........
timeout --foreground 5m show queue counters &> /var/dump/sonic_dump_r-tigon-04_20210714_062239/dump/queue.counters_1
handle_error $? $LINENO
ERR: RC:-1 observed on line 199
Command: show queue counters timedout after 5 minutes.
.............
```
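The difference between the two invocations comes down to `>` versus `&>`, as the command lines in the output show. A simplified sketch of that switch follows; `save_cmd_sketch` and `noisy` are hypothetical names, and this is not the real `save_cmd` implementation.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified stand-in for save_cmd.
# $1: command string, $2: output file, $3: "true" to leave stderr unredirected
save_cmd_sketch() {
    local cmd="$1" out="$2" redirect_stderr="$3"
    if [ "$redirect_stderr" = "true" ]; then
        # As with -r: only stdout goes to the file, stderr reaches the caller.
        eval "$cmd" > "$out"
    else
        # Default: stdout and stderr both land in the dump file.
        eval "$cmd" &> "$out"
    fi
}

# Hypothetical failing command that writes to both streams.
noisy() { echo "data"; echo "boom" >&2; return 1; }

tmpdir="$(mktemp -d)"
save_cmd_sketch noisy "$tmpdir/default.out" false
save_cmd_sketch noisy "$tmpdir/redirect.out" true 2> "$tmpdir/stderr.out"
# rm -rf "$tmpdir"   # clean up once the files have been inspected
```

In the default call the error text ends up inside the output file; in the second call it reaches the caller's stderr, where a test harness can pick it up.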
**4) Minor Error in sdk-dump collection logic handled**
`save_file` is now called only for the files found in `sdk_dump_path`, not for directories. Previously, the following error was seen:
```
cp: -r not specified; omitting directory '/tmp/sdk-dumps'
handle_error $? $LINENO
ERR: RC:-1 observed on line 729
tar: sonic_dump_r-tigon-04_20210714_062239/sai_sdk_dump/sdk-dumps: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
tar append operation failed. Aborting to prevent data loss.
```
The reason is that `find /tmp/sdk-dumps` returns ["/tmp/sdk-dumps"] even if the dir is empty. In the next step, the `save_file` cmd is applied to the folder itself, hence the error. The change specified above handles this.
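The `find` behaviour is easy to reproduce with a throwaway directory standing in for /tmp/sdk-dumps. The description above does not show the exact change made in the script; filtering with `-type f` is only one assumed way to restrict the result to files.

```shell
#!/usr/bin/env bash
# Sketch: an empty temporary directory stands in for /tmp/sdk-dumps.
sdk_dump_path="$(mktemp -d)"

# Plain find lists the starting directory itself, even when it is empty.
all_entries="$(find "$sdk_dump_path")"

# Restricting to regular files yields nothing for an empty directory,
# so a per-file copy step would never be handed the directory.
only_files="$(find "$sdk_dump_path" -type f)"

echo "find:         '${all_entries}'"
echo "find -type f: '${only_files}'"

rmdir "$sdk_dump_path"
```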
**5) Minor Error in custom plugins logic handled**
Added a condition to check if the dir exists before proceeding forward.
```
if [[ -d ${PLUGINS_DIR} ]]; then
    local -r dump_plugins="$(find ${PLUGINS_DIR} -type f -executable)"
    for plugin in $dump_plugins; do
        # save stdout output of plugin and gzip it
        save_cmd "$plugin" "$(basename $plugin)" true false
    done
fi
```
Otherwise, the `find` command might fail with:
```
root@sonic:/home/admin# find /usr/local/bin/debug-dump -type f -executable
find: ‘/usr/local/bin/debug-dump’: No such file or directory
```
1 parent ce11545, commit 826311c
2 files changed: +85 -25 lines changed