[warm-reboot] ERR swss#supervisor-proc-exit-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes) after running the warm-reboot command
#16686
The following error message appears when running the warm-reboot command. It is caused by the watchdog recently introduced in #15429 to monitor orchagent. This check should be disabled while a warm-reboot or fast-reboot is in progress.
```
Sep 7 18:46:12.366130 r-anaconda-51 NOTICE swss#orchagent: :- setAgingFDB: Set switch 21000000000000 fdb_aging_time 0 sec
Sep 7 18:46:12.366130 r-anaconda-51 INFO swss#orchagent: :- set: setting attribute 0x10000004 status: SAI_STATUS_SUCCESS
Sep 7 18:46:12.366130 r-anaconda-51 WARNING swss#orchagent: :- start: Orchagent is frozen for warm restart!
Sep 7 18:47:07.639517 r-anaconda-51 ERR swss#supervisor-proc-exit-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes).
Sep 7 18:47:30.581322 r-anaconda-51 INFO systemd[1]: Stopping switch state service...
Sep 7 18:47:30.640958 r-anaconda-51 NOTICE root: Stopping swss service...
Sep 7 18:47:30.645693 r-anaconda-51 NOTICE root: Locking /tmp/swss-syncd-lock from swss service
Sep 7 18:47:30.651871 r-anaconda-51 NOTICE root: Locked /tmp/swss-syncd-lock (10) from swss service
Sep 7 18:47:30.673126 r-anaconda-51 NOTICE root: Warm boot flag: swss true.
Sep 7 18:47:30.686321 r-anaconda-51 NOTICE root: Fast boot flag: swss false.
Sep 7 18:47:30.690974 r-anaconda-51 NOTICE root: Killing Docker swss...
```
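The failure mode can be modeled as a simple heartbeat watchdog. This sketch assumes the watchdog alerts once the last heartbeat is at least 60 seconds old, which matches the "1.0 minutes" in the log. `HeartbeatWatchdog`, its methods, and the threshold are hypothetical and are not the actual supervisor-proc-exit-listener code. Once orchagent freezes and stops sending heartbeats, the check eventually fires:

```python
from typing import Optional

# Hypothetical threshold: the log above reports the alert after 1.0 minutes.
STUCK_TIMEOUT_SEC = 60.0


class HeartbeatWatchdog:
    """Toy model of a per-process heartbeat watchdog (not the SONiC code)."""

    def __init__(self, process: str, namespace: str,
                 timeout: float = STUCK_TIMEOUT_SEC) -> None:
        self.process = process
        self.namespace = namespace
        self.timeout = timeout
        self.last_heartbeat: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        # Record the time of the most recent sign of life.
        self.last_heartbeat = now

    def check(self, now: float) -> Optional[str]:
        # No heartbeat seen yet: nothing to judge.
        if self.last_heartbeat is None:
            return None
        elapsed = now - self.last_heartbeat
        if elapsed < self.timeout:
            return None
        return (f"Process '{self.process}' is stuck in namespace "
                f"'{self.namespace}' ({elapsed / 60:.1f} minutes).")


wd = HeartbeatWatchdog("orchagent", "host")
wd.heartbeat(0.0)       # last heartbeat before orchagent freezes
print(wd.check(30.0))   # None: still within the timeout
print(wd.check(60.0))   # alert: the frozen process sent no further heartbeats
```

From the watchdog's point of view, a process that is idle on purpose looks the same as one that is hung. That is why the freeze itself has to keep producing heartbeats.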
Orchagent sends heartbeats during warm-reboot to prevent the false orchagent stuck alert.
Why I did it
Orchagent freezes during warm-reboot, so supervisor-proc-exit-listener generates a false stuck alert while the reboot is in progress:
sonic-net/sonic-buildimage#16686
Work item tracking
Microsoft ADO: 25295846
How I did it
Send heartbeats while orchagent is frozen for warm-reboot.
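The control flow of the fix can be sketched as follows. It assumes the freeze is a loop that waits for an unfreeze condition, and it adds a heartbeat on every iteration. `wait_while_frozen`, `FakeClock`, and the 10-second interval are hypothetical. Orchagent itself is written in C++, so this Python sketch only models the idea and is not the actual change:

```python
from typing import Callable, List


class FakeClock:
    """Simulated clock so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def wait_while_frozen(
    is_frozen: Callable[[], bool],
    send_heartbeat: Callable[[], None],
    sleep: Callable[[float], None],
    interval: float = 10.0,
) -> None:
    """Hypothetical freeze loop: block until unfrozen, but keep heartbeating."""
    while is_frozen():
        send_heartbeat()  # tell the watchdog the process is alive, just idle
        sleep(interval)   # must stay well below the watchdog timeout


clock = FakeClock()
beats: List[float] = []

wait_while_frozen(
    is_frozen=lambda: clock.now < 300.0,  # frozen for 5 simulated minutes
    send_heartbeat=lambda: beats.append(clock.now),
    sleep=clock.sleep,
)

# Largest silence the watchdog would observe during the freeze.
max_gap = max(b - a for a, b in zip(beats, beats[1:]))
print(len(beats), max_gap)  # prints: 30 10.0
```

The key design constraint is that the heartbeat interval stays well below the watchdog timeout. That way no gap long enough to trigger the alert can open up, however long the freeze lasts.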
How to verify it
All unit tests pass.
Manually verified that the issue is fixed by checking syslog.
liuh-80 added a commit to sonic-net/sonic-mgmt that referenced this issue on Nov 17, 2023
Add orchagent heartbeat during warm-reboot UT
### Description of PR
Add a unit test for the orchagent heartbeat during warm-reboot.
##### Work item tracking
- Microsoft ADO: 25295846
### Type of change
- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [x] Test case(new/improvement)
### Back port request
- [ ] 201911
- [ ] 202012
- [ ] 202205
### Approach
#### What is the motivation for this PR?
Fix the false orchagent stuck alert during warm-reboot:
sonic-net/sonic-buildimage#16686
#### How did you do it?
Added a new unit test that freezes orchagent for warm-reboot and then checks that the process exit listener does not send an alert.
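The test idea can be sketched as a pytest-style check against a simulated timeline. The sketch assumes the listener polls periodically and alerts when the newest heartbeat is at least 60 seconds old. `watchdog_alerts`, the poll and heartbeat intervals, and the test names are hypothetical. This is not the actual sonic-mgmt test, which exercises a real device:

```python
from typing import List

TIMEOUT_SEC = 60.0  # hypothetical threshold; the log reports "1.0 minutes"


def watchdog_alerts(heartbeats: List[float], polls: List[float],
                    timeout: float = TIMEOUT_SEC) -> List[float]:
    """Return the poll times at which a watchdog would report 'stuck'.

    At each poll, look at the most recent heartbeat at or before that time
    and alert if it is at least `timeout` seconds old.
    """
    alerts = []
    for t in polls:
        seen = [h for h in heartbeats if h <= t]
        if seen and t - max(seen) >= timeout:
            alerts.append(t)
    return alerts


# Poll every 5 s across a 5-minute warm-reboot freeze.
POLLS = [float(t) for t in range(0, 301, 5)]


def test_no_alert_while_frozen_with_heartbeat():
    beats = [float(t) for t in range(0, 301, 10)]  # heartbeat every 10 s
    assert watchdog_alerts(beats, POLLS) == []


def test_alert_while_frozen_without_heartbeat():
    beats = [0.0]  # last heartbeat just before the freeze (old behaviour)
    assert watchdog_alerts(beats, POLLS)[0] == 60.0
```

The negative case matters as much as the positive one. It shows that the harness would catch the original false alert, so an empty alert list in the first test reflects the heartbeat and not a watchdog that never fires.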
#### How did you verify/test it?
All unit tests pass.
#### Any platform specific information?
#### Supported testbed topology if it's a new test case?
### Documentation
Steps to reproduce the issue:
Describe the results you received:
Error in logs
Describe the results you expected:
No error in logs.
Output of `show version`:
Output of `show techsupport`:
Additional information you deem important (e.g. issue happens only occasionally):
sonic_dump_r-anaconda-51_20230907_185038.tar.gz