System recovery when syncd crashes #3517
Closed
When syncd terminates unexpectedly, the system goes into an unstable state and stays in that state until the user triggers a reboot.
It does not recover by itself. This change addresses that problem.
- What I did
- How I did it
When the syncd process crashes, syncd cannot be restarted alone because it has a dependency on orch-agent.
So the same mechanism that is used when a SAI API call fails is used here: the syncd process state is monitored, and when syncd crashes, a shutdown notification is sent to orch-agent. This eventually results in an SWSS restart and a syncd restart, which recovers the system.
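As a rough illustration of the monitoring flow described above, here is a minimal Python sketch. It assumes a way to query whether syncd is alive (`is_alive`) and a way to deliver a notification to orch-agent (`notify`); both are hypothetical callables supplied by the caller, and the message string `switch_shutdown_request` is an assumed name, not taken from this PR. It is not the actual implementation in this change.

```python
from typing import Callable, Optional

# Assumed notification name; the real message format sent from the syncd
# side to orch-agent is not specified in this PR description.
SHUTDOWN_REQUEST = "switch_shutdown_request"


def monitor_syncd(is_alive: Callable[[], bool],
                  notify: Callable[[str], None],
                  sleep: Callable[[float], None],
                  poll_interval: float = 1.0,
                  max_polls: Optional[int] = None) -> bool:
    """Poll syncd's state and send one shutdown notification if it dies.

    Returns True if a notification was sent, or False if max_polls was
    reached while syncd was still running.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if not is_alive():
            # syncd cannot be restarted alone (it has a dependency on
            # orch-agent), so ask orch-agent to shut down instead; the
            # resulting SWSS restart also restarts syncd.
            notify(SHUTDOWN_REQUEST)
            return True
        polls += 1
        sleep(poll_interval)
    return False
```

Injecting `is_alive`, `notify`, and `sleep` keeps the loop independent of how the process state is actually observed (for example, a supervisor event or a PID check), which is left open here.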
- How to verify it
Kill the syncd daemon and check that the system recovers without a manual reboot.
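For the verification step, a small polling helper can wait for the system to come back after syncd is killed. This is a hedged sketch: `is_healthy` is a hypothetical check the tester provides (for example, confirming that syncd and orch-agent are running again), and the timeout and interval values are illustrative only.

```python
import time
from typing import Callable


def wait_for_recovery(is_healthy: Callable[[], bool],
                      timeout: float,
                      interval: float,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once is_healthy() succeeds, or False after timeout seconds."""
    deadline = clock() + timeout
    while True:
        if is_healthy():
            return True
        if clock() >= deadline:
            # The system did not recover within the allowed window.
            return False
        sleep(interval)
```

A test would kill syncd, then call `wait_for_recovery(check, timeout=..., interval=...)` and fail if it returns False.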
- Description for the changelog
When the syncd process crashes, syncd cannot be restarted alone because it has a dependency on orch-agent.
The syncd process state is monitored, and when syncd crashes, a shutdown notification is sent to orch-agent. This eventually results in an SWSS restart and a syncd restart, which recovers the system.