-
Notifications
You must be signed in to change notification settings - Fork 711
[warm-reboot] remove ISSU bank check #2958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1. The file in which SDK holds ISSU bank information has changed and is no longer issu_bank.txt 2. We shouldn't check the SDK dump content from SONiC as that information and its format is constantly changing 3. ISSU bank information is written by pre_shutdown request anyway. Verifying whether the information was indeed written right after that seems redundant. Signed-off-by: Stepan Blyschak <[email protected]>
@@ -202,34 +202,6 @@ function request_pre_shutdown() | |||
fi | |||
} | |||
|
|||
function recover_issu_bank_file() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that the paths in the current check are not relevant as the issu bank file format has changed.
However, I am not clear on why we do not change file names to use new format, rather than remove the check altogether.
From our discussion - this issue is not seen so far on newer branches, but is that a guaranee that the issue will never be seen again? If yes, then how is that guaranteed?
What is the downside of keeping this change w/ new file format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are concerned about the auto-fix ultimately breaking something, then do you rather want to detect a missing/broken bank file, and abort warmboot (do the check before set +e
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I am not clear on why we do not change file names to use new format, rather than remove the check altogether. From our discussion - this issue is not seen so far on newer branches, but is that a guaranee that the issue will never be seen again? If yes, then how is that guaranteed?
As far as I remember the issu_bank.txt got corrupted due to unknown reasons. Even with this check I don't see a guarantee issu_bank.txt won't get corrupted after the check and recovery.
See the original PR - #1466, the way to reproduce it is modify warm-reboot script to artificially corrupt it.
What is the downside of keeping this change w/ new file format?
We, in SONiC, have to track changes SDK makes to it's files. And there is not just a single file, there are more. Not all of them can be restored from SONiC. It could also happen that SDK changes the format of the file and SONiC overwrites it to an old format leading to a failed warm boot.
If you are concerned about the auto-fix ultimately breaking something, then do you rather want to detect a missing/broken bank file, and abort warmboot (do the check before
set +e
)?
I would then ask SDK to do the check. But then, the logic they have to implement is - write the content and read it to make sure it is written. Seems redundant. Again, does not guarantee files corruption afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would then ask SDK to do the check. But then, the logic they have to implement is - write the content and read it to make sure it is written. Seems redundant. Again, does not guarantee files corruption afterwards.
I did not follow this part completely. IMO, this is not redundant if it works in some situations and detect corrupted files to avoid warmboot. This has happened before in older images, and can potentially happen now too. Aborting warm reboot will prevent any risk of dataplane downtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR to be kept on master, and not to be backported to older branches.
### What I did I removed issu bank check: 1. The file in which SDK holds ISSU bank information has changed and is no longer issu_bank.txt 2. We shouldn't check the SDK dump content from SONiC as that information and its format is constantly changing. Overwriting it might be risky when SONiC and SDK are out of sync regarding the format of the file 3. ISSU bank information is written by pre_shutdown request anyway. Verifying whether the information was indeed written right after that seems redundant 4. The check is made after ```set +e```, so at the moment we "detect" a problem it is already too late #### How I did it Removed the relevant code. #### How to verify it Run warm-reboot on device.
What I did
I removed issu bank check:
set +e
, so at the moment we "detect" a problem it is already too lateHow I did it
Removed the relevant code.
How to verify it
Run warm-reboot on device.
Previous command output (if the output of a command-line utility has changed)
New command output (if the output of a command-line utility has changed)