[FIXED] Lost sequences after hard kill in fs.removeMsgBlock of lmb #6778
Fixes an edge case where the stream state could reset during recovery in WorkQueue (WQ) streams.
In WQ streams, messages are removed immediately upon ACK. Once a message block becomes empty, NATS deletes the corresponding block file. After deleting the last message block (lmb), NATS also creates a new block file that carries the latest sequence and timestamp from the lmb.
If NATS crashes between deleting the lmb and creating the new block file, no block files remain on disk, so the recovery logic cannot restore the stream state on restart. The stream resets to sequence 0, which can cause inconsistencies such as the consumer sequence being higher than the stream sequence.
This change creates the new block file before deleting the lmb. At least one valid block is therefore always on disk, so recovery stays consistent even if NATS is terminated unexpectedly during the transition.
Fixes #6600
Signed-off-by: Sourabh Agrawal [email protected]