-
Notifications
You must be signed in to change notification settings - Fork 377
INDY-1545: stop processing checkpoints in view change #854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
INDY-1545: stop processing checkpoints in view change #854
Conversation
Signed-off-by: toktar <[email protected]>
Changes: - add bugfix for view_no in checkpoint cleaning. Choose view_no from last_ordered_3pc - update test test_checkpoints_removal_in_view_change.py Signed-off-by: toktar <[email protected]>
…nto bugfix-1545-checkponts-cleaning-preprepares
Signed-off-by: toktar <[email protected]>
8c20ef3
to
e5c3be7
Compare
Changes: - update checks - update comments Signed-off-by: toktar <[email protected]>
plenum/server/replica.py
Outdated
@@ -1883,8 +1883,9 @@ def markCheckPointStable(self, seqNo): | |||
for k in previousCheckpoints: | |||
self.logger.trace("{} removing previous checkpoint {}".format(self, k)) | |||
self.checkpoints.pop(k) | |||
self._remove_stashed_checkpoints(till_3pc_key=(self.viewNo, seqNo)) | |||
self._gc((self.viewNo, seqNo)) | |||
view_no = self.last_ordered_3pc[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we mark checkpoint stable if we don't order anything? In this case last_ordered can be much below than the stable checkpoint's seqNo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this case reproduce when node in catchup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We clean 3pc messages for transactions ordered in catchup in the method _remove_till_caught_up_3pc in catchup
indy-plenum/plenum/server/replica.py
Lines 2557 to 2579 in 3641daf
def _remove_till_caught_up_3pc(self, last_caught_up_3PC): | |
""" | |
Remove any 3 phase messages till the last ordered key and also remove | |
any corresponding request keys | |
""" | |
outdated_pre_prepares = {} | |
for key, pp in self.prePrepares.items(): | |
if compare_3PC_keys(key, last_caught_up_3PC) >= 0: | |
outdated_pre_prepares[key] = pp | |
for key, pp in self.sentPrePrepares.items(): | |
if compare_3PC_keys(key, last_caught_up_3PC) >= 0: | |
outdated_pre_prepares[key] = pp | |
self.logger.trace('{} going to remove messages for {} 3PC keys'.format( | |
self, len(outdated_pre_prepares))) | |
for key, pp in outdated_pre_prepares.items(): | |
self.batches.pop(key, None) | |
self.sentPrePrepares.pop(key, None) | |
self.prePrepares.pop(key, None) | |
self.prepares.pop(key, None) | |
self.commits.pop(key, None) | |
self._discard_ordered_req_keys(pp) |
fast_nodes, | ||
(0, CHK_FREQ + 1))) | ||
# check that fast_nodes finalized first checkpoint | ||
looper.run(eventually(check_checkpoint_finish, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check that checkpoint is not finalized for slow_nodes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
for n in nodes: | ||
assert (start_pp_seq_no, end_pp_seq_no) in n.master_replica.checkpoints | ||
ckState = n.master_replica.checkpoints[(start_pp_seq_no, end_pp_seq_no)] | ||
assert n.master_replica.quorums.checkpoint.is_reached(len(ckState.receivedDigests)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check isStable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: toktar <[email protected]>
Signed-off-by: toktar <[email protected]>
Signed-off-by: toktar <[email protected]>
@@ -1782,6 +1782,12 @@ def processCheckpoint(self, msg: Checkpoint, sender: str) -> bool: | |||
self.discard(msg, reason="Checkpoint already stable", logMethod=self.logger.debug) | |||
return True | |||
|
|||
if self.isPrimary is None and msg.viewNo < self.viewNo: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have a check self.isPrimary is None
?
Fix that a checkpoints finalize cleans necessary data from requests and 3pc queues in view change before catchup.
Changes: