-
Notifications
You must be signed in to change notification settings - Fork 14.4k
KAFKA-18345: Part two, remove exponential backoff for candidate state #19747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
raft/src/test/java/org/apache/kafka/raft/KafkaRaftClientTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes @ahuang98 .
if (candidate.epochElection().isVoteRejected() && !candidate.isBackingOff()) { | ||
logger.info( | ||
"Insufficient remaining votes to become leader. We will backoff before retrying election again. " + | ||
"Current epoch election state is {}.", | ||
"Insufficient remaining votes to become leader. Candidate will wait the remaining election " + | ||
"timeout ({}) before transitioning back to Prospective. Current epoch election state is {}.", | ||
candidate.remainingElectionTimeMs(currentTimeMs), | ||
candidate.epochElection() | ||
); | ||
// Go immediately to a random, exponential backoff. The backoff starts low to prevent | ||
// needing to wait the entire election timeout when the vote result has already been | ||
// determined. The randomness prevents the next election from being gridlocked with | ||
// another nominee due to timing. The exponential aspect limits epoch churn when the | ||
// replica has failed multiple elections in succession. | ||
candidate.startBackingOff( | ||
currentTimeMs, | ||
RaftUtil.binaryExponentialElectionBackoffMs( | ||
quorumConfig.electionBackoffMaxMs(), | ||
RETRY_BACKOFF_BASE_MS, | ||
candidate.retries(), | ||
random | ||
) | ||
); | ||
// Mark that the candidate is now backing off. After election timeout expires the candidate will | ||
// transition back to prospective | ||
candidate.startBackingOff(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we recored the isBackingOff
state simply so we can log this message at most once per epoch. Maybe we can remove this state if we change EpochElection#recordVote
so that it returns true if the vote resulted in the election getting granted or rejected.
The other option is to log this message multiple times. Once for each vote response after the election was lost. I am okay with this since election should be infrequent and election are throttle by the election timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll go with the latter option - I'm a bit reluctant to change the return type of the recordVote methods since it would affect a lot of existing tests.
} else if (state.isBackingOff()) { | ||
return state.remainingElectionTimeMs(currentTimeMs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need this? KRaft already takes the min of the request timeout and the election timeout in line 3150.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it also sends another vote request - which we don't need to do if the vote is already lost
// Note that we reset the election timeout after voting for a candidate because we | ||
// know that the candidate has at least as good of a chance of getting elected as us |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outside the scope of this PR but this is not true when there is a network partition. We should not reset the election timer if a vote is granted. The worst case, if the replica gets this wrong, is that the replica transitions back to the follower or unattached state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, right, we no longer need to reset election timeouts because expiry of prospective election timeouts are non-disruptive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://issues.apache.org/jira/browse/KAFKA-19319
would be a good task for a raft newbie
Replaces exponential backoff for candidate state after losing election with waiting rest of election timeout.
We don't need an exponential backoff after PreVote was implemented since candidates transition to prospective after losing an election.
Replicas also enter Prospective state and Candidate state with a random election backoff which is sufficient to prevent livelocked elections.