Skip to content

KAFKA-18345: Part two, remove exponential backoff for candidate state #19747

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: trunk
Choose a base branch
from

Conversation

ahuang98
Copy link
Contributor

Replaces exponential backoff for candidate state after losing election with waiting rest of election timeout.
We don't need an exponential backoff after PreVote was implemented since candidates transition to prospective after losing an election.
Replicas also enter Prospective state and Candidate state with a random election backoff which is sufficient to prevent livelocked elections.

@github-actions github-actions bot added triage PRs from the community kraft labels May 16, 2025
@github-actions github-actions bot removed the triage PRs from the community label May 20, 2025
Copy link
Member

@jsancio jsancio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @ahuang98 .

Comment on lines 1021 to 1030
if (candidate.epochElection().isVoteRejected() && !candidate.isBackingOff()) {
logger.info(
"Insufficient remaining votes to become leader. We will backoff before retrying election again. " +
"Current epoch election state is {}.",
"Insufficient remaining votes to become leader. Candidate will wait the remaining election " +
"timeout ({}) before transitioning back to Prospective. Current epoch election state is {}.",
candidate.remainingElectionTimeMs(currentTimeMs),
candidate.epochElection()
);
// Go immediately to a random, exponential backoff. The backoff starts low to prevent
// needing to wait the entire election timeout when the vote result has already been
// determined. The randomness prevents the next election from being gridlocked with
// another nominee due to timing. The exponential aspect limits epoch churn when the
// replica has failed multiple elections in succession.
candidate.startBackingOff(
currentTimeMs,
RaftUtil.binaryExponentialElectionBackoffMs(
quorumConfig.electionBackoffMaxMs(),
RETRY_BACKOFF_BASE_MS,
candidate.retries(),
random
)
);
// Mark that the candidate is now backing off. After election timeout expires the candidate will
// transition back to prospective
candidate.startBackingOff();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we recored the isBackingOff state simply so we can log this message at most once per epoch. Maybe we can remove this state if we change EpochElection#recordVote so that it returns true if the vote resulted in the election getting granted or rejected.

The other option is to log this message multiple times. Once for each vote response after the election was lost. I am okay with this since election should be infrequent and election are throttle by the election timeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go with the latter option - I'm a bit reluctant to change the return type of the recordVote methods since it would affect a lot of existing tests.

Comment on lines 3146 to 3147
} else if (state.isBackingOff()) {
return state.remainingElectionTimeMs(currentTimeMs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this? KRaft already takes the min of the request timeout and the election timeout in line 3150.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it also sends another vote request - which we don't need to do if the vote is already lost

Comment on lines 508 to 509
// Note that we reset the election timeout after voting for a candidate because we
// know that the candidate has at least as good of a chance of getting elected as us
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outside the scope of this PR but this is not true when there is a network partition. We should not reset the election timer if a vote is granted. The worst case, if the replica gets this wrong, is that the replica transitions back to the follower or unattached state.

Copy link
Contributor Author

@ahuang98 ahuang98 May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, right, we no longer need to reset election timeouts because expiry of prospective election timeouts are non-disruptive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://issues.apache.org/jira/browse/KAFKA-19319
would be a good task for a raft newbie

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants