KAFKA-18345: Part two, remove exponential backoff for candidate state #19747

ahuang98 · 2025-05-16T23:47:41Z

Replaces the exponential backoff that a candidate applies after losing an election with waiting out the rest of the election timeout.
Exponential backoff is no longer needed now that PreVote is implemented, since candidates transition to Prospective after losing an election.
Replicas also enter the Prospective and Candidate states with a random election backoff, which is sufficient to prevent livelocked elections.
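
To make the difference concrete, here is a minimal sketch contrasting the two waiting strategies. It is not the code from `KafkaRaftClient` or `RaftUtil`: the class name, the method signatures, and the exact shape of the exponential formula are assumptions made for illustration only.

```java
import java.util.Random;

// Hypothetical sketch; names and formulas are illustrative, not Kafka's actual code.
final class ElectionBackoffSketch {

    // Old approach (simplified): a random wait whose upper bound doubles with
    // each failed election and is capped at maxBackoffMs.
    static long exponentialBackoffMs(long baseMs, long maxBackoffMs, int retries, Random random) {
        if (retries <= 0) {
            throw new IllegalArgumentException("retries must be positive");
        }
        // Cap the shift so the bound stays small for realistic base values.
        int shift = Math.min(retries - 1, 20);
        long upperBound = Math.min(maxBackoffMs, baseMs << shift);
        // Uniform value in [1, upperBound].
        return 1 + (long) (random.nextDouble() * upperBound);
    }

    // New approach: no separate backoff. The candidate waits whatever is left
    // of the election timeout it started with when it entered Candidate state.
    static long remainingElectionTimeMs(long electionDeadlineMs, long currentTimeMs) {
        return Math.max(0, electionDeadlineMs - currentTimeMs);
    }
}
```

With the second strategy the wait never grows with the number of failed elections; it is bounded by the election timeout chosen when the candidate state was entered.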

raft/src/main/java/org/apache/kafka/raft/CandidateState.java

raft/src/test/java/org/apache/kafka/raft/KafkaRaftClientTest.java

jsancio

Thanks for the changes, @ahuang98.

jsancio · 2025-05-20T15:49:10Z

raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java

            if (candidate.epochElection().isVoteRejected() && !candidate.isBackingOff()) {
                logger.info(
-                    "Insufficient remaining votes to become leader. We will backoff before retrying election again. " +
-                    "Current epoch election state is {}.",
+                    "Insufficient remaining votes to become leader. Candidate will wait the remaining election " +
+                        "timeout ({}) before transitioning back to Prospective. Current epoch election state is {}.",
+                    candidate.remainingElectionTimeMs(currentTimeMs),
                    candidate.epochElection()
                );
-                // Go immediately to a random, exponential backoff. The backoff starts low to prevent
-                // needing to wait the entire election timeout when the vote result has already been
-                // determined. The randomness prevents the next election from being gridlocked with
-                // another nominee due to timing. The exponential aspect limits epoch churn when the
-                // replica has failed multiple elections in succession.
-                candidate.startBackingOff(
-                    currentTimeMs,
-                    RaftUtil.binaryExponentialElectionBackoffMs(
-                        quorumConfig.electionBackoffMaxMs(),
-                        RETRY_BACKOFF_BASE_MS,
-                        candidate.retries(),
-                        random
-                    )
-                );
+                // Mark that the candidate is now backing off. After election timeout expires the candidate will
+                // transition back to prospective
+                candidate.startBackingOff();


It looks like we record the isBackingOff state simply so we can log this message at most once per epoch. Maybe we can remove this state if we change EpochElection#recordVote so that it returns true if the vote resulted in the election getting granted or rejected.

The other option is to log this message multiple times, once for each vote response received after the election was lost. I am okay with this since elections should be infrequent and are throttled by the election timeout.

I'll go with the latter option - I'm a bit reluctant to change the return type of the recordVote methods since it would affect a lot of existing tests.
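
For reference, the first option discussed above (having `recordVote` report whether a vote decided the election) could look roughly like the sketch below. This is not the real `EpochElection` class: the class name, fields, and signatures are hypothetical and only illustrate how a boolean return value would let the caller log once without tracking a separate backing-off flag.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical vote tally; not Kafka's EpochElection.
final class VoteTallySketch {
    private final int voterCount;
    private final Map<Integer, Boolean> votes = new HashMap<>();
    private boolean decided = false;

    VoteTallySketch(int voterCount) {
        this.voterCount = voterCount;
    }

    // Records a vote and returns true only for the vote that first makes the
    // outcome (granted or rejected) certain. Later votes return false.
    boolean recordVote(int voterId, boolean granted) {
        votes.put(voterId, granted);
        if (decided) {
            return false;
        }
        decided = isVoteGranted() || isVoteRejected();
        return decided;
    }

    boolean isVoteGranted() {
        return count(true) >= majority();
    }

    // Rejected once the voters that have not rejected can no longer form a majority.
    boolean isVoteRejected() {
        return count(false) > voterCount - majority();
    }

    private int majority() {
        return voterCount / 2 + 1;
    }

    private long count(boolean granted) {
        return votes.values().stream().filter(v -> v == granted).count();
    }
}
```

A caller could then log the "insufficient remaining votes" message only when `recordVote` returns true and `isVoteRejected()` holds. As noted in the reply above, the cost of this design is a changed return type that existing tests would need to absorb.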

jsancio · 2025-05-20T15:50:36Z

raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java

+        } else if (state.isBackingOff()) {
+            return state.remainingElectionTimeMs(currentTimeMs);


Why do you need this? KRaft already takes the min of the request timeout and the election timeout in line 3150.

But it also sends another vote request, which we don't need to do if the vote is already lost.
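
The point of the early return can be sketched as follows. This is a simplified, hypothetical model of a candidate poll step, not the actual `KafkaRaftClient` method: the class, the `PollResult` type, and the parameter names are assumptions for illustration.

```java
// Hypothetical model of one candidate poll step; not Kafka's actual code.
final class CandidatePollSketch {

    // How long to wait before the next poll, and whether vote requests were sent.
    static final class PollResult {
        final long waitMs;
        final boolean sentVoteRequests;

        PollResult(long waitMs, boolean sentVoteRequests) {
            this.waitMs = waitMs;
            this.sentVoteRequests = sentVoteRequests;
        }
    }

    static PollResult pollCandidate(boolean backingOff, long remainingElectionTimeMs, long requestBackoffMs) {
        if (backingOff) {
            // The election is already lost: skip sending vote requests and
            // simply wait out the rest of the election timeout.
            return new PollResult(remainingElectionTimeMs, false);
        }
        // Otherwise (re)send vote requests and wake up at whichever comes
        // first: the request backoff or the election timeout.
        return new PollResult(Math.min(requestBackoffMs, remainingElectionTimeMs), true);
    }
}
```

Without the early return, the wait would still be bounded by the election timeout, but each poll would also issue vote requests that cannot change the outcome.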

jsancio · 2025-05-20T16:03:25Z

raft/src/main/java/org/apache/kafka/raft/QuorumState.java

        // Note that we reset the election timeout after voting for a candidate because we
        // know that the candidate has at least as good of a chance of getting elected as us


Outside the scope of this PR but this is not true when there is a network partition. We should not reset the election timer if a vote is granted. The worst case, if the replica gets this wrong, is that the replica transitions back to the follower or unattached state.

Hm, right, we no longer need to reset election timeouts because expiry of a prospective election timeout is non-disruptive.

https://issues.apache.org/jira/browse/KAFKA-19319
would be a good task for a raft newbie

remove exponential backoff for candidate

c009d3a

github-actions bot added triage PRs from the community kraft labels May 16, 2025

m1a2st added the ci-approved label May 17, 2025

ahuang98 commented May 19, 2025

raft/src/main/java/org/apache/kafka/raft/CandidateState.java (outdated)

ahuang98 commented May 19, 2025

raft/src/test/java/org/apache/kafka/raft/KafkaRaftClientTest.java (outdated)

nits

66c74db

TaiJuWu approved these changes May 19, 2025

github-actions bot removed the triage PRs from the community label May 20, 2025

jsancio reviewed May 20, 2025

addressing comments

1dedb63
