Fix SendAsync won't be timeout during producer reconnection #1356

RobertIndie · 2025-04-18T09:39:01Z

Motivation

The root cause is that the producer's event loop becomes busy during reconnection, so messages waiting in dataChan are never checked for timeouts. The ctx passed to SendAsync is not respected in this case either.

SendAsync can wait until runEventLoop has processed the request and pushed it into the pendingQueue or a batch, as the Java client does. Before the request enters the pendingQueue, SendAsync itself can check for timeouts and invoke the callback. After it enters the pendingQueue, failTimeoutMessages handles the timeout.

Modifications

Introduced a new channel, enqueued, that makes SendAsync wait until the sendRequest has been added to the pending queue.
Used ctx to check for timeouts and invoke the callback if a timeout occurs.

Verifying this change

The test TestSendAsyncCouldTimeoutWhileReconnecting is based on the test from #1345.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

Dependencies (does it add or upgrade a dependency): (yes / no)
The public API: (yes / no)
The schema: (yes / no / don't know)
The default values of configurations: (yes / no)
The wire protocol: (yes / no)

Documentation

Does this pull request introduce a new feature? (yes / no)
If yes, how is the feature documented? (not applicable / docs / GoDocs / not documented)
If a feature is not applicable for documentation, explain why?
If a feature is not documented yet in this PR, please create a followup issue for adding the documentation

Copilot

Pull Request Overview

This PR fixes an issue where SendAsync may not time out during producer reconnection by introducing a new channel (enqueued) to signal when send requests are added to the pending queue. It also adds tests to validate timeout behavior in reconnection scenarios and when the pending queue is full.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
pulsar/producer_test.go	Added tests to verify SendAsync timeout behavior.
pulsar/producer_partition.go	Updated producer logic to include and utilize the new enqueued channel.

Comments suppressed due to low confidence (1)

pulsar/producer_test.go:2705

Avoid using fixed sleeps for synchronization in tests as they can lead to flaky results; consider using a more robust waiting mechanism (e.g., require.Eventually) to ensure the pending queue is properly filled before proceeding.

time.Sleep(3 * time.Second)


gunli · 2025-04-27T03:14:16Z

pulsar/producer_partition.go

+	select {
+	case <-sr.enqueued:
+	case <-ctx.Done():
+		err := ctx.Err()


Applications generally don't set a timeout on the ctx they pass in. Maybe we should update the ctx with context.WithTimeout(ctx, config.timeout)?

gunli · 2025-04-27T03:23:42Z

pulsar/producer_partition.go

@@ -1353,6 +1367,17 @@ func (p *partitionProducer) internalSendAsync(
 	}

 	p.dataChan <- sr
+	select {


The select here will block SendAsync() while the producer is reconnecting, even when the pending queue is not full. That is not good for latency-sensitive applications such as games.

Fix SendAsync won't be timeout during producer reconnection

3807a17

RobertIndie added the type/bug label Apr 18, 2025

RobertIndie requested a review from Copilot April 18, 2025 09:39

RobertIndie self-assigned this Apr 18, 2025

Copilot AI reviewed Apr 18, 2025

pulsar/producer_partition.go (outdated, resolved)

Fix lint

24cc7e6

RobertIndie marked this pull request as draft April 18, 2025 09:57

Fix sendRequest trigger enqueued for batch

7ab7eb2

RobertIndie marked this pull request as ready for review April 18, 2025 12:03

gunli reviewed Apr 27, 2025
