[fix][txn] Fix deadlock when loading transaction buffer snapshot #24401
Merged: BewareMyPower merged 4 commits into apache:master from BewareMyPower:bewaremypower/fix-txn-snapshot-block on Jun 11, 2025
coderzc approved these changes on Jun 11, 2025
Codecov Report: All modified and coverable lines are covered by tests ✅

Coverage diff (master vs. #24401):

| | master | #24401 | +/- |
|---|---|---|---|
| Coverage | 73.57% | 74.24% | +0.66% |
| Complexity | 32624 | 32675 | +51 |
| Files | 1877 | 1867 | -10 |
| Lines | 139502 | 145308 | +5806 |
| Branches | 15299 | 16615 | +1316 |
| Hits | 102638 | 107879 | +5241 |
| Misses | 28908 | 28878 | -30 |
| Partials | 7956 | 8551 | +595 |

Flags with carried forward coverage won't be shown.
Commits referencing this pull request:
- BewareMyPower added a commit (Jun 18, 2025): (cherry picked from commit 1b7e4a7)
- BewareMyPower added a commit (Jun 18, 2025): (cherry picked from commit 1b7e4a7)
- ganesh-ctds pushed a commit to datastax/pulsar (Jun 19, 2025): …che#24401) (cherry picked from commit 1b7e4a7) (cherry picked from commit 748845f)
- lhotari pushed a commit (Jun 19, 2025): (cherry picked from commit 1b7e4a7)
- nodece pushed a commit to nodece/pulsar (Jun 20, 2025): …che#24401) (cherry picked from commit 1b7e4a7)
- ganesh-ctds pushed a commit to datastax/pulsar (Jun 21, 2025): …che#24401) (cherry picked from commit 1b7e4a7) (cherry picked from commit 27e27d3)
- srinath-ctds pushed a commit to datastax/pulsar (Jun 24, 2025): …che#24401) (cherry picked from commit 1b7e4a7) (cherry picked from commit 27e27d3)
Labels: area/transaction, cherry-picked/branch-3.0, cherry-picked/branch-3.3, cherry-picked/branch-4.0, doc-not-needed (Your PR changes do not impact docs), ready-to-test, release/3.0.12, release/3.3.8, release/4.0.6, type/bug (The PR fixed a bug or issue reported a bug)
Fixes apache/pulsar-client-go#1376
Motivation
#23062 introduced a possible deadlock.

`transactionExecutorProvider` is used in two different places:
- `TopicTransactionBuffer`
- `PendingAckHandleImpl`
Code references (at commit e0d7faa):
- `pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentSubscription.java`, line 244
- `pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/transaction/pendingack/impl/PendingAckHandleImpl.java`, line 175
The future of `PendingAckHandleImpl` is completed in this executor, while the executor might be busy running the blocking snapshot replay task. When `transactionExecutorProvider` has only 1 thread, the reader needs the dispatcher to deliver messages, but the `PendingAckHandleImpl` object cannot complete its future because the `transaction-executor` thread is occupied. The replay task cannot finish without those messages, and the thread it holds is the one needed to complete the future, hence the deadlock.

There is another bug: when the reader fails with a non-retriable error, it still stays in the cache for at least 60 seconds. See `pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/transaction/buffer/impl/TableView.java`, line 45 at commit e0d7faa.
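The deadlock described in the Motivation follows a general pattern: a task on a single-threaded executor blocks while waiting for a future that can only be completed by another task queued on that same executor. The Java sketch below reproduces that pattern in isolation. It does not use Pulsar's classes; `SharedExecutorDeadlockDemo`, `replayBlocks`, `pendingAckReady`, and the 500 ms timeout are hypothetical stand-ins, and the timeout exists only so the demo returns instead of hanging.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical, self-contained reproduction of the executor-sharing pattern;
// none of these names are Pulsar classes.
class SharedExecutorDeadlockDemo {

    /**
     * Runs a blocking "replay" task and the task that unblocks it on the same pool.
     * Returns true if the replay task gave up waiting, i.e. it would have deadlocked.
     */
    static boolean replayBlocks(int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            // Stand-in for the PendingAckHandleImpl future.
            CompletableFuture<Void> pendingAckReady = new CompletableFuture<>();

            // Stand-in for the blocking snapshot replay task: it holds a pool thread
            // while it waits. The timeout only keeps the demo from hanging forever.
            CompletableFuture<Boolean> replay = CompletableFuture.supplyAsync(() -> {
                try {
                    pendingAckReady.get(500, TimeUnit.MILLISECONDS);
                    return false; // the dependency completed in time
                } catch (TimeoutException e) {
                    return true;  // the completing task never got a thread
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }, executor);

            // The only code that completes the future is queued on the same pool.
            executor.execute(() -> pendingAckReady.complete(null));

            return replay.join();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("1 thread blocked:  " + replayBlocks(1));
        System.out.println("2 threads blocked: " + replayBlocks(2));
    }
}
```

With one thread, the completing task sits in the queue behind the replay task, so the wait times out; with two threads it runs on the second thread and the wait returns promptly. This mirrors why the PR reproduces the failure by reducing the thread pool sizes to 1.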
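The cache bug above (a reader that failed with a non-retriable error staying cached for at least 60 seconds) is addressed by evicting the reader as soon as a read through it fails. The following is a minimal Java sketch of that idea, assuming a `ConcurrentHashMap` of reader futures keyed by namespace; `EvictOnFailureCache`, `readWith`, and the integer "readers" are hypothetical and are not the actual `TableView` code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a reader cache that evicts an entry as soon as a read
// through it fails; this is not Pulsar's TableView implementation.
class EvictOnFailureCache<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> readers = new ConcurrentHashMap<>();
    private final Function<K, CompletableFuture<V>> readerFactory;

    EvictOnFailureCache(Function<K, CompletableFuture<V>> readerFactory) {
        this.readerFactory = readerFactory;
    }

    /**
     * Reads through the cached reader for {@code key}, creating the reader if absent.
     * If creating the reader or the read itself fails, the entry is evicted so the
     * next call builds a fresh reader instead of reusing the broken one.
     */
    <R> CompletableFuture<R> readWith(K key, Function<V, CompletableFuture<R>> read) {
        CompletableFuture<V> readerFuture = readers.computeIfAbsent(key, readerFactory);
        return readerFuture.thenCompose(read).whenComplete((result, error) -> {
            if (error != null) {
                // Two-argument remove: only evict if the entry is still this reader.
                readers.remove(key, readerFuture);
            }
        });
    }

    int size() {
        return readers.size();
    }

    public static void main(String[] args) {
        int[] created = {0};
        EvictOnFailureCache<String, Integer> cache = new EvictOnFailureCache<>(
                ns -> CompletableFuture.completedFuture(++created[0]));
        CompletableFuture<String> failed = cache.readWith("tnx/ns1",
                reader -> CompletableFuture.<String>failedFuture(new IllegalStateException("non-retriable")));
        System.out.println("failed: " + failed.isCompletedExceptionally() + ", cached: " + cache.size());
        System.out.println(cache.readWith("tnx/ns1",
                reader -> CompletableFuture.completedFuture("reader-" + reader)).join());
    }
}
```

The two-argument `ConcurrentHashMap.remove(key, value)` is used so that, if another caller has already replaced the failed entry with a new reader, that new reader is not discarded by the late eviction.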
Modifications
Reduce `numTransactionReplayThreadPoolSize` and `managedLedgerNumSchedulerThreads` to 1. After that, `TransactionTest` always fails at `testCreateTransactionSystemTopic` when creating a consumer.

There are two major fixes:
1. Fail … and fail the `getLastMessageId` RPC immediately after that.
2. In `TableView#readLatest`, remove the reader from the cache if `readToLatest` fails.

With the 1st fix, `testCreateTransactionSystemTopic` succeeds, but `TransactionTest` still fails in `testDeleteNamespace`. This is because the preceding `testCreateTransactionSystemTopic` recreated the `tnx/ns1` namespace, while the reader keeps reconnecting in the Fail state to get the last message ID. The 2nd fix resolves this.

Documentation

- [ ] `doc`
- [ ] `doc-required`
- [x] `doc-not-needed`
- [ ] `doc-complete`