Skip to content

[improve][meta] Fix invalid use of drain API and race condition in closing metadata store #22585

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 14, 2024

Conversation

lhotari
Copy link
Member

@lhotari lhotari commented Apr 25, 2024

Motivation

There's currently some memory leaks in tests and while investigating the issue, I found out that there's a large number of uncompleted CompletableFutures in the heap dump.

Currently the metadata store doesn't complete all pending operations when it is closed.
There are multiple problems:

  • MpscUnboundedArrayQueue.drain will limit the number of entries to 4096 at a time
  • MpscUnboundedArrayQueue.drain uses relaxedPoll which is eventually consistent
  • There might be batches in flight, which need to be terminated

Modifications

  • replace drain with a while loop in close
  • replace drain with a for loop in batch flushing
  • add isClosed checks

Additional Context

CompletableFutures in the heap dump:

image

The instances are related to org.apache.pulsar.broker.resources.NamespaceResources$PartitionedTopicResources$$Lambda$1819+0x00007f08a8b65ee8:
image

org.apache.pulsar.broker.resources.NamespaceResources$PartitionedTopicResources$$Lambda$1819+0x00007f08a8b65ee8 seems to be this code block:

markPartitionedTopicDeletedAsync(topic).whenCompleteAsync((markResult, markExc) -> {
final boolean mdFound;
if (markExc != null) {
if (markExc.getCause() instanceof MetadataStoreException.NotFoundException) {
mdFound = false;
} else {
log.error("Failed to mark the topic {} as deleted", topic, markExc);
future.completeExceptionally(markExc);
return;
}
} else {
mdFound = true;
}
supplier.get().whenComplete((deleteResult, deleteExc) -> {
if (deleteExc != null && mdFound) {
unmarkPartitionedTopicDeletedAsync(topic)
.thenRun(() -> future.completeExceptionally(deleteExc))
.exceptionally(ex -> {
log.warn("Failed to unmark the topic {} as deleted", topic, ex);
future.completeExceptionally(deleteExc);
return null;
});
} else if (deleteExc != null) {
future.completeExceptionally(deleteExc);
} else {
future.complete(deleteResult);
}
});
});

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@lhotari lhotari added this to the 3.3.0 milestone Apr 25, 2024
@lhotari lhotari self-assigned this Apr 25, 2024
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Apr 25, 2024
@lhotari
Copy link
Member Author

lhotari commented Apr 25, 2024

It seems that the OOME is another issue. #22586

@coderzc coderzc modified the milestones: 3.3.0, 3.4.0 May 8, 2024
@Technoboy- Technoboy- closed this May 13, 2024
@Technoboy- Technoboy- reopened this May 13, 2024
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 75.00000% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 71.02%. Comparing base (bbc6224) to head (edb72fd).
Report is 260 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #22585      +/-   ##
============================================
- Coverage     73.57%   71.02%   -2.55%     
+ Complexity    32624     5630   -26994     
============================================
  Files          1877     1891      +14     
  Lines        139502   152144   +12642     
  Branches      15299    18133    +2834     
============================================
+ Hits         102638   108061    +5423     
- Misses        28908    35445    +6537     
- Partials       7956     8638     +682     
Flag Coverage Δ
inttests 26.99% <23.52%> (+2.40%) ⬆️
systests 24.62% <62.50%> (+0.29%) ⬆️
unittests 72.20% <100.00%> (-0.64%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...he/pulsar/metadata/impl/AbstractMetadataStore.java 85.29% <ø> (+0.73%) ⬆️
...ta/impl/batching/AbstractBatchedMetadataStore.java 84.67% <75.00%> (-11.38%) ⬇️

... and 363 files with indirect coverage changes

@merlimat merlimat merged commit f7d35e5 into apache:master Jun 14, 2024
48 of 51 checks passed
hanmz pushed a commit to hanmz/pulsar that referenced this pull request Feb 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-not-needed Your PR changes do not impact docs ready-to-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants