Change IOContext from READONCE to DEFAULT to avoid WrongThreadException #17502

Merged
merged 4 commits into from
Mar 7, 2025

Conversation

sachinpkale
Member

Description

  • As part of Update Apache Lucene to 9.12.0 #15333, the read context in RemoteStoreRefreshListener was changed from DEFAULT to READONCE.
  • With READONCE, uploads fail with java.lang.WrongThreadException: Attempted access outside owning thread.
Caused by: java.lang.WrongThreadException: Attempted access outside owning thread
                at java.base/jdk.internal.foreign.MemorySessionImpl.wrongThread(MemorySessionImpl.java:315) ~[?:?]
                at java.base/jdk.internal.misc.ScopedMemoryAccess$ScopedAccessError.newRuntimeException(ScopedMemoryAccess.java:113) ~[?:?]
                at java.base/jdk.internal.foreign.MemorySessionImpl.checkValidState(MemorySessionImpl.java:219) ~[?:?]
                at java.base/jdk.internal.foreign.ConfinedSession.justClose(ConfinedSession.java:83) ~[?:?]
                at java.base/jdk.internal.foreign.MemorySessionImpl.close(MemorySessionImpl.java:242) ~[?:?]
                at java.base/jdk.internal.foreign.MemorySessionImpl$1.close(MemorySessionImpl.java:88) ~[?:?]
                at org.apache.lucene.store.MemorySegmentIndexInput.close(MemorySegmentIndexInput.java:514) ~[lucene-core-9.12.1.jar:9.12.1 7a97a05a239d6fb9f1f347aa09bfa52e875be092 - 2024-12-09 16:47:48]
                at org.opensearch.common.blobstore.transfer.stream.OffsetRangeIndexInputStream$OffsetRangeRefCount.lambda$new$0(OffsetRangeIndexInputStream.java:157) ~[opensearch-2.19.1.jar:2.19.1]
                at org.opensearch.common.concurrent.RefCountedReleasable.closeInternal(RefCountedReleasable.java:49) ~[opensearch-2.19.1.jar:2.19.1]
                at org.opensearch.common.util.concurrent.AbstractRefCounted.decRef(AbstractRefCounted.java:78) ~[opensearch-common-2.19.1.jar:2.19.1]
                at org.opensearch.common.util.concurrent.RunOnce.run(RunOnce.java:55) ~[opensearch-2.19.1.jar:2.19.1]
                at org.opensearch.common.blobstore.transfer.stream.OffsetRangeIndexInputStream.close(OffsetRangeIndexInputStream.java:167) ~[opensearch-2.19.1.jar:2.19.1]
                at org.opensearch.common.blobstore.transfer.stream.RateLimitingOffsetRangeInputStream.close(RateLimitingOffsetRangeInputStream.java:86) ~[opensearch-2.19.1.jar:2.19.1]
                at java.base/java.io.BufferedInputStream.close(BufferedInputStream.java:618) ~[?:?]
                at org.opensearch.repositories.s3.async.AsyncPartsHandler$UploadTrackedBufferedInputStream.close(AsyncPartsHandler.java:204) ~[repository-s3-2.19.1.jar:2.19.1]
                at org.opensearch.repositories.s3.async.AsyncTransferManager.releaseResourcesSafely(AsyncTransferManager.java:446) ~[repository-s3-2.19.1.jar:2.19.1]
                at org.opensearch.repositories.s3.async.AsyncTransferManager.lambda$uploadInOneChunk$19(AsyncTransferManager.java:403) [repository-s3-2.19.1.jar:2.19.1]
                at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934) ~[?:?]
                ... 49 more
  • This is due to the async flow of repository-s3, where the IndexInput is closed on a thread other than the one that opened it, in the following code path.
    public OffsetRangeRefCount(ClosingStreams ref) {
        super("OffsetRangeRefCount", ref, () -> {
            try {
                ref.inputStreamIndexInput.close();
            } catch (IOException ex) {
                logger.error("Failed to close indexStreamIndexInput", ex);
            }
            try {
                ref.indexInput.close();
            } catch (IOException ex) {
                logger.error("Failed to close indexInput", ex);
            }
        });
    }
  • Remote store integ tests use FsRepository, but because the implementation differs for repository-s3, we ideally need integ tests that talk directly to S3. We will take this up as a follow-up task.
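
The stack trace above shows close() reaching a ConfinedSession from a CompletableFuture callback. The failure mode can be sketched in plain Java without Lucene or the foreign-memory API. This is only a simulation under stated assumptions: `ConfinedInput` and `closeOnAnotherThread` are hypothetical names, and the ownership check is a simplified stand-in for what the JDK does internally.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class AsyncCloseDemo {

    // Hypothetical stand-in for an input whose close() may be confined
    // to the thread that opened it (not Lucene's actual class).
    static final class ConfinedInput implements AutoCloseable {
        private final Thread owner; // null means "shared": any thread may close
        private volatile boolean closed;

        ConfinedInput(boolean confined) {
            this.owner = confined ? Thread.currentThread() : null;
        }

        @Override
        public void close() {
            // Simplified ownership check, mimicking the confined-session behaviour.
            if (owner != null && owner != Thread.currentThread()) {
                throw new IllegalStateException("Attempted access outside owning thread");
            }
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Opens the input on the calling thread, closes it on a worker thread
    // (as an async upload callback would) and reports whether close() succeeded.
    static boolean closeOnAnotherThread(boolean confined) {
        ConfinedInput input = new ConfinedInput(confined);
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            return CompletableFuture.supplyAsync(() -> {
                try {
                    input.close();
                    return input.isClosed();
                } catch (IllegalStateException e) {
                    return false;
                }
            }, worker).join();
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("confined input closed by worker: " + closeOnAnotherThread(true));
        System.out.println("shared input closed by worker:   " + closeOnAnotherThread(false));
    }
}
```

In this sketch the confined input cannot be closed by the worker thread, while the shared one can, which mirrors why an input that is closed from the async repository-s3 flow must not be thread-confined.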

Related Issues

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@sachinpkale
Member Author

Created an issue to run remote store integ tests with different repository plugin implementations: #17503

Contributor

github-actions bot commented Mar 4, 2025

❌ Gradle check result for 23319dd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@sachinpkale
Member Author

Test failing with:

1> java.lang.RuntimeException: MockDirectoryWrapper: opening segments file [segments_3n] with a non-READONCE context[IOContext[context=DEFAULT, mergeInfo=null, flushInfo=null, readAdvice=RANDOM]]

Taking a look

@sachinpkale
Member Author

The following code in MockDirectoryWrapper is causing the test failures:

https://github.com/apache/lucene/blob/576c449cce64fa497cbc9000bb2c7600b3b7efe0/lucene/test-framework/src/java/org/apache/lucene/tests/store/MockDirectoryWrapper.java#L820-L827

Fixed the test failures as part of the latest commit.
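
Judging from the error message quoted earlier, the test wrapper refuses to open a segments file with any context other than READONCE. A minimal sketch of that kind of guard follows. It is not the MockDirectoryWrapper source: the `Context` enum, `checkOpen`, and `rejects` are hypothetical names, and matching on the `segments` file-name prefix is an assumption made for illustration.

```java
final class SegmentsContextCheck {

    // Hypothetical two-value stand-in for Lucene's IOContext.
    enum Context { DEFAULT, READONCE }

    // Throws if a segments file is opened with a non-READONCE context.
    // Assumption: segments files are recognised by their "segments" name prefix.
    static void checkOpen(String fileName, Context context) {
        if (fileName.startsWith("segments") && context != Context.READONCE) {
            throw new RuntimeException(
                "opening segments file [" + fileName + "] with a non-READONCE context [" + context + "]");
        }
    }

    // Convenience wrapper: true when checkOpen would reject the combination.
    static boolean rejects(String fileName, Context context) {
        try {
            checkOpen(fileName, context);
            return false;
        } catch (RuntimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects("segments_3n", Context.DEFAULT));
        System.out.println(rejects("segments_3n", Context.READONCE));
        System.out.println(rejects("_0.cfs", Context.DEFAULT));
    }
}
```

Under these assumptions only the first combination is rejected, which matches the reported failure for segments_3n opened with context=DEFAULT.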

Contributor

github-actions bot commented Mar 6, 2025

❕ Gradle check result for 553ef13: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Contributor

@Bukhtawar Bukhtawar left a comment

Can we add an assertion here on the IOContext to ensure that all uploads have the IOContext set correctly?

Signed-off-by: Sachin Kale <[email protected]>
Contributor

github-actions bot commented Mar 6, 2025

❌ Gradle check result for e762397: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Sachin Kale <[email protected]>
Contributor

github-actions bot commented Mar 6, 2025

❌ Gradle check result for c8904cd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Mar 6, 2025

❕ Gradle check result for c8904cd: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

codecov bot commented Mar 6, 2025

Codecov Report

Attention: Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 72.57%. Comparing base (f6d6aa6) to head (c8904cd).
Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
...va/org/opensearch/index/store/RemoteDirectory.java 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17502      +/-   ##
============================================
+ Coverage     72.47%   72.57%   +0.09%     
- Complexity    65705    65741      +36     
============================================
  Files          5307     5307              
  Lines        304774   304775       +1     
  Branches      44193    44193              
============================================
+ Hits         220898   221197     +299     
+ Misses        65738    65463     -275     
+ Partials      18138    18115      -23     


@sachinpkale sachinpkale merged commit 588f46d into opensearch-project:main Mar 7, 2025
30 of 31 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-17502-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 588f46d731587bebd54511dc2df21a2f4ffb9f32
# Push it to GitHub
git push --set-upstream origin backport/backport-17502-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-17502-to-2.x.

@opensearch-trigger-bot
Contributor

The backport to 2.19 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.19 2.19
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.19
# Create a new branch
git switch --create backport/backport-17502-to-2.19
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 588f46d731587bebd54511dc2df21a2f4ffb9f32
# Push it to GitHub
git push --set-upstream origin backport/backport-17502-to-2.19
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.19

Then, create a pull request where the base branch is 2.19 and the compare/head branch is backport/backport-17502-to-2.19.

sachinpkale added a commit to sachinpkale/OpenSearch that referenced this pull request Mar 7, 2025
sachinpkale added a commit to sachinpkale/OpenSearch that referenced this pull request Mar 7, 2025
sachinpkale added a commit that referenced this pull request Mar 7, 2025
sachinpkale added a commit that referenced this pull request Mar 10, 2025
vinaykpud pushed a commit to vinaykpud/OpenSearch that referenced this pull request Mar 18, 2025
…on (opensearch-project#17502)

---------

Signed-off-by: Sachin Kale <[email protected]>
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
4 participants