[BUG] Unable to upload segments to remote GCS store #18015

Closed
varunbharadwaj opened this issue Apr 21, 2025 · 11 comments
Labels: bug, Indexing:Replication, untriaged

Comments

@varunbharadwaj
Contributor

varunbharadwaj commented Apr 21, 2025

Describe the bug

While testing pull-based ingestion with remote segment replication (segrep) on GCP, segment upload fails with the following error:

Exception while uploading new segments to the remote segment store"}
java.io.IOException: com.google.cloud.storage.StorageException: java.lang.SecurityException: Denied access to: storage.googleapis.com:443, domain ProtectionDomain  (file:/usr/share/opensearch-all/opensearch-2.x/lib/lucene-core-10.1.0.jar <no signer certificates>

Full trace

Caused by: com.google.cloud.storage.StorageException: java.lang.SecurityException: Denied access to: storage.googleapis.com:443, domain ProtectionDomain  (file:/usr/share/opensearch-all/opensearch-2.x/lib/lucene-core-10.1.0.jar <no signer certificates>)
 jdk.internal.loader.ClassLoaders$AppClassLoader@76ed5528
 <no principals>
 java.security.Permissions@546a51ea (
 ("java.util.PropertyPermission" "java.specification.version" "read")
 ("java.util.PropertyPermission" "java.vm.vendor" "read")
 ("java.util.PropertyPermission" "path.separator" "read")
 ("java.util.PropertyPermission" "os.version" "read")
 ("java.util.PropertyPermission" "java.vendor.url" "read")
 ("java.util.PropertyPermission" "java.vm.name" "read")
 ("java.util.PropertyPermission" "java.vm.specification.version" "read")
 ("java.util.PropertyPermission" "os.name" "read")
 ("java.util.PropertyPermission" "java.version" "read")
 ("java.util.PropertyPermission" "os.arch" "read")
 ("java.util.PropertyPermission" "java.specification.vendor" "read")
 ("java.util.PropertyPermission" "java.vm.specification.name" "read")
 ("java.util.PropertyPermission" "file.separator" "read")
 ("java.util.PropertyPermission" "line.separator" "read")
 ("java.util.PropertyPermission" "java.vm.specification.vendor" "read")
 ("java.util.PropertyPermission" "java.specification.name" "read")
 ("java.util.PropertyPermission" "java.vendor" "read")
 ("java.util.PropertyPermission" "java.vm.version" "read")
 ("java.util.PropertyPermission" "java.specification.maintenance.version" "read")
 ("java.util.PropertyPermission" "java.class.version" "read")
 ("java.lang.RuntimePermission" "accessClassInPackage.com.sun.beans.*")
 ("java.lang.RuntimePermission" "accessClassInPackage.com.apple.*")
 ("java.lang.RuntimePermission" "accessClassInPackage.com.sun.java.swing.plaf.*")
 ("java.lang.RuntimePermission" "exitVM")
 ("java.lang.RuntimePermission" "accessClassInPackage.com.sun.beans")
 ("java.io.FilePermission" "/usr/share/opensearch-all/opensearch-2.x/lib/lucene-core-10.1.0.jar" "read")
 ("java.net.SocketPermission" "localhost:0" "listen,resolve")
)


	at com.google.cloud.storage.StorageException.translateAndThrow(StorageException.java:81) ~[?:?]
	at com.google.cloud.storage.StorageImpl.listBlobs(StorageImpl.java:477) ~[?:?]
	at com.google.cloud.storage.StorageImpl.list(StorageImpl.java:408) ~[?:?]
	at org.opensearch.repositories.gcs.GoogleCloudStorageBlobStore.lambda$listBlobsByPrefix$1(GoogleCloudStorageBlobStore.java:169) ~[?:?]
	at org.opensearch.repositories.gcs.SocketAccess.lambda$doPrivilegedVoidIOException$0(SocketAccess.java:69) ~[?:?]
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:571) ~[?:?]
	at org.opensearch.repositories.gcs.SocketAccess.doPrivilegedVoidIOException(SocketAccess.java:68) ~[?:?]
	at org.opensearch.repositories.gcs.GoogleCloudStorageBlobStore.listBlobsByPrefix(GoogleCloudStorageBlobStore.java:168) ~[?:?]
	at org.opensearch.repositories.gcs.GoogleCloudStorageBlobContainer.listBlobsByPrefix(GoogleCloudStorageBlobContainer.java:80) ~[?:?]
	at org.opensearch.common.blobstore.BlobContainer.listBlobsByPrefixInSortedOrder(BlobContainer.java:323) ~[opensearch-3.0.0.jar:3.0.0]
	at org.opensearch.common.blobstore.BlobContainer.listBlobsByPrefixInSortedOrder(BlobContainer.java:312) ~[opensearch-3.0.0.jar:3.0.0]
	at org.opensearch.index.store.RemoteDirectory.listFilesByPrefixInLexicographicOrder(RemoteDirectory.java:133) ~[opensearch-3.0.0.jar:3.0.0]
	... 24 more

This issue was found on a real deployment. It is most likely related to the move away from the Java Security Manager, since the same setup worked fine previously.

Other than this, indexing on a single shard works fine, and I don't see any other issues with the setup at this time.

Related component

Indexing:Replication

To Reproduce

Enable segment replication with a remote store on GCP. Segment upload fails.
The latest OpenSearch main branch (3.0) was used for this test.

Expected behavior

There should be no failures.

Additional Details

Plugins
ingestion-kafka is enabled for this test to support pull-based ingestion, but this looks like a generic issue.

@varunbharadwaj varunbharadwaj added bug Something isn't working untriaged labels Apr 21, 2025
@github-actions github-actions bot added the Indexing:Replication Issues and PRs related to core replication framework eg segrep label Apr 21, 2025
@andrross
Member

@varunbharadwaj Can you provide a full stack trace? Also, can you test the normal indexing path on this same cluster? Something simple like the following will be fine:

curl -XPOST localhost:9200/target-index/_doc -H 'Content-type: application/json' -d '{
  "name": "Example",
  "price": 29.99,
  "description": "To be or not to be, that is the question"
}'

@cwperks
Member

cwperks commented Apr 21, 2025

@varunbharadwaj Can you post the full stack trace so we can see all the unique protection domains in it?

@varunbharadwaj
Contributor Author

I've updated the description with the available stack trace.

@varunbharadwaj
Contributor Author

Indexing works fine on the primary node in both pull-based and push-based modes.

I'm still investigating, but both push-based and pull-based ingestion hit this upload issue. At the same time, I see some successful uploads alongside the failures, so it may only trigger in certain scenarios. I will post more information if I find anything.

@cwperks
Member

cwperks commented Apr 21, 2025

CC: @reta

The error is occurring because of a difference between how JSM and the Java Agent extract the protection domains from a call. See #17894 for more details.
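
To illustrate the kind of difference described above, here is a minimal, self-contained sketch. It is not OpenSearch code and it does not reproduce the change in #17894. Every name in it (`StackCheckSketch`, `privileged`, `pluginCall`, `coreLibraryCaller`) is hypothetical, and `privileged` is only a stand-in for `AccessController.doPrivileged`. The sketch assumes, based on the traces in this thread, that one check stops at the privileged frame (as the Java Security Manager does) while the other walks the whole stack and therefore also inspects a core-library caller such as the `lucene-core` frame named in the denial.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Illustrative only: all names are hypothetical, not OpenSearch or JDK security APIs.
final class StackCheckSketch {

    // Stand-in for AccessController.doPrivileged: it only marks a frame on the stack.
    static <T> T privileged(Supplier<T> action) {
        return action.get();
    }

    // Returns the methods of this class that an access check would inspect,
    // innermost first. With stopAtPrivileged the walk ends at the caller of
    // privileged(); otherwise every remaining frame is inspected as well.
    static List<String> inspectedFrames(boolean stopAtPrivileged) {
        String self = StackCheckSketch.class.getName();
        List<StackWalker.StackFrame> stack = StackWalker.getInstance()
            .walk(frames -> frames
                .filter(f -> f.getClassName().equals(self))
                .collect(Collectors.toList()));
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < stack.size(); i++) {
            String name = stack.get(i).getMethodName();
            // Skip the walker method itself and synthetic lambda bodies.
            if (name.equals("inspectedFrames") || name.startsWith("lambda$")) {
                continue;
            }
            if (name.equals("privileged")) {
                if (stopAtPrivileged) {
                    // Only the direct caller of the privileged block is checked.
                    if (i + 1 < stack.size()) {
                        seen.add(stack.get(i + 1).getMethodName());
                    }
                    break;
                }
                continue;
            }
            seen.add(name);
        }
        return seen;
    }

    // Hypothetical plugin code that wraps its network call in a privileged block.
    static List<String> pluginCall(boolean stopAtPrivileged) {
        return privileged(() -> inspectedFrames(stopAtPrivileged));
    }

    // Stands in for a core-library frame that has no socket permission of its own.
    static List<String> coreLibraryCaller(boolean stopAtPrivileged) {
        return pluginCall(stopAtPrivileged);
    }

    public static void main(String[] args) {
        System.out.println("stop at privileged frame: " + coreLibraryCaller(true));
        System.out.println("walk the whole stack:     " + coreLibraryCaller(false));
    }
}
```

In this sketch the first call inspects only `pluginCall`, while the second also inspects `coreLibraryCaller`. If that caller lacks the socket permission, a full-stack check denies the connection even though the plugin wrapped the call in a privileged block.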

@andrross
Member

@cwperks I was able to reproduce this with a segrep replica using S3. Here's a parsed version of the suppressed exceptions:

Caused by: java.lang.SecurityException: Denied access to: andrross-opensearch-snapshots-us-west-2.s3.amazonaws.com:443, domain ProtectionDomain  (file:/home/ubuntu/search-installs/opensearch-3.0.0-beta1/lib/lucene-core-10.1.0.jar <no signer certificates>)
        Suppressed: java.lang.SecurityException: Denied access to: andrross-opensearch-snapshots-us-west-2.s3.amazonaws.com:443, domain ProtectionDomain  (file:/home/ubuntu/search-installs/opensearch-3.0.0-beta1/lib/lucene-core-10.1.0.jar <no signer certificates>)
        Suppressed: java.lang.SecurityException: Denied access to: andrross-opensearch-snapshots-us-west-2.s3.amazonaws.com:443, domain ProtectionDomain  (file:/home/ubuntu/search-installs/opensearch-3.0.0-beta1/lib/lucene-core-10.1.0.jar <no signer certificates>)
        Suppressed: java.lang.SecurityException: Denied access to: andrross-opensearch-snapshots-us-west-2.s3.amazonaws.com:443, domain ProtectionDomain  (file:/home/ubuntu/search-installs/opensearch-3.0.0-beta1/lib/lucene-core-10.1.0.jar <no signer certificates>)
        Suppressed: java.lang.SecurityException: Denied access to: andrross-opensearch-snapshots-us-west-2.s3.amazonaws.com:443, domain ProtectionDomain  (file:/home/ubuntu/search-installs/opensearch-3.0.0-beta1/lib/lucene-core-10.1.0.jar <no signer certificates>)
        Suppressed: java.lang.SecurityException: Denied access to: andrross-opensearch-snapshots-us-west-2.s3.amazonaws.com:443, domain ProtectionDomain  (file:/home/ubuntu/search-installs/opensearch-3.0.0-beta1/lib/lucene-core-10.1.0.jar <no signer certificates>)

I think the relevant code path here is:

	at org.opensearch.repositories.s3.S3RetryingInputStream.lambda$openStream$1(S3RetryingInputStream.java:120) ~[?:?]
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:319) ~[?:?]
	at org.opensearch.repositories.s3.SocketAccess.doPrivileged(SocketAccess.java:56) ~[?:?]
	at org.opensearch.repositories.s3.S3RetryingInputStream.openStream(S3RetryingInputStream.java:119) ~[?:?]
	at org.opensearch.repositories.s3.S3RetryingInputStream.<init>(S3RetryingInputStream.java:101) ~[?:?]
	at org.opensearch.repositories.s3.S3RetryingInputStream.<init>(S3RetryingInputStream.java:84) ~[?:?]
	at org.opensearch.repositories.s3.S3BlobContainer.readBlob(S3BlobContainer.java:151) ~[?:?]
	at org.opensearch.index.store.RemoteDirectory.openInput(RemoteDirectory.java:238) ~[opensearch-3.0.0-beta1.jar:3.0.0-beta1]
	at org.opensearch.index.store.RemoteSegmentStoreDirectory.openInput(RemoteSegmentStoreDirectory.java:507) ~[opensearch-3.0.0-beta1.jar:3.0.0-beta1]
	at org.opensearch.index.shard.StoreRecovery$StatsDirectoryWrapper$1.openInput(StoreRecovery.java:294) ~[opensearch-3.0.0-beta1.jar:3.0.0-beta1]
	at org.apache.lucene.store.Directory.copyFrom(Directory.java:180) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
	at org.opensearch.index.store.Store$StoreDirectory.copyFrom(Store.java:972) ~[opensearch-3.0.0-beta1.jar:3.0.0-beta1]
	at org.opensearch.index.shard.StoreRecovery$StatsDirectoryWrapper.copyFrom(StoreRecovery.java:287) ~[opensearch-3.0.0-beta1.jar:3.0.0-beta1]
	at org.opensearch.index.store.RemoteStoreFileDownloader.lambda$copyOneFile$2(RemoteStoreFileDownloader.java:151) ~[opensearch-3.0.0-beta1.jar:3.0.0-beta1]
	at org.opensearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:126) ~[opensearch-3.0.0-beta1.jar:3.0.0-beta1]
	at org.opensearch.index.store.RemoteStoreFileDownloader.lambda$copyOneFile$3(RemoteStoreFileDownloader.java:150) ~[opensearch-3.0.0-beta1.jar:3.0.0-beta1]
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:916) ~[opensearch-3.0.0-beta1.jar:3.0.0-beta1]

@andrross
Member

I was able to confirm that the change in #17894 fixes this issue in my setup.

@andrross
Member

@varunbharadwaj Can you test this again with the latest fix from #17894?

@varunbharadwaj
Contributor Author

@andrross Sure, I will test and update here.

@varunbharadwaj
Contributor Author

Ran another test with the latest main branch, and it works fine with the fix 👍

@varunbharadwaj
Contributor Author

Thanks for the quick fix, closing this bug.
