Skip to content

[Backport 3.0] Enabled Async Shard Batch Fetch by default (#18139) #18162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

gargmanik13
Copy link
Contributor

(cherry picked from commit 0cbd848)

Description

Enabled Async Shard Batch Fetch by default (#18139)

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@gargmanik13 gargmanik13 force-pushed the backport/backport-18139-to-3.0 branch from 4dc5587 to 9b63c3b Compare April 30, 2025 06:44
Copy link
Contributor

❌ Gradle check result for 9b63c3b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@gargmanik13
Copy link
Contributor Author

❌ Gradle check result for 9b63c3b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Test Result (1 failure / -3)
org.opensearch.common.ssl.PemTrustConfigTests.testTrustConfigReloadsFileContents

Flaky test - #17745

Copy link
Contributor

❌ Gradle check result for f8379a7: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@gargmanik13
Copy link
Contributor Author

❌ Gradle check result for f8379a7: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Test Result (1 failure / ±0)
org.opensearch.search.aggregations.startree.MetricAggregatorTests.testStarTreeDocValues

Flaky test - #18110

@gargmanik13 gargmanik13 force-pushed the backport/backport-18139-to-3.0 branch from f8379a7 to 00ad97b Compare April 30, 2025 09:28
Copy link
Contributor

✅ Gradle check result for 00ad97b: SUCCESS

Copy link

codecov bot commented Apr 30, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.51%. Comparing base (dc4efa8) to head (00ad97b).
Report is 2 commits behind head on 3.0.

Additional details and impacted files
@@             Coverage Diff              @@
##                3.0   #18162      +/-   ##
============================================
- Coverage     72.53%   72.51%   -0.02%     
- Complexity    67192    67220      +28     
============================================
  Files          5480     5480              
  Lines        310321   310322       +1     
  Branches      45140    45140              
============================================
- Hits         225094   225035      -59     
- Misses        66800    66930     +130     
+ Partials      18427    18357      -70     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@@ -5,6 +5,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

## [Unreleased 3.0.x]
### Added
- Enabled Async Shard Batch Fetch by default ([#18139](https://github.com/opensearch-project/OpenSearch/pull/18139))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need to go in the release notes if we merge this into the 3.0 release. Also the changelog entry on main should be removed if this ends up being released in 3.0.

@andrross
Copy link
Member

@gargmanik13 @shwetathareja Merging this into the 3.0 release this late increases the risk of introducing new problems. Can you provide a justification for why this is needed and why it can't go in the 3.1 release?

@rajiv-kv
Copy link
Contributor

@gargmanik13 @shwetathareja Merging this into the 3.0 release this late increases the risk of introducing new problems. Can you provide a justification for why this is needed and why it can't go in the 3.1 release?

@andrross - This is a breaking change as the default implementation of shard allocator is switched to batch allocator. Hence we wanted to align with major version release.

@andrross
Copy link
Member

andrross commented Apr 30, 2025

@gargmanik13 @shwetathareja Merging this into the 3.0 release this late increases the risk of introducing new problems. Can you provide a justification for why this is needed and why it can't go in the 3.1 release?

@andrross - This is a breaking change as the default implementation of shard allocator is switched to batch allocator. Hence we wanted to align with major version release.

@rajiv-kv What components and/or user experience does it break and how? Can we expect there to be integration problems in the next round of release candidate testing if we include this?

@getsaurabh02
Copy link
Member

getsaurabh02 commented Apr 30, 2025

@rajiv-kv What is the potential impact if this change? Also, can it wait until 3.1 - since it would be considered a breaking change if it creates change of APIs or experience for the users ?
Since RC3 is ready for GA and not only 2 days away from Go/No-Go call, not sure why this change needs to go this late.

@rajiv-kv
Copy link
Contributor

@gargmanik13 @shwetathareja Merging this into the 3.0 release this late increases the risk of introducing new problems. Can you provide a justification for why this is needed and why it can't go in the 3.1 release?

@andrross - This is a breaking change as the default implementation of shard allocator is switched to batch allocator. Hence we wanted to align with major version release.

@rajiv-kv What components and/or user experience does it break and how? Can we expect there to be integration problems in the next round of release candidate testing if we include this?

The change is specific to Shard Management component of Cluster Manager which deals with unassigned shards. The user facing setting is cluster.allocator.existing_shards_allocator.batch_enabled which is disabled by default. When the setting is enabled, the ExistingShardsAllocator implementation will be switched from GatewayAllocator to ShardsBatchGatewayAllocator. There is a significant difference b/w implementations on the internal cache and transport calls to collect the existing shard information from other nodes in cluster.

As part of this PR, we have rerun all the existing integration tests with batch mode enabled. Hence I do not expect any major integration problems.

@andrross
Copy link
Member

There is a significant difference b/w implementations on the internal cache and transport calls to collect the existing shard information from other nodes in cluster.

@rajiv-kv I understand the implementation is significantly different, but I believe we should be free to make changes to the shard allocation behavior in a minor version if it doesn't impact plugin or user-facing APIs and does not result in significantly different shard placement decisions.

@andrross
Copy link
Member

andrross commented May 1, 2025

Closing as we've decided to hold off on this change for the next 3.1 release.

@andrross andrross closed this May 1, 2025
@gargmanik13 gargmanik13 deleted the backport/backport-18139-to-3.0 branch May 5, 2025 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants