Skip to content

Implement parallel shard refresh behind cluster settings #17782

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 20, 2025

Conversation

ashking94
Copy link
Member

@ashking94 ashking94 commented Apr 3, 2025

Description

This PR implements parallel shard-level refresh capability that can be enabled/disabled via cluster settings. Key changes include:

  • Introduces new cluster setting cluster.index.refresh.shard_level.enabled (default: false) to control refresh behavior
  • When enabled, refresh tasks run at individual shard level instead of index level
  • Maintains backward compatibility by defaulting to index-level refresh when setting is disabled
  • Supports dynamic updates to refresh intervals and refresh modes

Implementation Details

  • Added new AsyncShardRefreshTask class to handle shard-level refresh operations
  • Modified IndexService and IndexShard to manage refresh task lifecycle
  • Added synchronization mechanisms to safely handle transitions between index and shard level refresh modes
  • Implemented proper cleanup of refresh tasks during mode transitions
  • Added comprehensive test coverage for the new functionality

Benefits

  • Enables more granular refresh control at shard level
  • Allows for better resource utilization through parallel refresh operations
  • Maintains existing refresh behavior by default for backward compatibility

The feature is marked as experimental with @ExperimentalApi annotation.

Related Issues

Meta issue: #17776

Check List

  • Functionality includes testing.
  • [ ] API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

github-actions bot commented Apr 3, 2025

❕ Gradle check result for 74c406f: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Apr 3, 2025

Codecov Report

Attention: Patch coverage is 81.05263% with 18 lines in your changes missing coverage. Please review.

Project coverage is 72.54%. Comparing base (cbaddd3) to head (11e7831).
Report is 13 commits behind head on main.

Files with missing lines Patch % Lines
...in/java/org/opensearch/index/shard/IndexShard.java 68.96% 2 Missing and 7 partials ⚠️
...c/main/java/org/opensearch/index/IndexService.java 84.31% 1 Missing and 7 partials ⚠️
...in/java/org/opensearch/indices/IndicesService.java 92.85% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17782      +/-   ##
============================================
+ Coverage     72.51%   72.54%   +0.02%     
- Complexity    67108    67128      +20     
============================================
  Files          5475     5478       +3     
  Lines        309916   310125     +209     
  Branches      45060    45080      +20     
============================================
+ Hits         224725   224966     +241     
+ Misses        66895    66820      -75     
- Partials      18296    18339      +43     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

❌ Gradle check result for eb1a065: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 4349e25: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 4349e25: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 4349e25: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 4349e25: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 4349e25: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 4349e25: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 4349e25: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@ashking94 ashking94 force-pushed the shard-level-refresh branch from 4349e25 to c428af3 Compare April 16, 2025 16:42
Copy link
Contributor

❌ Gradle check result for c428af3: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@ashking94 ashking94 force-pushed the shard-level-refresh branch from c428af3 to 11e7831 Compare April 19, 2025 09:00
Copy link
Contributor

❌ Gradle check result for 11e7831: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 11e7831: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

✅ Gradle check result for 11e7831: SUCCESS

@sachinpkale sachinpkale merged commit 96481cc into opensearch-project:main Apr 20, 2025
31 checks passed
Comment on lines +5568 to +5569
refreshTask.close();
refreshTask = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the close should de-reference the refreshTask

x-INFiN1TY-x pushed a commit to x-INFiN1TY-x/OpenSearch_Local that referenced this pull request Apr 24, 2025
…project#17782)

* Implement parallel shard refresh behind cluster settings

Signed-off-by: Ashish Singh <[email protected]>

* Incorporate PR review comments

Signed-off-by: Ashish Singh <[email protected]>

* Incorporate PR review comments

Signed-off-by: Ashish Singh <[email protected]>

* Fix compilation failure

Signed-off-by: Ashish Singh <[email protected]>

---------

Signed-off-by: Ashish Singh <[email protected]>
Signed-off-by: Tanishq Ranjan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants