Support modifying segmentInfos.counter in IndexWriter #14417

guojialiang92 · 2025-03-27T06:58:57Z

Description

This PR addresses issue 14362, which discusses the benefits of this change.

Tests

Simple scenario
- I added the test TestIndexWriter#testAdvanceSegmentInfosCounter, in which the writer indexes a batch of docs and then advances its counter.
Crash-and-recovery scenario
- I added the test TestIndexWriter#testAdvanceSegmentCounterInCrashAndRecoveryScenario, in which the writer indexes a batch of docs and is closed; a new writer is then started on the same index and advances its counter.
Concurrent writing and modification scenario
- I modified ThreadedIndexingAndSearchingTestCase so that the concurrent write threads randomly advance the segment counter, with a check after all write threads have finished.
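The counter behaviour these tests exercise can be sketched without Lucene on the classpath. The class below is a toy model, not the IndexWriter code: `SegmentCounterSketch`, `nextSegmentName` and `advanceTo` are hypothetical names, and the rule that the counter only ever moves forward is an assumption inferred from the word "advance" in the test names. The naming scheme (an underscore followed by the counter in base 36) mirrors how Lucene names segments.

```java
/** Toy model of a segment-name counter; not the Lucene implementation. */
final class SegmentCounterSketch {
  private long counter;

  SegmentCounterSketch(long initialCounter) {
    this.counter = initialCounter;
  }

  /** Hands out the next segment name: "_0", "_1", ..., "_z", "_10", ... */
  synchronized String nextSegmentName() {
    return "_" + Long.toString(counter++, Character.MAX_RADIX);
  }

  /** Assumed semantics: raise the counter to newCounter, never lower it. */
  synchronized void advanceTo(long newCounter) {
    if (counter < newCounter) {
      counter = newCounter;
    }
  }

  synchronized long counter() {
    return counter;
  }

  public static void main(String[] args) {
    SegmentCounterSketch c = new SegmentCounterSketch(0);
    System.out.println(c.nextSegmentName()); // _0
    c.advanceTo(36);
    System.out.println(c.nextSegmentName()); // _10
    c.advanceTo(5); // lower than the current value, so nothing changes
    System.out.println(c.counter()); // 37
  }
}
```

Making a lower target a no-op keeps names unique: a value that has already been handed out can never be handed out again.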

Checklist

I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
I have given Lucene maintainers access to contribute to my PR branch. (optional but recommended)
I have developed this patch against the main branch.
I have run ./gradlew check.
I have added tests for my changes.

Signed-off-by: guojialiang <[email protected]>

guojialiang92 · 2025-03-27T09:18:12Z

Hi, @vigyasharma
Thanks for helping with the code review. I have made changes according to your suggestions.

vigyasharma · 2025-03-30T06:58:28Z

I think we can add a couple more tests to make it robust.

Some tests around concurrency – index with multiple threads, then advance the counter in one of the threads, and validate behavior. You can look at ThreadedIndexingAndSearchingTestCase and its derived tests for motivation.
A test for the crash-recovery scenario, which I suppose is the primary use case. We could make the writer index a bunch of docs, then kill it, start a new writer on the same index, and advance its counter.

vigyasharma · 2025-03-30T07:08:05Z

Also, IIUC IndexWriter#advanceSegmentInfosVersion() was added to handle similar scenarios for NRT replication (Lucene's native segment replication implementation). I'm curious why we didn't run into the need to advance SegmentInfos#counter at that time. Do you remember, @mikemccand (I know it's been a while! (: )?


guojialiang92 · 2025-03-31T12:31:09Z

Thanks, @vigyasharma

> I think we can add a couple more tests to make it robust.
>
> Some tests around concurrency – index with multiple threads, then advance the counter in one of the threads, and validate behavior. You can look at ThreadedIndexingAndSearchingTestCase and its derived tests for motivation.
>
> A test for the crash-recovery scenario, which I suppose is the primary use case. We could make the writer index a bunch of docs, then kill it, start a new writer on the same index, and advance its counter.

I have added the following tests according to the suggestions:

For the crash-recovery scenario, I have added TestIndexWriter#testAdvanceSegmentCounterInCrashAndRecoveryScenario.
For multi-threaded concurrent writes, I have modified ThreadedIndexingAndSearchingTestCase so that the concurrent write threads randomly advance the segment counter, with a check after all write threads have finished.
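The shape of the concurrent check can be sketched with plain JDK classes. This is a stand-in under stated assumptions, not the code added to ThreadedIndexingAndSearchingTestCase: `ConcurrentAdvanceSketch` and its methods are hypothetical, an `AtomicLong` replaces the writer's counter, and the two properties verified (no name is handed out twice, and the final counter is at least the advanced value) are my reading of what "validate behavior" would mean here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Toy concurrency check; an AtomicLong stands in for the writer's counter. */
final class ConcurrentAdvanceSketch {
  private final AtomicLong counter = new AtomicLong();

  String nextSegmentName() {
    return "_" + Long.toString(counter.getAndIncrement(), Character.MAX_RADIX);
  }

  /** Raises the counter to newCounter if it is currently lower. */
  void advanceTo(long newCounter) {
    counter.accumulateAndGet(newCounter, Math::max);
  }

  long counter() {
    return counter.get();
  }

  /**
   * Runs several writer threads; the first one also advances the counter
   * halfway through. Returns {number of distinct names, final counter}.
   */
  static long[] run(int threads, int namesPerThread, long target) {
    ConcurrentAdvanceSketch sketch = new ConcurrentAdvanceSketch();
    Set<String> names = ConcurrentHashMap.newKeySet();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      final boolean advancer = (t == 0);
      pool.execute(() -> {
        for (int i = 0; i < namesPerThread; i++) {
          names.add(sketch.nextSegmentName());
          if (advancer && i == namesPerThread / 2) {
            sketch.advanceTo(target);
          }
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("writers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new long[] {names.size(), sketch.counter()};
  }

  public static void main(String[] args) {
    long[] result = run(4, 1000, 1_000_000L);
    System.out.println("distinct names: " + result[0] + ", final counter: " + result[1]);
  }
}
```

With 4 threads allocating 1000 names each, a correct counter yields exactly 4000 distinct names, and the final counter is no lower than the target.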

guojialiang92 · 2025-03-31T12:45:18Z

Thanks, @vigyasharma
I also looked at Lucene's native segment replication. This is just my personal opinion.

> Also, IIUC IndexWriter#advanceSegmentInfosVersion() was added to handle similar scenarios for NRT replication (Lucene's native segment replication implementation). I'm curious why we didn't run into the need to advance SegmentInfos#counter at that time. Do you remember, @mikemccand (I know it's been a while! (: )?

The code comments in Lucene's native segment replication also mention the risk of file conflicts, but no additional handling is in place. From a robustness perspective, perhaps this should be controlled there as well. The relevant code is ReplicaNode#fileIsIdentical (see the comment "Segment name was reused! This is rare but possible and otherwise devastating"):

  private boolean fileIsIdentical(String fileName, FileMetaData srcMetaData) throws IOException {

    FileMetaData destMetaData = readLocalFileMetaData(fileName);
    if (destMetaData == null) {
      // Something went wrong in reading the file (it's corrupt, truncated, does not exist, etc.):
      return false;
    }

    if (Arrays.equals(destMetaData.header(), srcMetaData.header()) == false
        || Arrays.equals(destMetaData.footer(), srcMetaData.footer()) == false) {
      // Segment name was reused!  This is rare but possible and otherwise devastating:
      if (isVerboseFiles()) {
        message("file " + fileName + ": will copy [header/footer is different]");
      }
      return false;
    } else {
      return true;
    }
  }
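How a segment name can come to be reused may be easier to see with numbers. The sketch below is an illustration under assumptions, not a description of ReplicaNode: it supposes an old writer that had handed out names for counters 0 up to (but not including) `oldWriterCounter`, and a new writer on a lagging copy that starts from `newWriterCounter`. `SegmentNameReuseSketch` and `reusedNames` are hypothetical names.

```java
import java.util.Set;
import java.util.TreeSet;

/** Illustrates which names a writer with a lagging counter would reuse. */
final class SegmentNameReuseSketch {
  static String segmentName(long counter) {
    return "_" + Long.toString(counter, Character.MAX_RADIX);
  }

  /**
   * The old writer used counters [0, oldWriterCounter). The new writer
   * creates newSegments segments starting at newWriterCounter. Returns
   * the names that both writers would have produced.
   */
  static Set<String> reusedNames(long oldWriterCounter, long newWriterCounter, int newSegments) {
    Set<String> reused = new TreeSet<>();
    for (long c = newWriterCounter; c < newWriterCounter + newSegments; c++) {
      if (c < oldWriterCounter) {
        reused.add(segmentName(c));
      }
    }
    return reused;
  }

  public static void main(String[] args) {
    // New writer lags behind: it would write _3 and _4 again.
    System.out.println(reusedNames(5, 3, 4)); // [_3, _4]
    // Counter advanced to the old writer's value first: no overlap.
    System.out.println(reusedNames(5, 5, 4)); // []
  }
}
```

In this model, advancing the new writer's counter to at least the old writer's value before indexing removes the overlap entirely, which is the kind of control the comment above argues for.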

guojialiang92 · 2025-04-08T03:30:54Z

Hi, @vigyasharma
Could you please help me take a look again? Do you have any other suggestions?

vigyasharma

Apologies for the delay. These changes look great, @guojialiang92. I love the new tests, and how you're testing multi-threaded scenarios.

I have some minor comments. Please also add a CHANGES.txt entry and I'll merge this.

vigyasharma · 2025-04-08T06:59:08Z

...t-framework/src/java/org/apache/lucene/tests/index/ThreadedIndexingAndSearchingTestCase.java

@@ -189,6 +190,19 @@ public void run() {
                    addedField = null;
                  }

+                  // Maybe advance segment counter
+                  if (random().nextBoolean()) {


Running this with 50% probability might slow down the tests too much with the synchronization this needs. Let's run it with a lower probability, something like 1 in 7 – if (random().nextInt(7) == 5) { ... }

Thanks, I have modified the test and CHANGES.txt.
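The difference between the two gating conditions can be measured with a small sketch. It uses `java.util.Random` as a stand-in for the test framework's `random()`; `OneInSevenSketch` and its methods are hypothetical names, and the seed is arbitrary.

```java
import java.util.Random;

/** Compares how often the two gating conditions fire. */
final class OneInSevenSketch {
  /** Fires about once in seven calls, as in nextInt(7) == 5. */
  static boolean oneInSeven(Random random) {
    return random.nextInt(7) == 5;
  }

  /** Counts how many of the given iterations pass the 1-in-7 gate. */
  static int countOneInSeven(Random random, int iterations) {
    int fired = 0;
    for (int i = 0; i < iterations; i++) {
      if (oneInSeven(random)) {
        fired++;
      }
    }
    return fired;
  }

  /** Counts how many of the given iterations pass the 50% gate. */
  static int countCoinFlip(Random random, int iterations) {
    int fired = 0;
    for (int i = 0; i < iterations; i++) {
      if (random.nextBoolean()) {
        fired++;
      }
    }
    return fired;
  }

  public static void main(String[] args) {
    int iterations = 70_000;
    System.out.println("nextBoolean():   " + countCoinFlip(new Random(42), iterations));
    System.out.println("nextInt(7) == 5: " + countOneInSeven(new Random(42), iterations));
  }
}
```

Over 70,000 iterations the 1-in-7 gate fires roughly 10,000 times against roughly 35,000 for the coin flip, so the synchronized counter update runs far less often while still being exercised.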


guojialiang92 · 2025-04-09T02:40:18Z

Thanks, @vigyasharma
I have made the modifications as suggested. Please review the code again.

IndexWriter support advance segmentInfos counter

055e2d5

Signed-off-by: guojialiang <[email protected]>

github-project-automation bot added this to OpenSearch Lucene & Core Performance Tracking Mar 27, 2025

github-project-automation bot moved this to Open in OpenSearch Lucene & Core Performance Tracking Mar 27, 2025

github-actions bot added the module:core/index label Mar 27, 2025

fix UT

86f97b2

Signed-off-by: guojialiang <[email protected]>

vigyasharma reviewed Mar 27, 2025

lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java (outdated)

vigyasharma reviewed Mar 27, 2025

lucene/core/src/java/org/apache/lucene/index/IndexWriter.java (outdated)

lucene/core/src/java/org/apache/lucene/index/IndexWriter.java

vigyasharma reviewed Mar 27, 2025

lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java (outdated)

lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java (outdated)

Some CR-related modifications

5f4077f

Signed-off-by: guojialiang <[email protected]>

add test

cba28e3

Signed-off-by: guojialiang <[email protected]>

github-actions bot added the module:test-framework label Mar 31, 2025

add test

df7aaa7

Signed-off-by: guojialiang <[email protected]>

vigyasharma reviewed Apr 8, 2025

guojialiang92 added 2 commits April 8, 2025 16:24

Merge remote-tracking branch 'origin/main' into dev/indexwriter_support_advance_segment_counter

570b258

Modify tests and CHANGES.txt

e6a5340

Signed-off-by: guojialiang <[email protected]>

vigyasharma approved these changes Apr 9, 2025

Move changes entry to 10.3

63a9c12

vigyasharma merged commit 88d3573 into apache:main Apr 9, 2025
7 checks passed

github-project-automation bot moved this from Open to Merged in OpenSearch Lucene & Core Performance Tracking Apr 9, 2025

vigyasharma pushed a commit that referenced this pull request Apr 9, 2025

Support modifying segmentInfos.counter in IndexWriter (#14417)

e271147

jpountz pushed a commit to jpountz/lucene that referenced this pull request Apr 24, 2025

Support modifying segmentInfos.counter in IndexWriter (apache#14417)

f2a89c7

guojialiang92 mentioned this pull request May 22, 2025

[RFC][segment replication] Introduce a heuristic method to avoid long-term write blocking during primary shard relocation opensearch-project/OpenSearch#18355

Open
