Skip to content

Stop closing index input when loading NRTSuggester #14271

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 21, 2025

Conversation

javanna
Copy link
Contributor

@javanna javanna commented Feb 21, 2025

Closing the index input is not necessary when loading the NRTSuggester. Also, it causes a bug as the clone gets closed right before the suggester instance gets returned, which makes it unusable in the later lookup method which slices the index input while it's already closed. This is addressed by creating a slice, that is not further closed, so that the index input is still usable once the suggester is returned.

This was not discovered so far due to a testing gap that is fixed by #14270.
The bug was introduced by #13524 , stacktrace follows:

Already closed: MemorySegmentIndexInput(path="/home/javanna/apache/lucene/lucene/suggest/build/tmp/tests-tmp/test539812848641464359/_2_Completion101_0.lkp")
org.apache.lucene.store.AlreadyClosedException: Already closed: MemorySegmentIndexInput(path="/home/javanna/apache/lucene/lucene/suggest/build/tmp/tests-tmp/test539812848641464359/_2_Completion101_0.lkp")
    at __randomizedtesting.SeedInfo.seed([88A0E29FD265DAD4:C82996819FD55267]:0)
    at [email protected]/org.apache.lucene.store.MemorySegmentIndexInput.alreadyClosed(MemorySegmentIndexInput.java:128)
    at [email protected]/org.apache.lucene.store.MemorySegmentIndexInput.ensureOpen(MemorySegmentIndexInput.java:103)
    at [email protected]/org.apache.lucene.store.MemorySegmentIndexInput.buildSlice(MemorySegmentIndexInput.java:644)
    at [email protected]/org.apache.lucene.store.MemorySegmentIndexInput.slice(MemorySegmentIndexInput.java:614)
    at [email protected]/org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl.slice(MemorySegmentIndexInput.java:718)
    at [email protected]/org.apache.lucene.store.IndexInput.randomAccessSlice(IndexInput.java:163)
    at [email protected]/org.apache.lucene.util.fst.OffHeapFSTStore.getReverseBytesReader(OffHeapFSTStore.java:57)
    at [email protected]/org.apache.lucene.util.fst.FST.getBytesReader(FST.java:1149)
    at org.apache.lucene.search.suggest.analyzing.FSTUtil.intersectPrefixPaths(FSTUtil.java:65)
    at org.apache.lucene.search.suggest.document.NRTSuggester.lookup(NRTSuggester.java:133)
    at org.apache.lucene.search.suggest.document.CompletionScorer.score(CompletionScorer.java:73)
    at org.apache.lucene.search.suggest.document.SuggestIndexSearcher.suggest(SuggestIndexSearcher.java:75)
    at org.apache.lucene.search.suggest.document.SuggestIndexSearcher.suggest(SuggestIndexSearcher.java:53)

Cloning the index input is not necessary when loading the NRTSUggester. Also,
it causes a bug as the clone gets closed right before the suggester instance
gets returned, which makes it unusable in the later lookup method which slices
the index input while it's already closed.

This was not discovered so far due to a testing gap that is fixed by apache#14270
@javanna javanna added this to the 10.1.1 milestone Feb 21, 2025
Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good.

dictClone.seek(offset);
suggester = NRTSuggester.load(dictClone, fstLoadMode);
}
IndexInput indexInput = dictIn.slice("NRTSuggester", offset, dictIn.length() - offset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This got me confused wrt the PR title since slices are technically clones. Maybe update the title to say "stop closing the clone" instead of "stop cloning"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, we no longer call clone, that is what I meant :) but I see your point that we call it indirectly now.

@javanna javanna changed the title Stop cloning index input when loading NRTSuggester Stop closing index input when loading NRTSuggester Feb 21, 2025
@javanna javanna merged commit 94ad014 into apache:main Feb 21, 2025
6 checks passed
@javanna
Copy link
Contributor Author

javanna commented Feb 21, 2025

Thanks all!

javanna added a commit that referenced this pull request Feb 21, 2025
javanna added a commit that referenced this pull request Feb 21, 2025
Cloning the index input is not necessary when loading the NRTSUggester. Also,
it causes a bug as the clone gets closed right before the suggester instance
gets returned, which makes it unusable in the later lookup method which slices
the index input while it's already closed.

This was not discovered so far due to a testing gap that is fixed by #14270
javanna added a commit that referenced this pull request Feb 21, 2025
Cloning the index input is not necessary when loading the NRTSUggester. Also,
it causes a bug as the clone gets closed right before the suggester instance
gets returned, which makes it unusable in the later lookup method which slices
the index input while it's already closed.

This was not discovered so far due to a testing gap that is fixed by #14270
javanna added a commit to javanna/lucene that referenced this pull request Mar 17, 2025
All the existing completion postings format load their FSTs on-heap.
It is possible to customize that behaviour by mainintaing a custom postings
format that override the fst load mode.

TestSuggestField attempted to test all modes, but in practice, it can only test
the default load mode because the read path relies on the default constructor
called via SPI, that does not allow providing the fst load mode.

This commit changes the default value to OFF_HEAP, and addresses the testing
gap in TestSuggestField, so that all load modes are tested. This unveiled a bug
when loading FST off heap and the index input is a MemorySegmentIndexInput
which is fixed by apache#14271.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants