-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Completion FSTs to be loaded off-heap at all times #14364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Completion FSTs to be loaded off-heap at all times #14364
Conversation
All the existing completion postings format load their FSTs on-heap. It is possible to customize that behaviour by mainintaing a custom postings format that override the fst load mode. TestSuggestField attempted to test all modes, but in practice, it can only test the default load mode because the read path relies on the default constructor called via SPI, that does not allow providing the fst load mode. This commit changes the default value to OFF_HEAP, and addresses the testing gap in TestSuggestField, so that all load modes are tested. This unveiled a bug when loading FST off heap and the index input is a MemorySegmentIndexInput which is fixed by apache#14271.
* Creates a {@link Completion101PostingsFormat} that will use the provided <code>fstLoadMode | ||
* </code> to determine if the completion FST should be loaded on or off heap. | ||
*/ | ||
public Completion101PostingsFormat(FSTLoadMode fstLoadMode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find that these constructors taking the fst load mode are trappy, in that they make readers think that calling them allows to override the load mode. In practice, the load mode is only ever used at read time, and that codepath goes through SPI loading, so via the default constructor. I think removing these is the best path forward to avoid confusion.
@@ -122,11 +122,6 @@ public enum FSTLoadMode { | |||
|
|||
private final FSTLoadMode fstLoadMode; | |||
|
|||
/** Used only by core Lucene at read-time via Service Provider instantiation */ | |||
public CompletionPostingsFormat(String name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was unused and redundant.
protected PostingsFormat delegatePostingsFormat() { | ||
return delegate.delegatePostingsFormat(); | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW this may work accidentally on NRT segments if IndexWriter opens them with the codec configured on the IndexWriterConfig, but it you were to open an index from a Directory using this postings format, it would open it using the default FST load mode and ignore the one passed to the constructor here. This is because, Lucene would read "Completion101" from the segment infos and look it up through SPI, which would return the default format that is registered under this name rather than this instance that you are creating here.
I wonder if we should remove FSTLoadMode completely. To your point, it's trappy to think that this thing is actually configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right, I tried to quickly hack this together and I ended up making the same mistake we already had in existing tests. Shame on me! I really do need to register the custom postings format to SPI for this to leverage the random fst load mode. I was also thinking about the option to remove configurability of fst load mode. That would certainly simplify things and avoid this same mistake in the future. Either that, or have a proper example of how the postings format load mode can be customized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this PR to remove the option to configure the load mode entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, don't miss adding a CHANGES entry. :) We may want to add back-compat for NRTSuggester#load(IndexInput, FstLoadMode)
upon backporting?
Thanks @jpountz what's your suggestion around back-compat? Sounds like you are suggesting not backporting the removal of the fst load mode enum but only the switch to off-heap by default, is that accurate? |
I was thinking of keeping the |
Cool then I will target this PR at main only, and open a separate PR for |
Good question. The class is public and looked like a user-facing API hence my comment, but you can't serialize a NRTSuggester yourself since |
I opened #14372 to address the visibility issue of |
I merged main in after merging #14372 and added a changelog entry. I believe this is ready to go, and can now be backported to branch_10x as-is. |
A while ago we introduced a completion postings format extension to eventually be able to customize how completion FSTs are loaded. See elastic#111494. We have never leveraged this extension, and meanwhile Lucene is moving to always load FSTs off-heap, and no longer allow on-heap. See apache/lucene#14364 . This commit removes the SPI extension as it is no longer needed.
Agreed! |
Thanks @jpountz for all the help! |
All the existing completion postings format load their FSTs on-heap. It is possible to customize that behaviour by mainintaing a custom postings format that override the fst load mode. TestSuggestField attempted to test all modes, but in practice, it can only test the default load mode because the read path relies on the default constructor called via SPI, that does not allow providing the fst load mode. This commit switches loading of FST to off-heap, and removes the fst load mode argument from all completion postings format classes, effectively making the loading always off-heap.
A while ago we introduced a completion postings format extension to eventually be able to customize how completion FSTs are loaded. See #111494. We have never leveraged this extension, and meanwhile Lucene is moving to always load FSTs off-heap, and no longer allow on-heap. See apache/lucene#14364 . This commit removes the SPI extension as it is no longer needed.
A while ago we introduced a completion postings format extension to eventually be able to customize how completion FSTs are loaded. See elastic#111494. We have never leveraged this extension, and meanwhile Lucene is moving to always load FSTs off-heap, and no longer allow on-heap. See apache/lucene#14364 . This commit removes the SPI extension as it is no longer needed.
A while ago we introduced a completion postings format extension to eventually be able to customize how completion FSTs are loaded. See elastic#111494. We have never leveraged this extension, and meanwhile Lucene is moving to always load FSTs off-heap, and no longer allow on-heap. See apache/lucene#14364 . This commit removes the SPI extension as it is no longer needed.
All the existing completion postings format load their FSTs on-heap. It is possible to customize that behaviour by mainintaing a custom postings format that override the fst load mode. TestSuggestField attempted to test all modes, but in practice, it can only test the default load mode because the read path relies on the default constructor called via SPI, that does not allow providing the fst load mode. This commit switches loading of FST to off-heap, and removes the fst load mode argument from all completion postings format classes, effectively making the loading always off-heap.
All the existing completion postings format load their FSTs on-heap. It is possible to customize that behaviour by mainintaing a custom postings format that override the fst load mode. TestSuggestField attempted to test all modes, but in practice, it can only test the default load mode because the read path relies on the default constructor called via SPI, that does not allow providing the fst load mode. This commit switches loading of FST to off-heap, and removes the fst load mode argument from all completion postings format classes, effectively making the loading always off-heap.
A while ago we introduced a completion postings format extension to eventually be able to customize how completion FSTs are loaded. See elastic#111494. We have never leveraged this extension, and meanwhile Lucene is moving to always load FSTs off-heap, and no longer allow on-heap. See apache/lucene#14364 . This commit removes the SPI extension as it is no longer needed.
All the existing completion postings format load their FSTs on-heap. It is possible to customize that behaviour by mainintaing a custom postings format that override the fst load mode.
TestSuggestField attempted to test all modes, but in practice, it can only test the default load mode because the read path relies on the default constructor called via SPI, that does not allow providing the fst load mode.
This commit switches loading of FST to off-heap, and removes the fst load mode argument from all completion postings format classes, effectively making the loading always off-heap.
See related discussion where this change originated at #14275 .