-
Notifications
You must be signed in to change notification settings - Fork 1.1k
LUCENE-10641: IndexSearcher#setTimeout should also abort query rewrites, point ranges and vector searches #12345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -763,6 +763,11 @@ public Query rewrite(Query original) throws IOException { | |||
for (Query rewrittenQuery = query.rewrite(this); | |||
rewrittenQuery != query; | |||
rewrittenQuery = query.rewrite(this)) { | |||
if (queryTimeout != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an improvement, however, it's only checking at the very end of rewrite? Which is not so different from checking at the start of iterating through the hits? I.e. in practice it seems like it won't make much of an improvement?
I wonder if we can somehow check the timeout during rewrite? Seems tricky though ...
The timeout implementation that wraps Directory
can achieve that by checking during I/O.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to wrap the directory similarly to what ExitableDirectoryReader
does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mikemccand and @jpountz!
@jpountz I didn't get the idea behind wrapping Directory
here. The rewrite method produce a more efficient query for search, I wonder how it is related to Directory
class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Deepika0510 I think what @jpountz meant is we should somehow intercept the I/O Lucene is inevitably doing in a rewrite
implementation to check for timeout. Similar to how ExitableDirectoryReader
wraps the low level Lucene APIs and checks for timeout.
But I don't see how we could cleanly do this (wrap Directory
) in the context of rewrite
.
Maybe instead we just use ExitableDirectoryReader
, but make a new generalized version ExitableIndexReader
that takes any IndexReader
(not just the DirectoryReader
implementation)?
Then, if a timeout is set on IndexSearcher
, we use this wrapped timeout IndexReader
in rewrite somehow? We should maybe create that wrapped reader once on IndexSearcher.setTimeout
calls and reuse it across all concurrent rewrites?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woops, sorry for missing your question. @mikemccand 's understanding is correct, something like ExitableDirectoryReader
. +1 to @mikemccand 's suggestion:
- Implement a package-private
ExitableIndexReader
, which is really the same asExitableDirectoryReader
but for anyIndexReader
. It should wrap terms and points, but not postings and doc values (which are already covered via the bulk scorer). - Update
IndexSearcher#getIndexReader
to return the wrapped index reader. This will automatically make sure expensive rewrites are intercepted sinceIndexSearcher#getIndexReader
is their only way to get access to the reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I just noticed this. ExitableDirectoryReader.ExitableFilterAtomicReader
does indeed already have timeout checking for searching vectors. But it only has it for float[]
. I am guessing something was missed during the byte[]
rewrite and update. I am going to add support for timeout checking for byte[]
in that class.
Just in case you get weird merge conflicts while working through this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mikemccand and @jpountz for suggestion!
I was going through the code and came across the below scenario. We want to wrap terms and points, but the related methods are in LeafReader
class and we access these method generally using the getContext().leaves().get(0).reader().<method>
.
E.g.
IndexReaderContext topReaderContext = reader.getContext();
for (LeafReaderContext context : topReaderContext.leaves()) {
final Terms terms = context.reader().terms(query.field);
……
……
……
}
So, even if we wrap LeafReader
and update IndexSearcher#getIndexReader()
to return the wrapped reader, we would not be able to return wrapped class at such above point where we get the reader using ReaderContext
.
Maybe one way out would be to create a separate ReaderContext
such that it returns our wrapped IndexReader
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should work automatically if you create an IndexReader wrapper. See how CompositeReader#getContext
is implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpountz, @mikemccand
I have created a wrapper class of IndexReader
and tried to make minimal changes in other files. I added logic in CompositeReaderContext
to handle ExitableIndexReader
and needed to handle closing of inner wrapped IndexReader
to avoid thread leaks.
@Deepika0510 You don't only need to wrap the IndexReader, you also need to wrap all its leaves. |
@jpountz To wrap all the leaves, we would need to wrap ReaderContext classes along with LeafReader classes as well right? Since, we generally access the leaves through the ReaderContext class. Something like this:
|
I don't think you need to wrap Then a call lke |
However, to get that |
Hmm I'm confused: why would you need to get to the |
What I meant to ask is that after creating the E.g. when we created wrapped Similarly, we have to ensure the same for the |
Came across |
Hi @Deepika0510 -- what is the problem when callers access the leaves? Since you would subclass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mikemccand @jpountz . I looked into SoftDeletesDirectoryReaderWrapper
and tried wrapping leafReader
the way we are wrapping SubReader
in doWrapDirectoryReader
. I was able to wrap leadReader
and we are able to enforce time out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix failing gradle check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking closer! Thanks @Deepika0510.
for (LeafReaderContext leafCtx : leaves) { | ||
LeafReader reader = leafCtx.reader(); | ||
readers.add(reader); | ||
// we try to reuse the life docs instances here if the reader cache key didn't change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
life
-> live
* TimeoutLeafReader is wrapper class for FilterLeafReader which is imposing timeout on different | ||
* operations of FilterLeafReader | ||
*/ | ||
public static class TimeoutLeafReader extends FilterLeafReader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it need to be public?
*/ | ||
protected TimeoutLeafReader(LeafReader in, QueryTimeout queryTimeout) { | ||
super(in); | ||
if (in == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could instead / more compactly do:
super(Objects.requireNonNull(in));
I think?
private static QueryTimeout countingQueryTimeout(int timeallowed) { | ||
|
||
return new QueryTimeout() { | ||
static int counter = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need = 0
-- it's already Java's default.
directory.close(); | ||
} | ||
|
||
private static QueryTimeout countingQueryTimeout(int timeallowed) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to timeAllowed
?
@@ -46,9 +47,13 @@ public class StringDocValuesReaderState { | |||
* (e.g., to pickup NRT updates) requires constructing a new state instance. | |||
*/ | |||
public StringDocValuesReaderState(IndexReader reader, String field) throws IOException { | |||
this.reader = reader; | |||
if (reader instanceof ExitableIndexReader) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should do this here? Caller should not be using a ExitableDirectoryReader
when constructing this state?
@@ -372,6 +373,9 @@ public static LeafSlice[] slices( | |||
|
|||
/** Return the {@link IndexReader} this searches. */ | |||
public IndexReader getIndexReader() { | |||
if (queryTimeout != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm this seems dangerous -- getters shouldn't be doing magical / surprising things I think? Could we instead require/expect caller to creating the IndexSearcher
with the ExitableDirectoryReader
?
Edit: or perhaps when IndexSearcher
calls rewrite, specifically, if there is a queryTimeout
, it could wrap the reader at that point, only?
@@ -147,7 +149,10 @@ public void testRandomSampling() throws Exception { | |||
Math.min(5 * sampled.value.floatValue(), numDocs / 10.f), | |||
1.0); | |||
} | |||
|
|||
IOUtils.close(searcher.getIndexReader(), taxoReader, dir, taxoDir); | |||
IndexReader r = searcher.getIndexReader(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't do this either? In general all code using IndexReader
should not need to specially check for ExitableDirectoryReader
. Rather, callers should not pass ExitableDirectoryReader
to these places.
if (queryTimeout.shouldExit()) { | ||
throw new ExitableIndexReader.TimeExceededException(); | ||
} | ||
return in.terms(field); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I think you need to further wrap here? Likely the timeout will not have been hit when the rewrite
first starts. That rewrite
first calls .terms()
for each segment, and then does the hard work of getting the .iterator()
from it and stepping / seeking through the resulting TermsEnum
.
So I think you would need to return a ExitableTerms
, which in turn returns an ExitableTermsEnum
, etc.?
And perhaps same thing for points, doc values, though we could defer those for a followon change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mikemccand for reviewing the PR. I have addressed the comment. Please have a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Deepika0510 -- this looks closer! I left a few comments.
} | ||
ExitableSubReaderWrapper exitableSubReaderWrapper = | ||
new ExitableSubReaderWrapper(readerCache, queryTimeout); | ||
exitableSubReaderWrapper.wrap(readers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm shouldn't we do something with the returned wrap'd readers?
// creating a new one | ||
return mapping.get(readerCacheHelper.getKey()); | ||
} | ||
return new TimeoutLeafReader(reader, queryTimeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we put this new wrap'd reader into the mapping
too?
public ExitableIndexReader(IndexReader indexReader, QueryTimeout queryTimeout) { | ||
this.indexReader = indexReader; | ||
this.queryTimeout = queryTimeout; | ||
doWrapIndexReader(indexReader, queryTimeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should return something and maybe you do this.indexReader = doWrapIndexReader(...);
or so?
* Throws {@link ExitableDirectoryReader.ExitingReaderException} if {@link | ||
* QueryTimeout#shouldExit()} returns true, or if {@link Thread#interrupted()} returns true. | ||
*/ | ||
private void checkTimeoutWithSampling() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm why is this named WithSampling
? Is it supposed to try to check less often since this might be a hot spot? But it seems to just check every time it's called.
if (queryTimeout.shouldExit()) { | ||
throw new ExitableIndexReader.TimeExceededException(); | ||
} | ||
return in.getPointValues(field); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we wrap points as well? Or we can open a follow-on issue. Timeout during points intersect may not be so important? Not sure ...
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mikemccand! I have made the required changes, please have a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Deepika0510 -- thank you for persisting here!
I added a bunch of comments. Lucene's IndexReader
hierarchy is super tricky and I am not even certain about the comments I wrote. But the big picture is I think you can share some of the wrapper classes directly from ExitableDirectoryReader
, and then also fork small parts of that class into ExitableIndexReader
while switching DirectoryReader
to IndexReader
?
*/ | ||
public ExitableIndexReader(IndexReader indexReader, QueryTimeout queryTimeout) | ||
throws IOException { | ||
this.indexReader = new ExitableIndexReaderWrapper((DirectoryReader) indexReader, queryTimeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I think we should not be casting incoming indexReader
to a DirectoryReader
? The idea of this new class is to enforce timeouts for any IndexReader
, not just DirectoryReader
(which is a subclass of IndexReader
), I think?
In fact, you might be able to start by copying ExitableDirectoryReader.java
to ExitableIndexReader.java
and then change all DirectoryReader
to IndexReader
and iterate from there? This way we would also get points
timeout implemented. I realize this is all quite tricky!!
|
||
/** Thrown when elapsed search time exceeds allowed search time. */ | ||
@SuppressWarnings("serial") | ||
static class TimeExceededException extends RuntimeException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have both ExitableDirectoryReader
and this class to use a single class (the existing ExitingReaderException
) when timeout happens?
|
||
@Override | ||
public StoredFields storedFields() throws IOException { | ||
if (queryTimeout.shouldExit()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we factor out a private
helper method that does this if
and the throw
? Just like ExitableDirectoryReader.checkAndThrow
.
|
||
@Override | ||
public int docFreq(Term term) throws IOException { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe remove this newline at the top of each of these methods?
return indexReader.getSumTotalTermFreq(field); | ||
} | ||
|
||
private static class ExitableIndexReaderWrapper extends FilterDirectoryReader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not be subclassing FilterDirectoryReader
here since we want to work with any IndexReader
.
* TimeoutLeafReader is wrapper class for FilterLeafReader which is imposing timeout on | ||
* different operations of FilterLeafReader | ||
*/ | ||
private static class TimeoutLeafReader extends FilterLeafReader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could reuse the existing ExitableDirectoryReader.ExitableFilterAtomicReader
class, instead of making a new class here, maybe? And that class already implements timeouts for points, postings, doc values, etc.
|
||
private final CacheHelper readerCacheHelper; | ||
|
||
public ExitableIndexReaderWrapper(DirectoryReader in, QueryTimeout queryTimeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe reuse the existing ExitableDirectoryReader.ExitableSubReaderWrapper
instead of making a new one here?
@@ -77,7 +78,11 @@ public void testBasicSingleValued() throws Exception { | |||
new StringDocValuesReaderState(searcher.getIndexReader(), "field"); | |||
checkTopNFacetResult(expectedCounts, expectedTotalDocCount, searcher, state, 10, 2, 1, 0); | |||
|
|||
IOUtils.close(searcher.getIndexReader(), dir); | |||
IndexReader r = searcher.getIndexReader(); | |||
if (r instanceof ExitableIndexReader) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm why was this needed? In general users of ExitableIndexReader
should not have to special case which exact instance of IndexReader
they have. In this case, maybe ExitableIndexReader
is not implementing close
correctly?
@Override | ||
public IndexReaderContext getContext() { | ||
|
||
return indexReader.getContext(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this isn't quite right -- we can't just return the underlying IndexReaderContext
. I think instead you need to get this underlying context, and then make a new CompositeReaderContext
that wraps each of the leaf readers wrapped in ExitableFilterAtomicReader
, and children wrapped with this new class (ExitableIndexReader
)?
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution! |
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR. |
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR. |
Description
IndexSearcher only checks the query timeout in the collection phase for now. Need to add timeout support in case of other operations that may take time such as query rewrite, point ranges and vector searches. In this PR, we are covering the case of query re-write
Solution
Added timeout support in case of query rewrite operation in IndexSearcher.
Tests
Added UT for testing timeout during query re-write operation
Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.