Skip to content

LUCENE-10641: IndexSearcher#setTimeout should also abort query rewrites, point ranges and vector searches #12345

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 0 commits into from

Conversation

Deepika0510
Copy link
Contributor

Description

IndexSearcher only checks the query timeout in the collection phase for now. Need to add timeout support in case of other operations that may take time such as query rewrite, point ranges and vector searches. In this PR, we are covering the case of query re-write

Solution

Added timeout support in case of query rewrite operation in IndexSearcher.

Tests

Added UT for testing timeout during query re-write operation

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have given Lucene maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.

@@ -763,6 +763,11 @@ public Query rewrite(Query original) throws IOException {
for (Query rewrittenQuery = query.rewrite(this);
rewrittenQuery != query;
rewrittenQuery = query.rewrite(this)) {
if (queryTimeout != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an improvement, however, it's only checking at the very end of rewrite? Which is not so different from checking at the start of iterating through the hits? I.e. in practice it seems like it won't make much of an improvement?

I wonder if we can somehow check the timeout during rewrite? Seems tricky though ...

The timeout implementation that wraps Directory can achieve that by checking during I/O.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to wrap the directory similarly to what ExitableDirectoryReader does

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mikemccand and @jpountz!

@jpountz I didn't get the idea behind wrapping Directory here. The rewrite method produce a more efficient query for search, I wonder how it is related to Directory class?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Deepika0510 I think what @jpountz meant is we should somehow intercept the I/O Lucene is inevitably doing in a rewrite implementation to check for timeout. Similar to how ExitableDirectoryReader wraps the low level Lucene APIs and checks for timeout.

But I don't see how we could cleanly do this (wrap Directory) in the context of rewrite.

Maybe instead we just use ExitableDirectoryReader, but make a new generalized version ExitableIndexReader that takes any IndexReader (not just the DirectoryReader implementation)?

Then, if a timeout is set on IndexSearcher, we use this wrapped timeout IndexReader in rewrite somehow? We should maybe create that wrapped reader once on IndexSearcher.setTimeout calls and reuse it across all concurrent rewrites?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woops, sorry for missing your question. @mikemccand 's understanding is correct, something like ExitableDirectoryReader. +1 to @mikemccand 's suggestion:

  • Implement a package-private ExitableIndexReader, which is really the same as ExitableDirectoryReader but for any IndexReader. It should wrap terms and points, but not postings and doc values (which are already covered via the bulk scorer).
  • Update IndexSearcher#getIndexReader to return the wrapped index reader. This will automatically make sure expensive rewrites are intercepted since IndexSearcher#getIndexReader is their only way to get access to the reader.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I just noticed this. ExitableDirectoryReader.ExitableFilterAtomicReader does indeed already have timeout checking for searching vectors. But it only has it for float[]. I am guessing something was missed during the byte[] rewrite and update. I am going to add support for timeout checking for byte[] in that class.

Just in case you get weird merge conflicts while working through this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mikemccand and @jpountz for suggestion!

I was going through the code and came across the below scenario. We want to wrap terms and points, but the related methods are in LeafReader class and we access these method generally using the getContext().leaves().get(0).reader().<method>. 



E.g.


IndexReaderContext topReaderContext = reader.getContext();
for (LeafReaderContext context : topReaderContext.leaves()) {
  final Terms terms = context.reader().terms(query.field);
   ……
   ……
   ……
 }

So, even if we wrap LeafReader and update IndexSearcher#getIndexReader() to return the wrapped reader, we would not be able to return wrapped class at such above point where we get the reader using ReaderContext .

Maybe one way out would be to create a separate ReaderContext such that it returns our wrapped IndexReader?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work automatically if you create an IndexReader wrapper. See how CompositeReader#getContext is implemented.

Copy link
Contributor Author

@Deepika0510 Deepika0510 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpountz, @mikemccand
I have created a wrapper class of IndexReader and tried to make minimal changes in other files. I added logic in CompositeReaderContext to handle ExitableIndexReader and needed to handle closing of inner wrapped IndexReader to avoid thread leaks.

@jpountz
Copy link
Contributor

jpountz commented Jul 20, 2023

@Deepika0510 You don't only need to wrap the IndexReader, you also need to wrap all its leaves.

@Deepika0510
Copy link
Contributor Author

@jpountz To wrap all the leaves, we would need to wrap ReaderContext classes along with LeafReader classes as well right? Since, we generally access the leaves through the ReaderContext class. Something like this:

reader().getContext().leaves().reader()

@mikemccand
Copy link
Member

I don't think you need to wrap ReaderContext classes -- you can create your new TimeoutLeafReader class, subclassing FilterLeafReader, and overriding the methods (likely with additional wrapping on their returned objects?) that are needing added timeouts.

Then a call lke reader().getContext().leaves().reader() will return your TimeoutLeafReader but it will implement all necessary functionality of a LeafReader.

@Deepika0510
Copy link
Contributor Author

However, to get that TimeoutLeafReader in use, we would need to go through the ReaderContext class route(?). In a way we would need some mechanism in ReaderClass to know if timeout is applied and then return TimeoutLeafReader object.

@mikemccand
Copy link
Member

Hmm I'm confused: why would you need to get to the TimeoutLeafReader? Don't you create this timeout reader, passing the timeout to it (which will apply to all queries) and then you don't need to get to it anymore? Just catch the timeout exception when running searching?

@Deepika0510
Copy link
Contributor Author

What I meant to ask is that after creating the TimeoutLeafReader class, how would we make sure that this wrapped class's object is used instead of any normal LeafReader instance?

E.g. when we created wrapped IndexReader then we modified the getIndexReader() method such that if timeout is applied then we make sure that we return the wrapped IndexReader object.

Similarly, we have to ensure the same for the TimeoutLeafReader as well right? That is where my doubt lies that since LeafReader is accessed through indexReader.getContext().leaves().reader() (like here) then shouldn’t we need to intercept in between this to return the object of TimeoutLeafReader?

@Deepika0510
Copy link
Contributor Author

Came across SoftDeletesDirectoryReaderWrapper where we have wrap method for wrapping underlying LeafReader.
However, I believe we will still have the problem when the leaves are directly accessed through ctx object like here. Is there any other way around? @mikemccand @jpountz

@mikemccand
Copy link
Member

Hi @Deepika0510 -- what is the problem when callers access the leaves? Since you would subclass FilterLeafReader (which subclasses LeafReader) it should be fine to existing code? Like that line you linked to above will still be able to call .maxDoc().

Copy link
Contributor Author

@Deepika0510 Deepika0510 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mikemccand @jpountz . I looked into SoftDeletesDirectoryReaderWrapper and tried wrapping leafReader the way we are wrapping SubReader in doWrapDirectoryReader. I was able to wrap leadReader and we are able to enforce time out.

Copy link
Contributor Author

@Deepika0510 Deepika0510 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix failing gradle check.

Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking closer! Thanks @Deepika0510.

for (LeafReaderContext leafCtx : leaves) {
LeafReader reader = leafCtx.reader();
readers.add(reader);
// we try to reuse the life docs instances here if the reader cache key didn't change
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

life -> live

* TimeoutLeafReader is wrapper class for FilterLeafReader which is imposing timeout on different
* operations of FilterLeafReader
*/
public static class TimeoutLeafReader extends FilterLeafReader {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to be public?

*/
protected TimeoutLeafReader(LeafReader in, QueryTimeout queryTimeout) {
super(in);
if (in == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could instead / more compactly do:

  super(Objects.requireNonNull(in));

I think?

private static QueryTimeout countingQueryTimeout(int timeallowed) {

return new QueryTimeout() {
static int counter = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need = 0 -- it's already Java's default.

directory.close();
}

private static QueryTimeout countingQueryTimeout(int timeallowed) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to timeAllowed?

@@ -46,9 +47,13 @@ public class StringDocValuesReaderState {
* (e.g., to pickup NRT updates) requires constructing a new state instance.
*/
public StringDocValuesReaderState(IndexReader reader, String field) throws IOException {
this.reader = reader;
if (reader instanceof ExitableIndexReader) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should do this here? Caller should not be using a ExitableDirectoryReader when constructing this state?

@@ -372,6 +373,9 @@ public static LeafSlice[] slices(

/** Return the {@link IndexReader} this searches. */
public IndexReader getIndexReader() {
if (queryTimeout != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this seems dangerous -- getters shouldn't be doing magical / surprising things I think? Could we instead require/expect caller to creating the IndexSearcher with the ExitableDirectoryReader?

Edit: or perhaps when IndexSearcher calls rewrite, specifically, if there is a queryTimeout, it could wrap the reader at that point, only?

@@ -147,7 +149,10 @@ public void testRandomSampling() throws Exception {
Math.min(5 * sampled.value.floatValue(), numDocs / 10.f),
1.0);
}

IOUtils.close(searcher.getIndexReader(), taxoReader, dir, taxoDir);
IndexReader r = searcher.getIndexReader();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't do this either? In general all code using IndexReader should not need to specially check for ExitableDirectoryReader. Rather, callers should not pass ExitableDirectoryReader to these places.

if (queryTimeout.shouldExit()) {
throw new ExitableIndexReader.TimeExceededException();
}
return in.terms(field);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think you need to further wrap here? Likely the timeout will not have been hit when the rewrite first starts. That rewrite first calls .terms() for each segment, and then does the hard work of getting the .iterator() from it and stepping / seeking through the resulting TermsEnum.

So I think you would need to return a ExitableTerms, which in turn returns an ExitableTermsEnum, etc.?

And perhaps same thing for points, doc values, though we could defer those for a followon change?

@Deepika0510 Deepika0510 requested a review from jpountz December 5, 2023 11:58
Copy link
Contributor Author

@Deepika0510 Deepika0510 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mikemccand for reviewing the PR. I have addressed the comment. Please have a look.

Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Deepika0510 -- this looks closer! I left a few comments.

}
ExitableSubReaderWrapper exitableSubReaderWrapper =
new ExitableSubReaderWrapper(readerCache, queryTimeout);
exitableSubReaderWrapper.wrap(readers);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm shouldn't we do something with the returned wrap'd readers?

// creating a new one
return mapping.get(readerCacheHelper.getKey());
}
return new TimeoutLeafReader(reader, queryTimeout);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we put this new wrap'd reader into the mapping too?

public ExitableIndexReader(IndexReader indexReader, QueryTimeout queryTimeout) {
this.indexReader = indexReader;
this.queryTimeout = queryTimeout;
doWrapIndexReader(indexReader, queryTimeout);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should return something and maybe you do this.indexReader = doWrapIndexReader(...); or so?

* Throws {@link ExitableDirectoryReader.ExitingReaderException} if {@link
* QueryTimeout#shouldExit()} returns true, or if {@link Thread#interrupted()} returns true.
*/
private void checkTimeoutWithSampling() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm why is this named WithSampling? Is it supposed to try to check less often since this might be a hot spot? But it seems to just check every time it's called.

if (queryTimeout.shouldExit()) {
throw new ExitableIndexReader.TimeExceededException();
}
return in.getPointValues(field);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we wrap points as well? Or we can open a follow-on issue. Timeout during points intersect may not be so important? Not sure ...

Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jan 17, 2024
Copy link
Contributor Author

@Deepika0510 Deepika0510 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mikemccand! I have made the required changes, please have a look.

@github-actions github-actions bot removed the Stale label Jan 23, 2024
Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Deepika0510 -- thank you for persisting here!

I added a bunch of comments. Lucene's IndexReader hierarchy is super tricky and I am not even certain about the comments I wrote. But the big picture is I think you can share some of the wrapper classes directly from ExitableDirectoryReader, and then also fork small parts of that class into ExitableIndexReader while switching DirectoryReader to IndexReader?

*/
public ExitableIndexReader(IndexReader indexReader, QueryTimeout queryTimeout)
throws IOException {
this.indexReader = new ExitableIndexReaderWrapper((DirectoryReader) indexReader, queryTimeout);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think we should not be casting incoming indexReader to a DirectoryReader? The idea of this new class is to enforce timeouts for any IndexReader, not just DirectoryReader (which is a subclass of IndexReader), I think?

In fact, you might be able to start by copying ExitableDirectoryReader.java to ExitableIndexReader.java and then change all DirectoryReader to IndexReader and iterate from there? This way we would also get points timeout implemented. I realize this is all quite tricky!!


/** Thrown when elapsed search time exceeds allowed search time. */
@SuppressWarnings("serial")
static class TimeExceededException extends RuntimeException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have both ExitableDirectoryReader and this class to use a single class (the existing ExitingReaderException) when timeout happens?


@Override
public StoredFields storedFields() throws IOException {
if (queryTimeout.shouldExit()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we factor out a private helper method that does this if and the throw? Just like ExitableDirectoryReader.checkAndThrow.


@Override
public int docFreq(Term term) throws IOException {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe remove this newline at the top of each of these methods?

return indexReader.getSumTotalTermFreq(field);
}

private static class ExitableIndexReaderWrapper extends FilterDirectoryReader {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not be subclassing FilterDirectoryReader here since we want to work with any IndexReader.

* TimeoutLeafReader is wrapper class for FilterLeafReader which is imposing timeout on
* different operations of FilterLeafReader
*/
private static class TimeoutLeafReader extends FilterLeafReader {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could reuse the existing ExitableDirectoryReader.ExitableFilterAtomicReader class, instead of making a new class here, maybe? And that class already implements timeouts for points, postings, doc values, etc.


private final CacheHelper readerCacheHelper;

public ExitableIndexReaderWrapper(DirectoryReader in, QueryTimeout queryTimeout)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe reuse the existing ExitableDirectoryReader.ExitableSubReaderWrapper instead of making a new one here?

@@ -77,7 +78,11 @@ public void testBasicSingleValued() throws Exception {
new StringDocValuesReaderState(searcher.getIndexReader(), "field");
checkTopNFacetResult(expectedCounts, expectedTotalDocCount, searcher, state, 10, 2, 1, 0);

IOUtils.close(searcher.getIndexReader(), dir);
IndexReader r = searcher.getIndexReader();
if (r instanceof ExitableIndexReader) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm why was this needed? In general users of ExitableIndexReader should not have to special case which exact instance of IndexReader they have. In this case, maybe ExitableIndexReader is not implementing close correctly?

@Override
public IndexReaderContext getContext() {

return indexReader.getContext();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this isn't quite right -- we can't just return the underlying IndexReaderContext. I think instead you need to get this underlying context, and then make a new CompositeReaderContext that wraps each of the leaf readers wrapped in ExitableFilterAtomicReader, and children wrapped with this new class (ExitableIndexReader)?

Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

Copy link

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

Copy link

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants