Skip to content

Overrides rewrite in PointRangeQuery to optimize AllDocs/NoDocs cases #14609

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ebradshaw
Copy link

Overrides rewrite in PointRangeQuery range to handle cases where the query either fully contains or fully excludes all documents within the shard.

Often, particularly when using time based partitioning, range queries may overlap several indexes. Many of these indexes have timestamp values that are fully contained by the query, in which case the query can be rewritten to a MatchAllDocsQuery or a FieldExistsQuery. On the other hand, many indexes can be fully excluded if they're outside the requested time range, in which case the query can be rewritten to a MatchNoDocsQuery.

While a similar optimization exists at the leaf level in the createWeight function, rewriting at the shard level enables other optimizations downstream.

Please let me know if this has been ruled out in the past for other reasons or if the implementation misses anything. Thanks.

…range either fully contains or fully excludes all documents within the shard.
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some minor comments but the change looks good to me in general. Thank you!

* @lucene.experimental
* @see PointValues
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have a rule around it, but the lucene.experimental / lucene.internal tag is usually the last one, so let's not swap these two lines?

}
} else if (fullyExcludedCount == dims) {
return new MatchNoDocsQuery();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can somehow reuse the existing relate(byte[], byte[]) method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to using relate(byte[], byte[]). Actually looking at relate(byte[], byte[]) method, I realized that we don't need fullyExcludedCount == dims for rewriting into MatchNoDocsQuery. Any dimension being completely outside is sufficient condition for the query to be rewritten as MatchNoDocsQuery

fullyContainedCount++;
} else if (qLow.compareTo(gMax) > 0 || qHigh.compareTo(gMin) < 0) {
fullyExcludedCount++;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can return super.rewrite(searcher) if any dimension is not fullyContained / Excluded?

}
} else if (fullyExcludedCount == dims) {
return new MatchNoDocsQuery();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to using relate(byte[], byte[]). Actually looking at relate(byte[], byte[]) method, I realized that we don't need fullyExcludedCount == dims for rewriting into MatchNoDocsQuery. Any dimension being completely outside is sufficient condition for the query to be rewritten as MatchNoDocsQuery

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants