-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Overrides rewrite in PointRangeQuery to optimize AllDocs/NoDocs cases #14609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…range either fully contains or fully excludes all documents within the shard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some minor comments but the change looks good to me in general. Thank you!
* @lucene.experimental | ||
* @see PointValues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have a rule around it, but the lucene.experimental / lucene.internal tag is usually the last one, so let's not swap these two lines?
} | ||
} else if (fullyExcludedCount == dims) { | ||
return new MatchNoDocsQuery(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can somehow reuse the existing relate(byte[], byte[])
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to using relate(byte[], byte[])
. Actually looking at relate(byte[], byte[])
method, I realized that we don't need fullyExcludedCount == dims
for rewriting into MatchNoDocsQuery
. Any dimension being completely outside is sufficient condition for the query to be rewritten as MatchNoDocsQuery
fullyContainedCount++; | ||
} else if (qLow.compareTo(gMax) > 0 || qHigh.compareTo(gMin) < 0) { | ||
fullyExcludedCount++; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can return super.rewrite(searcher)
if any dimension is not fullyContained / Excluded?
} | ||
} else if (fullyExcludedCount == dims) { | ||
return new MatchNoDocsQuery(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to using relate(byte[], byte[])
. Actually looking at relate(byte[], byte[])
method, I realized that we don't need fullyExcludedCount == dims
for rewriting into MatchNoDocsQuery
. Any dimension being completely outside is sufficient condition for the query to be rewritten as MatchNoDocsQuery
Overrides rewrite in PointRangeQuery range to handle cases where the query either fully contains or fully excludes all documents within the shard.
Often, particularly when using time based partitioning, range queries may overlap several indexes. Many of these indexes have timestamp values that are fully contained by the query, in which case the query can be rewritten to a MatchAllDocsQuery or a FieldExistsQuery. On the other hand, many indexes can be fully excluded if they're outside the requested time range, in which case the query can be rewritten to a MatchNoDocsQuery.
While a similar optimization exists at the leaf level in the createWeight function, rewriting at the shard level enables other optimizations downstream.
Please let me know if this has been ruled out in the past for other reasons or if the implementation misses anything. Thanks.