Reduce the number of comparisons when lowerPoint is equal to upperPoint #14267

hanbj · 2025-02-21T05:21:51Z

Description

When lowerPoint is equal to upperPoint, there is no need to compare a value against both lowerPoint and upperPoint. A single comparison per dimension is enough, which halves the number of comparisons when collecting document IDs to build bitsets and when traversing deep into the BKD tree.
While reading the related Elasticsearch code, I found that a term or terms query on a date field is rewritten: a single term becomes a range query whose lowerTerm is the same as its upperTerm, and a terms query uses a BooleanQuery to wrap multiple such range queries. This change therefore suits that scenario well, reducing the number of comparisons and improving performance when a large number of documents is collected.
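To make the saving concrete, here is a minimal sketch of the two per-value checks. It is not the code in this PR: the class and method names are hypothetical, and java.util.Arrays.compareUnsigned stands in for the unsigned byte comparator that PointRangeQuery uses. It assumes fixed-width, unsigned-comparable encodings packed one dimension after another.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the per-value matching step; not Lucene's actual code. */
final class PointMatchSketch {

  /** General range check: up to two unsigned comparisons per dimension. */
  static boolean matchesRange(
      byte[] packedValue, byte[] lowerPoint, byte[] upperPoint, int numDims, int bytesPerDim) {
    for (int dim = 0, offset = 0; dim < numDims; dim++, offset += bytesPerDim) {
      int end = offset + bytesPerDim;
      if (Arrays.compareUnsigned(packedValue, offset, end, lowerPoint, offset, end) < 0) {
        return false; // below the lower bound in this dimension
      }
      if (Arrays.compareUnsigned(packedValue, offset, end, upperPoint, offset, end) > 0) {
        return false; // above the upper bound in this dimension
      }
    }
    return true;
  }

  /** Exact check for lowerPoint == upperPoint: one comparison per dimension. */
  static boolean matchesExact(byte[] packedValue, byte[] point, int numDims, int bytesPerDim) {
    for (int dim = 0, offset = 0; dim < numDims; dim++, offset += bytesPerDim) {
      int end = offset + bytesPerDim;
      if (Arrays.compareUnsigned(packedValue, offset, end, point, offset, end) != 0) {
        return false; // differs from the single point in this dimension
      }
    }
    return true;
  }
}
```

For any value, matchesExact(v, p, ...) returns the same result as matchesRange(v, p, p, ...); only the number of comparisons for a matching value changes.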

stefanvodita

Sorry it's taken a while to review. Have you checked whether the failing test case is related? Do we have any performance tests?

Edit: Retry succeeded, but it's still worth checking that previous run. I haven't had a look at it yet.

hanbj · 2025-03-12T07:48:42Z

@stefanvodita The test case that failed previously was org.apache.lucene.index.TestKnnGraph.testMultiThreadedSearch.
I checked the testMultiThreadedSearch method: it searches with KnnFloatVectorQuery and does not use PointRangeQuery, so the failure is unrelated to this range query change.

github-actions · 2025-03-28T00:24:31Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

jainankitk

Thanks @hanbj for creating this PR. Can you make the suggested refactoring to avoid any potential regression for the lowerPoint != upperPoint code path, which is also the more common one? Can you also add a unit test for this new code path?

jainankitk · 2025-03-31T21:32:43Z

lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java

+        if (equalValues) {
+          for (int dim = 0; dim < numDims; dim++, offset += bytesPerDim) {
+            if (comparator.compare(packedValue, offset, lowerPoint, offset) != 0) {
+              return false;
+            }
+          }
+          return true;
+        }


Given we know whether equalValues is true or false while initializing the PointRangeQuery, I would have a separate weight object instead of having this additional logic when lowerPoint != upperPoint. For example:

@Override
public final Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)
    throws IOException {
  if (this.equalValues) {
    return new ConstantScoreWeight(this, boost) {....}
  }
  // We don't use RandomAccessWeight here: it's no good to approximate with "match all docs".
  // This is an inverted structure and should be used in the first pass:
  return new ConstantScoreWeight(this, boost) {....}
}

I will implement it first. Thank you for your hard work in the review.

jainankitk

Thanks @hanbj for refactoring the code and adding the unit test. Looks much better now. Just a minor suggestion to improve the readability further.

jainankitk · 2025-04-01T16:24:26Z

lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java

+   * Essentially, it is to reduce the number of comparisons. This is an optimization, used for the
+   * case of lowerPoint==upperPoint.
+   */
+  protected class SinglePointConstantScoreWeight extends MultiPointsConstantScoreWeight {


I am assuming we are reusing some of the methods from MultiPointsConstantScoreWeight, and that's why we are extending that class. May I suggest creating a class, say PointRangeQueryWeight, that extends ConstantScoreWeight, and having both SinglePointRangeQueryWeight and MultiPointRangeQueryWeight extend PointRangeQueryWeight?

This suggestion is great. SinglePointRangeQueryWeight and MultiPointRangeQueryWeight then only need to implement their own point-value matching logic and relation checks.
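The shape of that refactoring can be sketched as follows. This is only an illustration under stated assumptions: all class names are hypothetical and none of this is Lucene's API. It uses a single dimension with fixed-width, unsigned-comparable values, and a simple counting loop as the shared logic that both variants inherit.

```java
import java.util.Arrays;

/** Hypothetical base type: shared logic lives here, subclasses supply only the match test. */
abstract class PointMatcherSketch {

  /** Variant-specific test for one encoded, one-dimensional value. */
  abstract boolean matches(byte[] value);

  /** Shared logic that both variants reuse unchanged. */
  final int countMatches(byte[][] values) {
    int count = 0;
    for (byte[] value : values) {
      if (matches(value)) {
        count++;
      }
    }
    return count;
  }

  /** Chooses the variant once, up front, so the per-value loop needs no equalValues branch. */
  static PointMatcherSketch create(byte[] lower, byte[] upper) {
    return Arrays.equals(lower, upper)
        ? new SinglePointMatcherSketch(lower)
        : new MultiPointMatcherSketch(lower, upper);
  }
}

/** Used when lower == upper: a single equality check per value. */
final class SinglePointMatcherSketch extends PointMatcherSketch {
  private final byte[] point;

  SinglePointMatcherSketch(byte[] point) {
    this.point = point.clone();
  }

  @Override
  boolean matches(byte[] value) {
    return Arrays.equals(value, point);
  }
}

/** General case: the value must lie between both bounds, inclusive. */
final class MultiPointMatcherSketch extends PointMatcherSketch {
  private final byte[] lower;
  private final byte[] upper;

  MultiPointMatcherSketch(byte[] lower, byte[] upper) {
    this.lower = lower.clone();
    this.upper = upper.clone();
  }

  @Override
  boolean matches(byte[] value) {
    return Arrays.compareUnsigned(value, lower) >= 0
        && Arrays.compareUnsigned(value, upper) <= 0;
  }
}
```

Whether the extra virtual call costs more than the saved comparison is exactly what a benchmark would need to show; the sketch only illustrates the structure.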

jainankitk · 2025-04-02T16:32:43Z

@hanbj - Thanks for patiently addressing the review comments. While I don't see any performance regression risk myself, I am wondering if we can do one quick performance benchmark run, just to ensure we are not missing anything obvious?

gsmiller

Thanks for proposing this! I didn't review in deep detail but commented on a couple API/visibility related concerns that jumped out to me.

Also, +1 to running some benchmarks on this before merging to ensure we're not regressing current behavior. Adding this polymorphic indirection may actually hurt performance in interesting, non-obvious ways and we should verify this is actually beneficial.

As an alternative direction, I'd also be curious how PointInSetQuery with a single point performs. A really simple thing to try would be to change the query factory methods (e.g., LongPoint#newExactQuery) to build a set query instead of range if it's better. Or another option could be to create a specialized query that does an exact point comparison. Benchmarks would be a great help in figuring out what of these approaches is best.

Thanks again!

gsmiller · 2025-04-02T20:58:43Z

lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java

+   * <p>Optimize query performance by reducing the number of comparisons between dimensions. This
+   * implementation is used when the upper and lower bounds of all dimensions are exactly the same.
+   */
+  protected class SinglePointRangeQueryWeight extends PointRangeQueryWeight {


Does this actually need to be protected instead of private? (Same question for MultiPointRangeQueryWeight and PointRangeQueryWeight).

Yeah, they can be private in my opinion as well.

While MultiPointRangeQueryWeight and SinglePointRangeQueryWeight are used for asserting the weight instance type, PointRangeQueryWeight can be made private.

gsmiller · 2025-04-02T21:01:10Z

lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java

@@ -517,6 +623,11 @@ public byte[] getUpperPoint() {
    return upperPoint.clone();
  }

+  // for test
+  public boolean isEqualValues() {


I'd really prefer we do not increase our public API surface area only for testing on a class like this. Can we find another way to test without exposing this please?

Good catch @gsmiller! Can we make this package-private?

hanbj · 2025-04-09T06:21:00Z

@gsmiller @jainankitk I haven't yet studied carefully how benchmarks are implemented in Lucene, so this may take some time.

jainankitk · 2025-04-09T17:51:21Z

@gsmiller @jainankitk I haven't yet studied carefully how benchmarks are implemented in Lucene, so this may take some time.

@hanbj - Please take your time. You can follow the instructions here - https://github.com/mikemccand/luceneutil/blob/main/README.md. It is fairly straightforward; I did it recently for another PR, coincidentally also in PointRangeQuery. Please let me know if you get stuck anywhere.

github-actions · 2025-04-24T00:25:07Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

jainankitk · 2025-05-07T22:47:09Z

Closing in favor of the linked PR #14625, which addresses the review comments and includes performance benchmark results.

github-actions bot added the module:core/search label Feb 21, 2025

stefanvodita reviewed Mar 7, 2025


github-project-automation bot added this to OpenSearch Lucene & Core Performance Tracking Mar 12, 2025

github-project-automation bot moved this to Open in OpenSearch Lucene & Core Performance Tracking Mar 12, 2025

github-actions bot added the Stale label Mar 28, 2025

jainankitk reviewed Mar 31, 2025


github-actions bot removed the Stale label Apr 1, 2025

hanbj force-pushed the single_value_range branch 2 times, most recently from 50ded47 to 699c280 Compare April 1, 2025 10:13

jainankitk reviewed Apr 1, 2025


hanbj added 4 commits April 2, 2025 14:41

Reduce the number of comparisons when lowerPoint is equal to upperPoint

51ec0e9

code format and add test

ee97cd3

add change

ae3bfb6

code refactoring

471179e

hanbj force-pushed the single_value_range branch from 699c280 to 471179e Compare April 2, 2025 06:48

jainankitk approved these changes Apr 2, 2025


gsmiller reviewed Apr 2, 2025


github-actions bot added the Stale label Apr 24, 2025

jainankitk mentioned this pull request May 7, 2025

Reduce the number of comparisons when lowerPoint is equal to upperPoint #14625

Open

jainankitk closed this May 7, 2025

github-project-automation bot moved this from Open to Closed in OpenSearch Lucene & Core Performance Tracking May 7, 2025
