Early terminate visit BKD leaf when current value greater than upper point in sorted dim. #12528

vsop-479 · 2023-08-30T07:25:01Z

Since values are sorted on the sorted dim in BKD leaf, early terminate the visit when current packed value greater than upper point in sorted dimension, may get a better performance.

Here are some performance data:

1 dim point

Visit method for low cardinality points:

baseline ns	candidate ns	speedup
52017	41400	20.4%

Whole search for low cardinality points:

baseline ns	candidate ns	speedup
242058	228225	5.7%

Whole search for high cardinality points:

baseline ns	candidate ns	speedup
984615	868358	11.8%

2 dim point

Whole search for low cardinality points:

baseline ns	candidate ns	speedup
1748439	1656944	5.2%

Whole search for high cardinality points:

baseline ns	candidate ns	speedup
1655624	1446875	12.6%

vsop-479 · 2023-09-20T08:23:14Z

@jpountz Please take a look when you get a chance!

lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java

lucene/core/src/java/org/apache/lucene/index/PointValues.java

vsop-479 · 2023-09-22T07:24:01Z

@iverase I replaced int values with static variables. Please take a look.
Actually, i used enum to define the match states in pre version. but it downgraded the performance a little.
Static variables is good, but do you think it is ok to use enum to make code graceful, even through there is little performance downgrade?

iverase · 2023-09-22T08:01:09Z

lucene/core/src/java/org/apache/lucene/index/PointValues.java

+    public static final int HIGH_IN_SORTED_DIM = 2;
+    /** Packed value is too high in NON-SORTED dimension */
+    public static final int HIGH_IN_NON_SORTED_DIM = 3;
+  }


My main concern here is that the concept of SORTED dimension does not exist in the PointValues API. If you have a look to the javadocs when visiting a leaf node:

/** * Called for all documents in a leaf cell that crosses the query. The consumer should * scrutinize the packedValue to decide whether to accept it. In the 1D case, values are visited * in increasing order, and in the case of ties, in increasing docID order. */

It only constraints the 1D case but in higher dimensions there is no constraint how data is visited. The concept of SORTED dimension sounds to me an implementation detail that should not be leaked to the public API.

lucene/core/src/java/org/apache/lucene/index/PointValues.java

iverase

I have concerns with this change as I have the impression that we are leaking implementation details into the public API which is not a good idea in general. Maybe other folks have a different opinion but I don't think the leaves sorted dimensions should be part of the public API, or at least we need to make it explicit before this change.

…ator, byte[]) to make other queries use DocIdSetIterator.

vsop-479 · 2023-09-25T07:01:30Z

@iverase Does it make sense to you if MatchState defined in other class, such as BKDReader or IntersectVisitor, and only leave the sorted dimension in IntersectVisitor' visit method as a parameter?
I just implemented visitWithState in PointRangeQuery, and there is some speed up. I think other queries, such as LatLonPointDistanceQuery, which use a similar IntersectVisitor, can get a better performance too.

iverase · 2023-09-25T11:02:44Z

My recommendation is to use the following method as you are just trying to flag if the visit method needs to keep processing points.

  /** Similar to {@link IntersectVisitor#visit(int, byte[])} but data is visited in 
   * increasing order on the {@sortedDim}, and in the case of ties, in increasing docID order. 
   * Implementers can stop processing points on the leaf by returning false when for example the 
   * sorted dimension value is too high to be matched by the query. 
   * 
   * @return true if the visitor should continue visiting points on this leaf, otherwise false. 
   * */

 default boolean visitWithSortedDim(int docID, byte[] packedValue, int sortedDim) throws IOException { 
       visit(docID, packedValue); 
        return true; 
   }

I would remove the "inverse" case as the inverse visitors are implementation details and IMHO should not be part of the API.

just implemented visitWithState in PointRangeQuery, and there is some speed up

We should run the performance test in the [lucene benchmarks](lucene benchmarks) to check if there are slowdowns due to the extra check for each visited point.

vsop-479 · 2023-09-26T01:44:07Z

@iverase Thanks for your recommendation, it may makes code more clear.
I will try to implement it, and run the performance test.

vsop-479 · 2023-10-10T09:30:07Z

@iverase Here is a performance test from luceneutil.

box-points	baseline	candidate	Diff
BEST M hits/sec	101.09	103.64	2.5%
BEST QPS	102.87	105.46	2.5%

I only implemented visitWithSortedDim in PointRangeQuery (without "inverse" case) temporarily, so I just ran the box's points test.

run the performance test in the [lucene benchmarks]

Do you mean the off the shelf benchmark in lucene self?

iverase · 2023-10-10T10:33:27Z

Do you mean the off the shelf benchmark in lucene self?

One case I am interested in is the geo benchmarks. It is not clear to me if some of those queries (e.g polygon query) can effectively take advantage of this change so it would be good to know that they are not slowed down.

vsop-479 · 2023-10-11T07:29:34Z

@iverase
Here is a performance data of geo cases. There are some slowdown due to the extra check.

query	metric	baseline	candidate	Diff
poly-10-geo3d	BEST M hits/sec	68.13	67.99	-0.2%
--	BEST QPS	43.07	42.99	-0.1%
box-geo3d	BEST M hits/sec	73.63	72.86	-0.1%
--	BEST QPS	74.92	74.14	-0.1%

I want to add a enhanced IntersectVisitor type or add a flag to IntersectVisitor, for queries which have already implemented visitWithSortedDim. And leave queries which haven't or couldn't implement visitWithSortedDim, use the original visit method to avoid the unnecessary checks.
Does it make sense to you?

vsop-479 · 2023-11-07T09:47:32Z

@iverase I implemented visitWithSortedDim in LatLonPointDistanceQuery. Please take a look when you get a chance!

BTW, The contents of getIntersectVisitor and getInverseIntersectVisitor in LatLonPointDistanceQuery and PointRangeQuery are identical. Should we move them to a DefaultIntersectVisitor and make match/relate abstract in it. I mean in a new PR without this change.

github-actions · 2024-01-08T12:23:45Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

vsop-479 · 2024-05-11T08:27:43Z

@iverase
I implemented visitWithSortedDim for pointCount in PointRangeQuery. Please take a look when you get a chance.

Or Do you have any other ideas to terminate visiting points, which already greater than upperPoint? Even just for 1D points.

github-actions · 2024-05-26T00:21:15Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

vsop-479 · 2024-07-18T06:13:49Z

if there are slowdowns due to the extra check for each visited point.

Maybe this extra check can be weakened by Branch predictor

github-actions · 2024-08-03T00:20:10Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

vsop-479 · 2025-05-12T12:47:01Z

I would remove the "inverse" case as the inverse visitors are implementation details and IMHO should not be part of the API.

FWIW, I implemented "inverse" case with adding a VisitState.

vsop-479 · 2025-05-12T12:48:11Z

I will try to measure the performance.

vsop-479 added 3 commits August 30, 2023 14:42

Early terminate visit bkd leaf for one dim point.

5b97476

Tidy.

4cbb76e

Early terminate for high cardinality BKD leaf.

b1ddd1a

vsop-479 changed the title ~~Early terminate visit low cardinality BKD leaf when current value greater than upper point for one dim point.~~ Early terminate visit BKD leaf when current value greater than upper point for one dim point. Sep 1, 2023

vsop-479 added 2 commits September 2, 2023 08:52

Early terminate for inverse visitor.

82550bf

Early terminate for N dims point.

14a8bee

vsop-479 changed the title ~~Early terminate visit BKD leaf when current value greater than upper point for one dim point.~~ Early terminate visit BKD leaf when current value greater than upper point in sorted dim. Sep 5, 2023

iverase reviewed Sep 21, 2023

View reviewed changes

lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java Show resolved Hide resolved

iverase reviewed Sep 21, 2023

View reviewed changes

lucene/core/src/java/org/apache/lucene/index/PointValues.java Outdated Show resolved Hide resolved

Use static variables instead of int values.

8304c15

iverase reviewed Sep 22, 2023

View reviewed changes

lucene/core/src/java/org/apache/lucene/index/PointValues.java Outdated Show resolved Hide resolved

iverase reviewed Sep 22, 2023

View reviewed changes

vsop-479 added 3 commits September 22, 2023 16:41

Use static variables instead of remain int values.

b465046

Invoke visit(DocIdSetIterator, byte[]) in visitWithState(DocIdSetIter…

d713d96

…ator, byte[]) to make other queries use DocIdSetIterator.

Modify some comments.

9c24eeb

Visit with sorted dim.

b85a9b2

Implement visitWithSortedDim in LatLonPointDistanceQuery

5584de2

Tidy.

2066e4f

github-actions bot added the Stale label Jan 8, 2024

Merge branch 'main' into early_terminate_visit_bkd_leaf

915ad37

github-actions bot removed the Stale label May 11, 2024

Implemented visitWithSortedDim for pointCount in PointRangeQuery.

5c7e73d

github-actions bot added the Stale label May 26, 2024

github-actions bot removed the Stale label Jul 19, 2024

github-actions bot added the Stale label Aug 3, 2024

Merge branch 'main' into early_terminate_visit_bkd_leaf

feac354

github-actions bot added module:core/index module:core/search module:test-framework labels May 6, 2025

update.

bf32dd1

vsop-479 requested a review from iverase May 6, 2025 09:33

github-actions bot removed the Stale label May 7, 2025

vsop-479 added 7 commits May 9, 2025 20:04

Add VisitState to implement inverse visit with sorted dim.

8868f86

Iter.

de65346

Leave compared doc/iterator to remaining.

b3f8c59

Change MatchState to enum.

1e6e690

Revert PointRangeQuery to resolve conflicts.

4f69bea

Merge branch 'main' into early_terminate_visit_bkd_leaf

107af45

Implement visit with sorted dim.

a148a69

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Early terminate visit BKD leaf when current value greater than upper point in sorted dim. #12528

Early terminate visit BKD leaf when current value greater than upper point in sorted dim. #12528

vsop-479 commented Aug 30, 2023 •

edited

Loading

vsop-479 commented Sep 20, 2023

vsop-479 commented Sep 22, 2023

iverase Sep 22, 2023

iverase left a comment

vsop-479 commented Sep 25, 2023 •

edited

Loading

iverase commented Sep 25, 2023

vsop-479 commented Sep 26, 2023

vsop-479 commented Oct 10, 2023

iverase commented Oct 10, 2023

vsop-479 commented Oct 11, 2023

vsop-479 commented Nov 7, 2023 •

edited

Loading

github-actions bot commented Jan 8, 2024

vsop-479 commented May 11, 2024

github-actions bot commented May 26, 2024

vsop-479 commented Jul 18, 2024

github-actions bot commented Aug 3, 2024

vsop-479 commented May 12, 2025 •

edited

Loading

vsop-479 commented May 12, 2025

Early terminate visit BKD leaf when current value greater than upper point in sorted dim. #12528

Are you sure you want to change the base?

Early terminate visit BKD leaf when current value greater than upper point in sorted dim. #12528

Conversation

vsop-479 commented Aug 30, 2023 • edited Loading

1 dim point

2 dim point

vsop-479 commented Sep 20, 2023

vsop-479 commented Sep 22, 2023

iverase Sep 22, 2023

Choose a reason for hiding this comment

iverase left a comment

Choose a reason for hiding this comment

vsop-479 commented Sep 25, 2023 • edited Loading

iverase commented Sep 25, 2023

vsop-479 commented Sep 26, 2023

vsop-479 commented Oct 10, 2023

iverase commented Oct 10, 2023

vsop-479 commented Oct 11, 2023

vsop-479 commented Nov 7, 2023 • edited Loading

github-actions bot commented Jan 8, 2024

vsop-479 commented May 11, 2024

github-actions bot commented May 26, 2024

vsop-479 commented Jul 18, 2024

github-actions bot commented Aug 3, 2024

vsop-479 commented May 12, 2025 • edited Loading

vsop-479 commented May 12, 2025

vsop-479 commented Aug 30, 2023 •

edited

Loading

vsop-479 commented Sep 25, 2023 •

edited

Loading

vsop-479 commented Nov 7, 2023 •

edited

Loading

vsop-479 commented May 12, 2025 •

edited

Loading