Refactor main top-n bulk scorers to evaluate hits in a more term-at-a-time fashion. #14701

jpountz · 2025-05-22T11:46:24Z

MaxScoreBulkScorer and BlockMaxConjunctionBulkScorer currently evaluate hits in a doc-at-a-time (DAAT) fashion, meaning that they they look at all their clauses to find the next doc and so forth until all docs from the window are evaluated. This changes evaluation to run in a more term-at-a-time fashion (TAAT) within scoring windows, meaning that each clause is fully evaluated within the window before moving on to the next clause.

Note that this isn't completely new, BooleanScorer has been doing this to exhaustively evaluate disjunctive queries, by loading their matches into a bit set, one clause at a time. Also note that this is a bit different from traditional TAAT as this is scoped to small-ish windows of doc IDs, not the entire doc ID space.

This in-turn allows these scorers to take advantage of the new Scorer#nextDocsAndScores API, and provides a good speedup. A downside is that we may need to perform more memory copying in some cases, and evaluate a bit more documents, but the change still looks like a win in general.

…-time fashion. `MaxScoreBulkScorer` and `BlockMaxConjunctionBulkScorer` currently evaluate hits in a doc-at-a-time (DAAT) fashion, meaning that they they look at all their clauses to find the next doc and so forth until all docs from the window are evaluated. This changes evaluation to run in a more term-at-a-time fashion (TAAT) within scoring windows, meaning that each clause is fully evaluated within the window before moving on to the next clause. Note that this isn't completely new, `BooleanScorer` has been doing this to exhaustively evaluate disjunctive queries, by loading their matches into a bit set, one clause at a time. Also note that this is a bit different from traditional TAAT as this is scoped to small-ish windows of doc IDs, not the entire doc ID space. This in-turn allows these scorers to take advantage of the new `Scorer#nextDocsAndScores` API, and provides a good speedup. A downside is that we may need to perform more memory copying in some cases, and evaluate a bit more documents, but the change still looks like a win in general.

github-actions · 2025-05-22T11:47:20Z

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

jpountz · 2025-05-22T11:49:36Z

Filtered queries get a slowdown, but some important queries get a big speedup:

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
             FilteredAndHighHigh       65.66      (1.9%)       59.10      (4.6%)  -10.0% ( -16% -   -3%) 0.000
            FilteredAndStopWords       45.10      (2.4%)       41.04      (3.9%)   -9.0% ( -14% -   -2%) 0.000
               FilteredAnd3Terms      181.24      (1.7%)      165.29      (3.2%)   -8.8% ( -13% -   -3%) 0.000
     FilteredAnd2Terms2StopWords      178.47      (1.3%)      162.78      (2.9%)   -8.8% ( -12% -   -4%) 0.000
                          Fuzzy1      102.41      (2.9%)       98.67      (3.2%)   -3.6% (  -9% -    2%) 0.000
             FilteredOrStopWords       46.85      (1.6%)       45.27      (2.5%)   -3.4% (  -7% -    0%) 0.000
              FilteredOrHighHigh       68.45      (0.9%)       66.49      (1.9%)   -2.9% (  -5% -    0%) 0.000
                          Fuzzy2       85.94      (2.6%)       83.58      (2.8%)   -2.7% (  -7% -    2%) 0.002
                      OrHighRare      291.53      (3.3%)      283.82      (6.1%)   -2.6% ( -11% -    6%) 0.096
              FilteredAndHighMed      126.38      (2.6%)      123.62      (5.5%)   -2.2% ( -10% -    6%) 0.119
                      DismaxTerm      486.16      (2.2%)      478.90      (3.2%)   -1.5% (  -6% -    3%) 0.089
               FilteredOrHighMed      153.91      (0.8%)      151.71      (1.3%)   -1.4% (  -3% -    0%) 0.000
      FilteredOr2Terms2StopWords      148.64      (1.1%)      146.57      (1.5%)   -1.4% (  -3% -    1%) 0.001
             CombinedAndHighHigh       11.47      (2.6%)       11.33      (2.3%)   -1.2% (  -5% -    3%) 0.132
                FilteredOr3Terms      167.24      (1.1%)      165.61      (1.6%)   -1.0% (  -3% -    1%) 0.030
                        Wildcard       94.57      (2.8%)       93.67      (2.7%)   -1.0% (  -6% -    4%) 0.281
              CombinedOrHighHigh       18.94      (1.8%)       18.80      (2.1%)   -0.7% (  -4% -    3%) 0.243
                            Term      439.45      (3.6%)      436.34      (4.5%)   -0.7% (  -8% -    7%) 0.595
                      TermDTSort      389.67      (4.7%)      386.92      (4.8%)   -0.7% (  -9% -    9%) 0.647
                    FilteredTerm      159.78      (1.6%)      158.69      (2.3%)   -0.7% (  -4% -    3%) 0.282
                 AndHighOrMedMed       46.65      (1.1%)       46.40      (1.8%)   -0.5% (  -3% -    2%) 0.267
                        PKLookup      320.16      (4.3%)      318.93      (4.7%)   -0.4% (  -8% -    8%) 0.791
                 FilteredPrefix3      154.48      (2.3%)      154.23      (2.6%)   -0.2% (  -4% -    4%) 0.840
                          Phrase       14.34      (2.2%)       14.32      (2.3%)   -0.1% (  -4% -    4%) 0.858
                          IntNRQ      309.51      (0.8%)      309.13      (0.9%)   -0.1% (  -1% -    1%) 0.641
                         Prefix3      163.72      (2.3%)      163.54      (2.7%)   -0.1% (  -4% -    4%) 0.893
                 CountAndHighMed      312.21      (1.2%)      312.08      (1.6%)   -0.0% (  -2% -    2%) 0.929
                IntervalsOrdered        2.29      (2.9%)        2.29      (2.5%)    0.1% (  -5% -    5%) 0.919
                  FilteredIntNRQ      301.55      (1.0%)      301.84      (0.9%)    0.1% (  -1% -    2%) 0.760
             CountFilteredOrMany       27.46      (1.0%)       27.50      (1.3%)    0.1% (  -2% -    2%) 0.754
          CountFilteredOrHighMed      148.19      (0.7%)      148.39      (0.6%)    0.1% (  -1% -    1%) 0.509
                     CountOrMany       30.46      (1.5%)       30.52      (1.5%)    0.2% (  -2% -    3%) 0.712
                       CountTerm     8687.88      (2.8%)     8708.11      (4.5%)    0.2% (  -6% -    7%) 0.850
         CountFilteredOrHighHigh      136.50      (0.7%)      136.90      (0.6%)    0.3% (  -1% -    1%) 0.178
                CountAndHighHigh      359.19      (1.7%)      360.43      (1.6%)    0.3% (  -2% -    3%) 0.520
                  CountOrHighMed      366.72      (1.5%)      368.05      (1.8%)    0.4% (  -2% -    3%) 0.486
                    CombinedTerm       30.67      (3.5%)       30.78      (4.0%)    0.4% (  -6% -    8%) 0.751
                 CountOrHighHigh      346.98      (1.1%)      348.48      (1.5%)    0.4% (  -2% -    3%) 0.304
                  FilteredPhrase       32.93      (1.8%)       33.07      (1.5%)    0.4% (  -2% -    3%) 0.413
             CountFilteredPhrase       25.29      (3.2%)       25.50      (2.5%)    0.8% (  -4% -    6%) 0.377
                   TermTitleSort       85.82     (10.1%)       86.55      (7.5%)    0.9% ( -15% -   20%) 0.767
                   TermMonthSort     3181.64      (2.9%)     3213.86      (2.2%)    1.0% (  -3% -    6%) 0.227
                  FilteredOrMany       16.50      (2.7%)       16.67      (2.5%)    1.0% (  -4% -    6%) 0.221
                     CountPhrase        4.15      (3.2%)        4.20      (2.0%)    1.3% (  -3% -    6%) 0.137
               TermDayOfYearSort      283.82      (3.8%)      287.84      (1.8%)    1.4% (  -4% -    7%) 0.146
               CombinedOrHighMed       72.56      (2.4%)       73.69      (1.8%)    1.6% (  -2% -    5%) 0.024
                          OrMany       20.45      (5.3%)       21.05      (6.0%)    2.9% (  -8% -   15%) 0.114
              CombinedAndHighMed       38.55      (2.3%)       39.92      (2.2%)    3.6% (   0% -    8%) 0.000
                DismaxOrHighHigh      118.14      (2.1%)      122.80      (4.3%)    3.9% (  -2% -   10%) 0.000
                 DismaxOrHighMed      173.28      (2.3%)      183.27      (2.2%)    5.8% (   1% -   10%) 0.000
                        Or3Terms      165.74      (5.5%)      197.36      (7.9%)   19.1% (   5% -   34%) 0.000
                       And3Terms      172.23      (4.2%)      205.61      (7.9%)   19.4% (   7% -   32%) 0.000
              Or2Terms2StopWords      159.77      (5.6%)      191.42      (6.9%)   19.8% (   6% -   34%) 0.000
             And2Terms2StopWords      161.65      (4.1%)      194.38      (7.1%)   20.2% (   8% -   32%) 0.000
                AndMedOrHighHigh       65.43      (2.8%)       79.68      (2.7%)   21.8% (  15% -   28%) 0.000
                    AndStopWords       29.75      (6.7%)       36.45     (12.7%)   22.5% (   2% -   44%) 0.000
                     OrStopWords       32.42      (9.6%)       39.95     (13.0%)   23.2% (   0% -   50%) 0.000
                       OrHighMed      192.11      (4.0%)      244.47      (2.7%)   27.3% (  19% -   35%) 0.000
                      OrHighHigh       51.75      (4.9%)       66.11      (9.2%)   27.8% (  13% -   44%) 0.000
                     AndHighHigh       43.14      (1.5%)       56.30     (11.8%)   30.5% (  16% -   44%) 0.000
                      AndHighMed      135.13      (0.9%)      183.67      (4.2%)   35.9% (  30% -   41%) 0.000

gf2121

I like it, simpler and faster!

gf2121 · 2025-05-22T13:32:59Z

lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java

@@ -55,9 +55,6 @@ final class BlockMaxConjunctionBulkScorer extends BulkScorer {
    this.iterators =
        Arrays.stream(this.scorers).map(Scorer::iterator).toArray(DocIdSetIterator[]::new);
    lead1 = ScorerUtil.likelyImpactsEnum(iterators[0]);
-    lead2 = ScorerUtil.likelyImpactsEnum(iterators[1]);
-    scorer1 = this.scorables[0];
-    scorer2 = this.scorables[1];
    this.sumOfOtherClauses = new double[this.scorers.length];
    for (int i = 0; i < sumOfOtherClauses.length; i++) {


Not introduced in this PR, but maybe change this to a Arrays.fill by the way :)

gf2121 · 2025-05-22T13:33:22Z

lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java

@@ -38,11 +37,12 @@ final class BlockMaxConjunctionBulkScorer extends BulkScorer {
  private final Scorer[] scorers;
  private final Scorable[] scorables;
  private final DocIdSetIterator[] iterators;
-  private final DocIdSetIterator lead1, lead2;
-  private final Scorable scorer1, scorer2;
+  private final DocIdSetIterator lead1;


Maybe just name it lead since we do not have lead2 now.

gf2121 · 2025-05-22T15:59:25Z

lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java


-      doc = lead1.nextDoc();
+    int maxOtherDoc = -1;
+    for (int i = 0; i < iterators.length; ++i) {


Not important but since we are naming maxOtherDoc let's start i from 1?

gf2121 · 2025-05-23T11:42:05Z

lucene/core/src/java/org/apache/lucene/search/MaxScoreBulkScorer.java

-      if (scorer.doc == doc) {
-        score += scorer.scorable.score();
-      }
+      ScorerUtil.filterCompetitiveHits(


Can we skip this if minCompetitiveScore == 0f, like other place?

This shouldn't be needed as this evaluates non-essential clauses, and there can only be non-essential clauses if minCompetitiveScore > 0. I'll add an assertion to make this clearer.

gf2121 · 2025-05-23T12:31:58Z

lucene/core/src/java/org/apache/lucene/search/ScorerUtil.java

+      if (curDoc < targetDoc) {
+        curDoc = iterator.advance(targetDoc);
+      }
+      if (curDoc == targetDoc) {


Out of curiosity, could VectorUtil#findNextGTE help here?

I wondered the same. :) Let's keep it this way (slightly simpler) and look into whether this can be made faster with VectorUtil in a follow-up?

gf2121 · 2025-05-23T13:09:27Z

lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java

@@ -118,98 +115,36 @@ private void scoreWindow(
      return;
    }

-    Scorable scorer1 = this.scorer1;
-    if (scorers[0].getMaxScore(max - 1) == 0f) {


I was wondering why the FilteredAnd tasks, which use BlockMaxConjunctionBulkScorer, are getting more slow down than other tasks. Could the removal of this special optimization be the reason? Do we intend to remove this for simplification?

This may be a contributing factor indeed. I suspect that another factor is that filtered tasks use the default impl of nextDocsAndScores which adds some overhead compared to what we were doing before since it first needs to copy matching docs to an array before doing work on this array. I was keen to looking into the performance of filtered tasks in a follow-up, I worry it's not going to be easy to recover performance.

gf2121

Thank you!

…-time fashion. (#14701) `MaxScoreBulkScorer` and `BlockMaxConjunctionBulkScorer` currently evaluate hits in a doc-at-a-time (DAAT) fashion, meaning that they they look at all their clauses to find the next doc and so forth until all docs from the window are evaluated. This changes evaluation to run in a more term-at-a-time fashion (TAAT) within scoring windows, meaning that each clause is fully evaluated within the window before moving on to the next clause. Note that this isn't completely new, `BooleanScorer` has been doing this to exhaustively evaluate disjunctive queries, by loading their matches into a bit set, one clause at a time. Also note that this is a bit different from traditional TAAT as this is scoped to small-ish windows of doc IDs, not the entire doc ID space. This in-turn allows these scorers to take advantage of the new `Scorer#nextDocsAndScores` API, and provides a good speedup. A downside is that we may need to perform more memory copying in some cases, and evaluate a bit more documents, but the change still looks like a win in general.

* main: (32 commits) update os.makedirs with pathlib mkdir (apache#14710) Optimize AbstractKnnVectorQuery#createBitSet with intoBitset (apache#14674) Implement #docIDRunEnd() on PostingsEnum. (apache#14693) Speed up TermQuery (apache#14709) Refactor main top-n bulk scorers to evaluate hits in a more term-at-a-time fashion. (apache#14701) Fix WindowsFS test failure seen on Policeman Jenkins (apache#14706) Use a temporary repository location to download certain ecj versions ("drops") (apache#14703) Add assumption to ignore occasional test failures due to disconnected graphs (apache#14696) Return MatchNoDocsQuery when IndexOrDocValuesQuery::rewrite does not match (apache#14700) Minor access modifier adjustment to a couple of lucene90 backward compat types (apache#14695) Speed up exhaustive evaluation. (apache#14679) Specify and test that IOContext is immutable (apache#14686) deps(java): bump org.gradle.toolchains.foojay-resolver-convention (apache#14691) deps(java): bump org.eclipse.jgit:org.eclipse.jgit (apache#14692) Clean up how the test framework creates asserting scorables. (apache#14452) Make competitive iterators more robust. (apache#14532) Remove DISIDocIdStream. (apache#14550) Implement AssertingPostingsEnum#intoBitSet. (apache#14675) Fix patience knn queries to work with seeded knn queries (apache#14688) Added toString() method to BytesRefBuilder (apache#14676) ...

jpountz · 2025-05-31T06:16:18Z

Nightly benchmarks saw a good speedup for some queries (OrHighMed, AndHighHigh, ...) but the slowdown on filtered queries was higher than what I was observing locally: https://benchmarks.mikemccandless.com/2025.05.29.13.41.40.html

jpountz · 2025-05-31T07:43:46Z

I'll spend some time better understanding what happens with these filtered tasks. It's not too surprising to me that there is some slowdown, but I wouldn't have expected this magnitude.

jpountz · 2025-05-31T13:53:59Z

I think I found the reason, the new approach to scoring conjunctions appears to be slower in the ramp-up phase, when less than k hits have been collected and faster in the steady state when most hits have a score that is less than the minimum competitive score. Filtered conjunctions are slower because they have barely more than k=100 hits globally. I'll open a PR shortly.

…icks in. This essentially reverts the change from apache#14701 for conjunctive queries that have not reached their `totalHitsThreshold` yet. This should speed up queries whose total number of matches is in the order of `totalHitsThreshold` or less, such as filtered conjunctions on nightly benchmarks.

…icks in. (#14739) This essentially reverts the change from #14701 for conjunctive queries that have not reached their `totalHitsThreshold` yet. This should speed up queries whose total number of matches is in the order of `totalHitsThreshold` or less, such as filtered conjunctions on nightly benchmarks.

jpountz added this to the 10.3.0 milestone May 22, 2025

github-project-automation bot added this to OpenSearch Lucene & Core Performance Tracking May 22, 2025

github-project-automation bot moved this to Open in OpenSearch Lucene & Core Performance Tracking May 22, 2025

github-actions bot added the module:core/search label May 22, 2025

changes

4b9177a

jpountz mentioned this pull request May 22, 2025

Speed up conjunctive queries that need scores. #14690

Closed

gf2121 reviewed May 23, 2025

View reviewed changes

jpountz added 2 commits May 23, 2025 15:44

Review feedback

f3977b9

Merge branch 'main' into clause_at_a_time

51f9a3e

gf2121 approved these changes May 23, 2025

View reviewed changes

jpountz merged commit b5e79a3 into apache:main May 25, 2025
7 checks passed

jpountz deleted the clause_at_a_time branch May 25, 2025 19:18

github-project-automation bot moved this from Open to Merged in OpenSearch Lucene & Core Performance Tracking May 25, 2025

jpountz mentioned this pull request May 31, 2025

Keep evaluating conjunction one doc-at-a-time until dynamic pruning kicks in. #14739

Merged

Refactor main top-n bulk scorers to evaluate hits in a more term-at-a-time fashion. #14701

Refactor main top-n bulk scorers to evaluate hits in a more term-at-a-time fashion. #14701

Uh oh!

Conversation

jpountz commented May 22, 2025

Uh oh!

github-actions bot commented May 22, 2025

Uh oh!

jpountz commented May 22, 2025

Uh oh!

gf2121 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jpountz May 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gf2121 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jpountz commented May 31, 2025

Uh oh!

jpountz commented May 31, 2025

Uh oh!

jpountz commented May 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

jpountz May 23, 2025 •

edited

Loading

jpountz commented May 31, 2025 •

edited

Loading