Speed up exhaustive evaluation. #14679
Conversation
This change helps speed up exhaustive evaluation of term queries, i.e. calling `DocIdSetIterator#nextDoc()` then `Scorer#score()` in a loop. It helps in two ways:

- Iteration of matching doc IDs gets a bit more efficient, especially in the case when a block of postings is encoded as a bit set.
- Computation of scores now gets (auto-)vectorized.

While this change doesn't help much when dynamic pruning kicks in, I'm hopeful that we can improve this in the future.
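For context, the access pattern being optimized can be sketched like this. The interfaces below are simplified stand-ins for Lucene's `DocIdSetIterator` and `Scorer`, not the real API; scoring is reduced to an array lookup for illustration:

```java
import java.util.Arrays;

public class ExhaustiveEval {
    // Sentinel mirroring DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Simplified stand-in for Lucene's DocIdSetIterator.
    interface DocIdSetIterator {
        int nextDoc();
    }

    // Exhaustive evaluation: visit every matching doc and score it.
    // The scalar loop pays one virtual call per doc for nextDoc() plus one
    // for scoring, which is what batching doc IDs and scores helps avoid.
    static float sumScores(DocIdSetIterator it, float[] scoreByDoc) {
        float total = 0;
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            total += scoreByDoc[doc];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] docs = {0, 2, 5};
        int[] pos = {-1};
        DocIdSetIterator it = () -> ++pos[0] < docs.length ? docs[pos[0]] : NO_MORE_DOCS;
        float[] scores = new float[6];
        Arrays.fill(scores, 1f);
        System.out.println(sumScores(it, scores)); // prints 3.0
    }
}
```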
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.
Exhaustive evaluation (totalHitsThreshold=Integer.MAX_VALUE):
When dynamic pruning is enabled (Lucene's defaults):
The speed up is very exciting! I did a rough pass and left some minor suggestions/questions.
So this optimization can typically help cases like TOPN_COUNT, which needs to evaluate all docs, especially for indices with deleted docs, where `count` cannot return in constant time!
```java
/** Grow both arrays to ensure that they can store at least the given number of entries. */
public void grow(int minSize) {
  if (docs.length < minSize) {
    docs = ArrayUtil.grow(docs, minSize);
```
Maybe we typically need `growNoCopy` instead of `grow`?
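For reference, the distinction the reviewer is pointing at can be sketched as follows. These helpers mimic the semantics of Lucene's `ArrayUtil.grow`/`ArrayUtil.growNoCopy` but are stand-ins, not the real implementation:

```java
import java.util.Arrays;

public class GrowDemo {
    // Grows and preserves existing contents, like ArrayUtil.grow.
    static int[] grow(int[] array, int minSize) {
        return array.length >= minSize
                ? array
                : Arrays.copyOf(array, Math.max(minSize, array.length * 2));
    }

    // Reallocates without copying, like ArrayUtil.growNoCopy: cheaper when
    // the caller is about to overwrite the whole buffer anyway, which is the
    // case for these reusable doc/freq buffers.
    static int[] growNoCopy(int[] array, int minSize) {
        return array.length >= minSize
                ? array
                : new int[Math.max(minSize, array.length * 2)];
    }
}
```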
```java
 *
 * <p><b>NOTE</b>: The returned {@link DocAndFreqBuffer} should not hold references to internal
 * data structures.
 *
```
Should we clarify that this must not be called when the iterator is unpositioned?
```java
  size2 = enumerateSetBits(docBitSet.getBits()[i], i << 6, reuse.docs, size2);
}
assert size2 >= size : size2 + " < " + size;
for (int i = 0; i < size; ++i) {
```
Though this loop might get vectorized, would it be faster to just add the base in `enumerateSetBits`? Because these words typically have dense 1 bits:

```java
enumerateSetBits(docBitSet.getBits()[i], (i << 6) + docBitSetBase, reuse.docs, size2)
```
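To make the suggestion concrete, here is a sketch of what such a bit-enumeration helper does: it walks the set bits of one 64-bit word of the bit set and appends `base + bitIndex` to the doc-ID array. The signature follows the snippet quoted above; the actual Lucene implementation may differ:

```java
public class SetBits {
    // Append the positions of all set bits in `word`, offset by `base`,
    // to `docs` starting at `size`; return the new size. Folding the block
    // base into `base` up front (as suggested) saves a per-doc addition in
    // the caller's follow-up loop.
    static int enumerateSetBits(long word, int base, int[] docs, int size) {
        while (word != 0L) {
            docs[size++] = base + Long.numberOfTrailingZeros(word);
            word &= word - 1; // clear the lowest set bit
        }
        return size;
    }
}
```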
```java
/** Grow both arrays to ensure that they can store at least the given number of entries. */
public void grow(int minSize) {
  if (docs.length < minSize) {
    docs = ArrayUtil.grow(docs, minSize);
```
Same here, `growNoCopy` might be better?
```java
 * <p><b>NOTE</b>: The returned {@link DocAndScoreBuffer} should not hold references to internal
 * data structures.
 *
 * <p><b>NOTE</b>: In case this {@link Scorer} exposes a {@link #twoPhaseIterator()
```
When only the DISI is exposed, should it be positioned as well?
Yes indeed, will clarify.
```java
 *   return reuse;
 * </pre>
 *
 * <p><b>NOTE</b>: The returned {@link DocAndFreqBuffer} should not hold references to internal
```
Can we make the buffer arrays private and only expose getters and grow?
There are a couple of places where I call `System#arraycopy` directly on these arrays; let me think more about it.
I ended up not applying this suggestion, as the API calls would have looked awkward. I hope this is ok.
Thanks for trying! Let's keep it then.
```java
 * <p>This method behaves as if implemented as below, which is the default implementation:
 *
 * <pre class="prettyprint">
 * int batchSize = 16;
```
When I only read this javadoc without looking at impls, I was thinking impls should limit their block size to under 16 as well :) Maybe clarify that the max size of the buffer depends on the data structures.
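The batching pattern the javadoc describes can be sketched as below. `nextDoc` is a hypothetical stand-in for `DocIdSetIterator#nextDoc()`; as discussed above, the 16-doc batch is a detail of the default implementation, not a cap other implementations must respect:

```java
import java.util.function.IntSupplier;

public class BatchDemo {
    // Sentinel mirroring DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Buffer up to batchSize matching docs into `docs` and return how many
    // were buffered. A real implementation backed by a postings block could
    // legally return a larger batch per call.
    static int nextDocsBatch(IntSupplier nextDoc, int[] docs, int batchSize) {
        int size = 0;
        while (size < batchSize) {
            int doc = nextDoc.getAsInt();
            if (doc == NO_MORE_DOCS) {
                break;
            }
            docs[size++] = doc;
        }
        return size;
    }
}
```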
```java
}

int size = docAndFreqBuffer.size;
normValues = ArrayUtil.grow(normValues, size);
```
growNoCopy?
```java
 */
public void score(int size, int[] freqs, long[] norms, float[] scores) {
  for (int i = 0; i < size; ++i) {
    scores[i] = score(freqs[i], norms[i]);
```
> Computation of scores now gets (auto-)vectorized.

By this, do you mean this method can get vectorized? So the abstraction layer does not prevent inlining here?
Auto-vectorization indeed requires `score(float, long)` to get inlined, which would only happen if there are at most two impls of `SimScorer` being used. We may need to implement `score(int, int[], long[], float[])` on our main similarities in the future to make performance more predictable. We may also be able to do a bit better than calling `score` in a loop. I was trying to keep the change small.
> We may also be able to do a bit better than calling score in a loop

Yeah! I played with `BM25` a bit and the result looks promising:

```
Benchmark                               Mode  Cnt   Score    Error  Units
VectorizedBM25Benchmark.scoreBaseline  thrpt    5  10.991 ±  0.356  ops/us
VectorizedBM25Benchmark.scoreVector    thrpt    5  15.149 ±  0.029  ops/us
```
```java
public static void scoreBaseline(
    int size, int[] freqs, long[] norms, float[] scores, float[] cache, int weight, float[] buffer) {
  for (int i = 0; i < size; ++i) {
    float normInverse = cache[((byte) norms[i]) & 0xFF];
    scores[i] = weight - weight / (1f + freqs[i] * normInverse);
  }
}

public static void scoreVector(
    int size, int[] freqs, long[] norms, float[] scores, float[] cache, int weight, float[] buffer) {
  for (int i = 0; i < size; ++i) {
    buffer[i] = cache[((byte) norms[i]) & 0xFF];
  }
  for (int i = 0; i < size; ++i) {
    scores[i] = weight - weight / (1f + freqs[i] * buffer[i]);
  }
}
```
Exciting!
Right. It should also help sparse neural search, where weights are less predictable, dynamic pruning works less well, and hits are effectively evaluated exhaustively in practice. Finally, I'm hoping that we can iterate on this change to also speed up top-n evaluation in the future.
Do we really need the method on Similarity? I guess I feel most users are probably using BM25Similarity, so I don't understand the explanation in the comments. If we have "bogus" instances (such as wrappers) of Similarity in use, then that's a Java problem; let's fix that instead.
You are correct, there is no need for additional APIs on Similarity at this point, so I removed it. I suspect it may be tempting in the future, because it enables further optimizations as @gf2121 showed in #14679 (comment) (though let's see if it actually translates to speedups with luceneutil). I also cleaned up the change; it's now ready for review.
Calls to `DocIdSetIterator#nextDoc`, `DocIdSetIterator#advance` and `SimScorer#score` are currently interleaved and include lots of conditionals. This builds up on apache#14679 and refactors the code a bit to make it eligible to auto-vectorization and better pipelining. This effectively speeds up conjunctive queries (e.g. `AndHighHigh`) but also disjunctive queries that run as conjunctive queries in practice (e.g. `OrHighHigh`).
Thank you! "bulkpostings 2.0" is looking really clean and non-invasive :)
Yes, thank you, I agree 100% to investigate it as a follow-up: the additional speedup hinted at there seems promising. If we can proceed with caution there, it would help. For similarities in particular, getting the formula correct can be difficult, and if you have to implement it twice, I have some concerns around correctness. At the very minimum we'd want to improve BaseSimilarityTestCase... For the PostingsEnum/Scorer changes I have similar concerns about correctness; I think what's happening in Asserting is not enough to guarantee correctness? E.g. for the PostingsEnum one I would think about CheckIndex itself validating the new bulk API, BasePostingsFormatTestCase additions, and also TestDuelingCodecs.
Fantastic job!
```java
freq();

int start = docBufferUpto - 1;
buffer.size = 0;
```
Nit: `buffer.size` has been set to 0 above (line 1047); can we avoid this one?
```java
int batchSize = 16; // arbitrary
buffer.growNoCopy(batchSize);
int size = 0;
DocIdSetIterator iterator = iterator();
```
We have many implementations returning a new iterator here (like `TwoPhaseIterator.asDocIdSetIterator`); will the object construction for every 16 docs cause noticeable overhead?
Possibly indeed. Let's look into it as a follow-up? I'm not sure if we should cache the iterator here or rather fix impls to avoid allocating in `#iterator()`.
Let's look into it as a follow-up
+1
```java
int size = docAndFreqBuffer.size;
normValues = ArrayUtil.growNoCopy(normValues, size);
if (norms == null) {
  Arrays.fill(normValues, 0, size, 1L);
```
Can we only do this fill when grow happens?
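The suggestion amounts to something like the following hypothetical rearrangement (not the actual patch): since the buffer is reused across calls and only ever holds the constant `1L` in the missing-norms case, the fill only needs to happen when the array is actually reallocated:

```java
import java.util.Arrays;

public class NormsFill {
    static long[] normValues = new long[0];

    // Only refill the constant 1L norms when the buffer was reallocated;
    // otherwise the previous contents are already all 1L. Mimics the
    // growNoCopy-then-fill pattern discussed above with stand-in code.
    static void ensureNorms(int size) {
        if (normValues.length < size) {
            normValues = new long[size]; // growNoCopy-style reallocation
            Arrays.fill(normValues, 1L); // fill once per reallocation
        }
    }
}
```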
Thanks for the feedback, both. I added coverage to
```java
    Arrays.fill(normValues, 1L);
  }
}
normValues = ArrayUtil.growNoCopy(normValues, size);
```
This line can be removed?
CheckIndex integration is pushed; I hooked into a place where we were already exhaustively consuming the
Existing vectorization of scores is a bit fragile since it relies on `SimScorer#score` being inlined in the for loops where it is called. This is currently the case in nightly benchmarks, but may not be the case in the real world where more implementations of `SimScorer` may be used, in particular those from `FeatureField`. Furthermore, existing vectorization has some room for improvement as @gf2121 highlighted at apache#14679 (comment).
Nightly benchmarks reported a ~6% speedup on the
* main: (32 commits)
  - update os.makedirs with pathlib mkdir (apache#14710)
  - Optimize AbstractKnnVectorQuery#createBitSet with intoBitset (apache#14674)
  - Implement #docIDRunEnd() on PostingsEnum. (apache#14693)
  - Speed up TermQuery (apache#14709)
  - Refactor main top-n bulk scorers to evaluate hits in a more term-at-a-time fashion. (apache#14701)
  - Fix WindowsFS test failure seen on Policeman Jenkins (apache#14706)
  - Use a temporary repository location to download certain ecj versions ("drops") (apache#14703)
  - Add assumption to ignore occasional test failures due to disconnected graphs (apache#14696)
  - Return MatchNoDocsQuery when IndexOrDocValuesQuery::rewrite does not match (apache#14700)
  - Minor access modifier adjustment to a couple of lucene90 backward compat types (apache#14695)
  - Speed up exhaustive evaluation. (apache#14679)
  - Specify and test that IOContext is immutable (apache#14686)
  - deps(java): bump org.gradle.toolchains.foojay-resolver-convention (apache#14691)
  - deps(java): bump org.eclipse.jgit:org.eclipse.jgit (apache#14692)
  - Clean up how the test framework creates asserting scorables. (apache#14452)
  - Make competitive iterators more robust. (apache#14532)
  - Remove DISIDocIdStream. (apache#14550)
  - Implement AssertingPostingsEnum#intoBitSet. (apache#14675)
  - Fix patience knn queries to work with seeded knn queries (apache#14688)
  - Added toString() method to BytesRefBuilder (apache#14676)
  - ...