
Create vectorized versions of ScalarQuantizer.quantize and recalculateCorrectiveOffset #14304


Merged
14 commits merged into apache:main on Mar 25, 2025

Conversation

thecoop
Contributor

@thecoop thecoop commented Feb 27, 2025

This resolves #13922. It takes the existing methods in ScalarQuantizer, and creates vectorized versions of that same algorithm.

JMH shows a ~13x speedup:

```
Benchmark              Mode  Cnt     Score    Error   Units
Quant.quantize        thrpt    5   235.029 ±  3.204  ops/ms
Quant.quantizeVector  thrpt    5  3153.388 ± 192.635  ops/ms
```
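
For readers unfamiliar with the Panama Vector API, the sketch below shows roughly what a vectorized min-max scalar quantization loop looks like. It is illustrative only, not the code in this PR: the class, method, and parameter names are hypothetical, the scalar tail loop is omitted, and it needs `--add-modules jdk.incubator.vector` to compile. The corrective-offset expression mirrors the form discussed in the review comments below.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class QuantizeSketch {
  static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

  /**
   * Quantizes src into dest and returns the corrective offset, processing
   * SPECIES.length() floats per iteration. Tail elements are left to a scalar
   * loop, omitted here for brevity.
   */
  static float minMaxQuantize(
      float[] src, byte[] dest, float minQuantile, float maxQuantile, float scale, float alpha) {
    float correction = 0f;
    for (int i = 0; i < SPECIES.loopBound(src.length); i += SPECIES.length()) {
      FloatVector v = FloatVector.fromArray(SPECIES, src, i);
      // clamp to [minQuantile, maxQuantile] and shift so the range starts at zero
      FloatVector dxc = v.min(maxQuantile).max(minQuantile).sub(minQuantile);
      // scale into the quantized range
      FloatVector dxs = dxc.mul(scale);
      // write out the quantized bytes (lane-by-lane here purely for simplicity)
      for (int j = 0; j < SPECIES.length(); j++) {
        dest[i + j] = (byte) Math.round(dxs.lane(j));
      }
      // per-lane corrective offsets, reduced into a running scalar total
      FloatVector dxq = dxs.mul(alpha);
      correction +=
          v.sub(minQuantile / 2f)
              .mul(minQuantile)
              .add(v.sub(minQuantile).sub(dxq).mul(dxq))
              .reduceLanes(VectorOperators.ADD);
    }
    return correction;
  }
}
```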

@thecoop thecoop marked this pull request as ready for review February 27, 2025 15:29
@jpountz
Contributor

jpountz commented Feb 27, 2025

Have you been able to run luceneutil to get a sense of the indexing and search speedups?

@thecoop
Contributor Author

thecoop commented Feb 28, 2025

Unfortunately not, I've been unable to get the quantized vector datasets working on my machine

@benwtrent
Member

I compared this branch with main. There are measurable improvements, but the quantization step isn't the main bottleneck; vector comparisons still dominate the cost. But it's a nice bump, I would say.

candidate:

```
recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.826         2.340  500000   100      50       32        100     7 bits    86.54       5777.61         337.47             1          1859.34       1831.055       366.211
```

baseline:

```
recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.828         2.680  500000   100      50       32        100     7 bits    88.48       5650.74         357.45             1          1859.57       1831.055       366.211
```

@jpountz
Contributor

jpountz commented Mar 7, 2025

12.5% faster search overall if I read correctly? This is pretty cool! We've been excited about smaller speedups many times in Lucene's history. :)

@jpountz
Contributor

jpountz commented Mar 7, 2025

Hmm maybe I got confused, as quantization only needs to be applied to the query vector at query time, so the search speedup is noise and I should rather be looking at the indexing speedup (+2%) and merging speedup (+5%)?

@benwtrent
Member

> as quantization only needs to be applied to the query vector at query time, so the search speedup is noise and I should rather be looking at the indexing speedup (+2%) and merging speedup (+5%)?

That is what I think. We can run a bunch more times, but I do think this provides a marginal improvement at indexing time, where we may actually re-quantize all the vectors.

I bet that for "flat" indices, which only use the quantization, the speedup is significant. Though I haven't had time to benchmark that yet.

```java
v.sub(minQuantile / 2f)
    .mul(minQuantile)
    .add(v.sub(minQuantile).sub(dxq).mul(dxq))
    .reduceLanes(VectorOperators.ADD);
```
Member

Could you collect the corrections in a float array? This way we keep all lanes parallelized and then sum the floats later?

Member

I think if you could keep the lanes separate for as long as possible, we get a bigger perf boost. Reducing lanes is a serious bottleneck.
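
Roughly, the suggestion amounts to something like the following (an illustrative sketch, not the PR's actual code, reusing the hypothetical SPECIES and parameter names from the sketch in the description above): keep the per-lane corrections in a vector accumulator and perform a single reduceLanes after the loop.

```java
// Illustrative sketch of the suggestion: accumulate the per-lane corrections
// in a vector and reduce to a scalar once, after the loop.
static float correctiveOffsetSketch(
    float[] src, float minQuantile, float maxQuantile, float scale, float alpha) {
  FloatVector acc = FloatVector.zero(SPECIES);
  for (int i = 0; i < SPECIES.loopBound(src.length); i += SPECIES.length()) {
    FloatVector v = FloatVector.fromArray(SPECIES, src, i);
    FloatVector dxc = v.min(maxQuantile).max(minQuantile).sub(minQuantile);
    FloatVector dxq = dxc.mul(scale).mul(alpha);
    // corrections stay in their lanes; no horizontal reduction inside the loop
    acc = acc.add(
        v.sub(minQuantile / 2f)
            .mul(minQuantile)
            .add(v.sub(minQuantile).sub(dxq).mul(dxq)));
  }
  // single reduction at the very end (scalar tail loop omitted)
  return acc.reduceLanes(VectorOperators.ADD);
}
```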

Contributor Author

@thecoop thecoop Mar 10, 2025

Indeed it is - this doubles the performance

```
Benchmark              Mode  Cnt     Score    Error   Units
Quant.quantize        thrpt    5   235.029 ±  3.204  ops/ms
Quant.quantizeVector  thrpt    5  2831.313 ± 46.475  ops/ms
```

Contributor Author

And even more with FMA operations
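
For illustration (again a hedged sketch with hypothetical names, not the code in this PR), the per-lane correction from the snippet above can be expressed with fused multiply-add, where a.fma(b, c) computes a * b + c as a single lanewise operation on hardware with FMA support. The accumulator from the previous sketch would then just add the result of this helper each iteration.

```java
// Hypothetical helper showing the FMA form of the per-lane correction:
// minQuantile * (v - minQuantile/2) + (v - minQuantile - dxq) * dxq
static FloatVector laneCorrections(FloatVector v, FloatVector dxq, float minQuantile) {
  FloatVector minQ = FloatVector.broadcast(SPECIES, minQuantile);
  return v.sub(minQuantile / 2f).fma(minQ, v.sub(minQuantile).sub(dxq).mul(dxq));
}
```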

Member

Oh yes ;). Those are the numbers I am expecting.

```
@@ -907,4 +907,87 @@ public static long int4BitDotProduct128(byte[] q, byte[] d) {
      }
      return subRet0 + (subRet1 << 1) + (subRet2 << 2) + (subRet3 << 3);
    }

  @Override
  public float quantize(
```
Member

Let's name this something better, we can call it "minMaxScalarQuantization" or something?

Contributor Author

Done - and the recalculate method too

@benwtrent
Member

Ugh, my benchmark was on my laptop, which I think counts as "not having nice byte vectors". I will attempt to benchmark correctly on a cloud machine soon-ish.

Sorry @jpountz @thecoop for the incorrect benchmark numbers :)

@benwtrent
Member

On GCP, there isn't much difference. I wouldn't expect a huge difference, as the dominant cost is the vector comparisons, not the quantization.

I haven't tested with "flat" yet.

BASELINE

```
recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.961         2.910  200000   100      50       64        250     7 bits     6677   111.44       1794.70          79.03             1           997.58        976.563       195.313
```

CANDIDATE

```
recall  latency (ms)    nDoc  topK  fanout  maxConn  beamWidth  quantized  visited  index s  index docs/s  force merge s  num segments  index size (MB)  vec disk (MB)  vec RAM (MB)
 0.960         2.460  200000   100      50       64        250     7 bits     6527   110.99       1801.98          76.68             1           997.55        976.563       195.313
```

Member

@benwtrent benwtrent left a comment

OK, sorry for the random and sparse feedback. I think this is almost there. Then I can take over merging and backporting.

Please add a CHANGES entry for 10.2 optimizations indicating a minor speed improvement for scalar quantized query & indexing speed.

@benwtrent benwtrent merged commit 1e8a146 into apache:main Mar 25, 2025
7 checks passed
benwtrent pushed a commit that referenced this pull request Mar 25, 2025
Create vectorized versions of ScalarQuantizer.quantize and recalculateCorrectiveOffset (#14304)


Successfully merging this pull request may close these issues.

Can we use Panama Vector API for quantizing vectors?