Fixing quantization interval initialization for optimized sq #14374

benwtrent · 2025-03-19T17:01:29Z

A rather silly bug that we didn't catch because we were testing on well-behaved modern vectors. However, many benchmarking datasets and some more "bespoke" feature models (MNIST, etc.) do not have well-distributed components, and that is where the bug showed up.

Previously, the recall for MNIST was ~0.018. YIKES.
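The description does not spell out the exact initialization formula, so the following is only a sketch of the general failure mode, not the Lucene code. It assumes a hypothetical initial interval of mean ± k·std (with a made-up k = 0.8), clamped to the vector's min/max, and compares it against a variant that forgets the mean. The two agree on zero-mean components and diverge on skewed, MNIST-like ones, which is how a bug of this kind can hide on well-behaved embeddings.

```python
def mean_std(vec):
    m = sum(vec) / len(vec)
    var = sum((x - m) ** 2 for x in vec) / len(vec)
    return m, var ** 0.5


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def init_interval(vec, k=0.8, use_mean=True):
    # Hypothetical initialization (an assumption, not the Lucene formula):
    # center +/- k * std, clamped to the observed component range.
    m, s = mean_std(vec)
    center = m if use_mean else 0.0
    lo, hi = min(vec), max(vec)
    return clamp(center - k * s, lo, hi), clamp(center + k * s, lo, hi)


def quantize_1bit(vec, interval):
    # Snap every component to the nearer interval endpoint.
    a, b = interval
    mid = (a + b) / 2.0
    return [b if x > mid else a for x in vec]


def mse(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v)) / len(u)


def error(vec, use_mean):
    return mse(vec, quantize_1bit(vec, init_interval(vec, use_mean=use_mean)))


# Zero-mean components: both initializations produce the same interval.
well_behaved = [-1.0, -0.5, 0.5, 1.0] * 25
# Mostly-zero, non-negative components, loosely imitating pixel data.
mnist_like = [0.0] * 80 + [0.5 + 0.5 * i / 19 for i in range(20)]

for name, vec in (("well_behaved", well_behaved), ("mnist_like", mnist_like)):
    print(name, error(vec, use_mean=True), error(vec, use_mean=False))
```

On the zero-mean vector the two errors are identical; on the skewed vector the interval that ignores the mean gives a larger quantization error.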

Here are some numbers with this bug fix (I include "well behaved" component vectors here to indicate there isn't a negative impact there).

The latency, etc. is always tricky to benchmark. These were run on my laptop while I was actively working on other things, so I would pay most attention to the recall.
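For reading the recall and overSample columns, here is a minimal sketch of how such a number is commonly computed. It assumes the usual definition (overlap between the returned topK and the exact brute-force topK) and assumes overSample means gathering topK × overSample candidates with the quantized score before rescoring them with full-precision vectors. The benchmark tool itself is not shown in this PR, and `sign_bits` is a toy stand-in for the real quantizer.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(scores, k):
    # Indices of the k highest scores. sorted() is stable, so a larger k
    # only extends the same prefix.
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


def sign_bits(vec):
    # Toy 1-bit quantizer (an assumption for illustration only).
    return [1.0 if x > 0 else -1.0 for x in vec]


def recall_at_k(docs, query, k, oversample):
    exact = [dot(d, query) for d in docs]
    truth = set(top_k(exact, k))
    # Gather k * oversample candidates using the quantized score.
    approx = [dot(sign_bits(d), query) for d in docs]
    candidates = top_k(approx, int(k * oversample))
    # Rescore the candidates with the full-precision score and keep k.
    reranked = sorted(candidates, key=lambda i: -exact[i])[:k]
    return len(truth & set(reranked)) / k


random.seed(0)
docs = [[random.gauss(0, 1) for _ in range(16)] for _ in range(200)]
query = [random.gauss(0, 1) for _ in range(16)]

for oversample in (1, 2, 5, 20):
    print(oversample, recall_at_k(docs, query, k=10, oversample=oversample))
```

Under these assumptions recall can only grow with overSample, and it reaches 1.0 once the candidate set covers every document, which matches the shape of the tables below.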

Fashion-MNIST (784 dim):

recall  latency(ms)   nDoc  topK  fanout  maxConn  beamWidth quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample  vec_disk(MB)  vec_RAM(MB)  indexType
 0.443        0.188  60000    10      50       64        250     1 bits      0.00      Infinity             1          189.55       1.000       181.646        2.203       HNSW
 0.629        0.274  60000    10      50       64        250     1 bits      0.00      Infinity             1          189.55       2.000       181.646        2.203       HNSW
 0.730        0.349  60000    10      50       64        250     1 bits      0.00      Infinity             1          189.55       3.000       181.646        2.203       HNSW
 0.792        0.471  60000    10      50       64        250     1 bits      0.00      Infinity             1          189.55       4.000       181.646        2.203       HNSW
 0.833        0.479  60000    10      50       64        250     1 bits      0.00      Infinity             1          189.55       5.000       181.646        2.203       HNSW
 0.926        0.786  60000    10      50       64        250     1 bits      0.00      Infinity             1          189.55      10.000       181.646        2.203       HNSW

recall  latency(ms)   nDoc  topK  fanout  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample  vec_disk(MB)  vec_RAM(MB)  indexType
 0.444        4.110  60000    10      50     1 bits      0.00      Infinity             1          186.43       1.000       181.646        2.203       FLAT
 0.629        4.383  60000    10      50     1 bits      0.00      Infinity             1          186.43       2.000       181.646        2.203       FLAT
 0.730        4.437  60000    10      50     1 bits      0.00      Infinity             1          186.43       3.000       181.646        2.203       FLAT
 0.792        4.455  60000    10      50     1 bits      0.00      Infinity             1          186.43       4.000       181.646        2.203       FLAT
 0.833        4.445  60000    10      50     1 bits      0.00      Infinity             1          186.43       5.000       181.646        2.203       FLAT
 0.926        4.607  60000    10      50     1 bits      0.00      Infinity             1          186.43      10.000       181.646        2.203       FLAT

Cohere V2 (768 dim):

recall  latency(ms)     nDoc  topK  fanout  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample  vec_disk(MB)  vec_RAM(MB)  indexType
 0.600        0.516  1000000    10      50     1 bits      0.00      Infinity             1         3137.81       1.000      2965.927       36.240       HNSW
 0.790        0.872  1000000    10      50     1 bits      0.00      Infinity             1         3137.81       2.000      2965.927       36.240       HNSW
 0.870        1.255  1000000    10      50     1 bits      0.00      Infinity             1         3137.81       3.000      2965.927       36.240       HNSW
 0.911        1.661  1000000    10      50     1 bits      0.00      Infinity             1         3137.81       4.000      2965.927       36.240       HNSW
 0.935        1.971  1000000    10      50     1 bits      0.00      Infinity             1         3137.81       5.000      2965.927       36.240       HNSW
 0.979        3.518  1000000    10      50     1 bits      0.00      Infinity             1         3137.81      10.000      2965.927       36.240       HNSW

recall  latency(ms)     nDoc  topK  fanout  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample  vec_disk(MB)  vec_RAM(MB)  indexType
 0.611       34.891  1000000    10      50     1 bits      0.00      Infinity             1         3038.48       1.000      2965.927       36.240       FLAT
 0.799       35.253  1000000    10      50     1 bits      0.00      Infinity             1         3038.48       2.000      2965.927       36.240       FLAT
 0.876       35.454  1000000    10      50     1 bits      0.00      Infinity             1         3038.48       3.000      2965.927       36.240       FLAT
 0.914       34.635  1000000    10      50     1 bits      0.00      Infinity             1         3038.48       4.000      2965.927       36.240       FLAT
 0.939       35.473  1000000    10      50     1 bits      0.00      Infinity             1         3038.48       5.000      2965.927       36.240       FLAT
 0.981       36.105  1000000    10      50     1 bits      0.00      Infinity             1         3038.48      10.000      2965.927       36.240       FLAT

Cohere V3 (1024 dim):

recall  latency(ms)     nDoc  topK  fanout  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample  vec_disk(MB)  vec_RAM(MB)  indexType
 0.617        0.621  1000000    10      50     1 bits      0.00      Infinity             1         4184.21       1.000      3950.119       43.869       HNSW
 0.822        1.245  1000000    10      50     1 bits      0.00      Infinity             1         4184.21       2.000      3950.119       43.869       HNSW
 0.898        1.546  1000000    10      50     1 bits      0.00      Infinity             1         4184.21       3.000      3950.119       43.869       HNSW
 0.934        2.006  1000000    10      50     1 bits      0.00      Infinity             1         4184.21       4.000      3950.119       43.869       HNSW
 0.957        2.483  1000000    10      50     1 bits      0.00      Infinity             1         4184.21       5.000      3950.119       43.869       HNSW
 0.991        4.489  1000000    10      50     1 bits      0.00      Infinity             1         4184.21      10.000      3950.119       43.869       HNSW

recall  latency(ms)     nDoc  topK  fanout  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample  vec_disk(MB)  vec_RAM(MB)  indexType 
 0.621       39.390  1000000    10      50     1 bits      0.00      Infinity             1         4045.56       1.000      3950.119       43.869       FLAT
 0.822       39.182  1000000    10      50     1 bits      0.00      Infinity             1         4045.56       2.000      3950.119       43.869       FLAT
 0.899       39.126  1000000    10      50     1 bits      0.00      Infinity             1         4045.56       3.000      3950.119       43.869       FLAT
 0.936       39.304  1000000    10      50     1 bits      0.00      Infinity             1         4045.56       4.000      3950.119       43.869       FLAT
 0.958       39.598  1000000    10      50     1 bits      0.00      Infinity             1         4045.56       5.000      3950.119       43.869       FLAT
 0.991       39.523  1000000    10      50     1 bits      0.00      Infinity             1         4045.56      10.000      3950.119       43.869       FLAT

E5-small-v2 (384 dim):

recall  latency(ms)    nDoc  topK  fanout  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample  vec_disk(MB)  vec_RAM(MB)  indexType
 0.785        0.340  500000    10      50     1 bits      0.00      Infinity             1          810.65       1.000       744.820       12.398       HNSW
 0.948        0.621  500000    10      50     1 bits      0.00      Infinity             1          810.65       2.000       744.820       12.398       HNSW
 0.976        0.871  500000    10      50     1 bits      0.00      Infinity             1          810.65       3.000       744.820       12.398       HNSW
 0.985        1.251  500000    10      50     1 bits      0.00      Infinity             1          810.65       4.000       744.820       12.398       HNSW
 0.990        1.507  500000    10      50     1 bits      0.00      Infinity             1          810.65       5.000       744.820       12.398       HNSW
 0.997        2.831  500000    10      50     1 bits      0.00      Infinity             1          810.65      10.000       744.820       12.398       HNSW

recall  latency(ms)    nDoc  topK  fanout  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample  vec_disk(MB)  vec_RAM(MB)  indexType
 0.786       14.569  500000    10      50     1 bits      0.00      Infinity             1          763.93       1.000       744.820       12.398       FLAT
 0.949       14.642  500000    10      50     1 bits      0.00      Infinity             1          763.93       2.000       744.820       12.398       FLAT
 0.977       14.657  500000    10      50     1 bits      0.00      Infinity             1          763.93       3.000       744.820       12.398       FLAT
 0.986       14.117  500000    10      50     1 bits      0.00      Infinity             1          763.93       4.000       744.820       12.398       FLAT
 0.990       14.298  500000    10      50     1 bits      0.00      Infinity             1          763.93       5.000       744.820       12.398       FLAT
 0.998       14.022  500000    10      50     1 bits      0.00      Infinity             1          763.93      10.000       744.820       12.398       FLAT

related: #14342

(I am not closing the issue with this PR; I think there are further improvements to be gained by preserving dot-product behavior on these variously distributed vector components.)

john-wagster

lgtm; nice little bump on the scores for the datasets

Fixing quantization interval initialization for optimized sq

e80c6fb

benwtrent added this to the 10.2.0 milestone Mar 19, 2025

github-project-automation bot added this to OpenSearch Lucene & Core Performance Tracking Mar 19, 2025

github-project-automation bot moved this to Open in OpenSearch Lucene & Core Performance Tracking Mar 19, 2025

github-actions bot added the module:core/other label Mar 19, 2025

adding changes along with original binary quantization change

25656c6

john-wagster approved these changes Mar 19, 2025

adjusting test

5fd0a70

benwtrent merged commit ab1de59 into apache:main Mar 19, 2025
7 checks passed

github-project-automation bot moved this from Open to Merged in OpenSearch Lucene & Core Performance Tracking Mar 19, 2025

benwtrent deleted the osq-bugfix branch March 19, 2025 20:03

benwtrent added a commit that referenced this pull request Mar 19, 2025

Fixing quantization interval initialization for optimized sq (#14374)

061292b

* Fixing quantization interval initialization for optimized sq
* adding changes along with original binary quantization change
* adjusting test

john-wagster mentioned this pull request Apr 14, 2025

Fix bbq quantization algorithm but for differently distributed components elastic/elasticsearch#126778

Merged
