
Fixing quantization interval initialization for optimized sq #14374

Merged
merged 3 commits into from
Mar 19, 2025

Conversation

benwtrent
Member

A rather silly bug that we didn't catch because we were testing on well-behaved modern vectors. However, many benchmarking cases and some more "bespoke" feature models (MNIST, etc.) do not have well-distributed components. Consequently, the bug showed up.

Previously, the recall for MNIST was ~0.018. YIKES.
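To make the point about poorly distributed components concrete, here is a minimal sketch of how much the starting interval can matter for 1-bit scalar quantization. It is an illustration only: the description above does not say what the bug was, and `init_interval`, `quantization_error`, the `use_mean` switch, and the constant `K` are hypothetical names invented for this example, not Lucene code.

```python
import math

def init_interval(vec, k, use_mean=True):
    """Hypothetical initializer: center +/- k*std, clamped to [min(vec), max(vec)]."""
    n = len(vec)
    mean = sum(vec) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in vec) / n)
    center = mean if use_mean else 0.0  # use_mean=False pretends the data is zero-mean
    lo_bound, hi_bound = min(vec), max(vec)
    lo = min(max(center - k * std, lo_bound), hi_bound)
    hi = min(max(center + k * std, lo_bound), hi_bound)
    return lo, hi

def quantization_error(vec, lo, hi, bits=1):
    """Squared reconstruction error of uniform scalar quantization on [lo, hi]."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    err = 0.0
    for x in vec:
        clipped = min(max(x, lo), hi)
        # Round to the nearest level; a degenerate interval maps everything to lo.
        q = math.floor((clipped - lo) / step + 0.5) if step > 0 else 0
        err += (x - (lo + q * step)) ** 2
    return err

# MNIST-like components: mostly zero, a few large values (assumed toy data).
skewed = [0.0] * 12 + [0.9, 1.0, 0.95, 1.0]
# A "well-behaved" zero-mean vector for comparison.
centered = [-1.0, 1.0] * 8
K = 0.8  # hypothetical width multiplier, chosen only for illustration

err_with_mean = quantization_error(skewed, *init_interval(skewed, K, use_mean=True))
err_without_mean = quantization_error(skewed, *init_interval(skewed, K, use_mean=False))
print(err_with_mean, err_without_mean)
print(init_interval(centered, K, True) == init_interval(centered, K, False))
```

In this toy setup the two initializers agree on the zero-mean vector and differ only on the skewed one, which is one way an initialization problem could stay hidden when testing only on well-behaved embeddings.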

Here are some numbers with this bug fix (I include "well behaved" component vectors here to indicate there isn't a negative impact there).

The latency, etc. is always tricky to benchmark. These were run on my laptop while I was actively working on other things. I would pay most attention to the recall.

Fashion-MNIST (784 dim):

recall  latency(ms)   nDoc  topK  fanout  maxConn  beamWidth quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample  vec_disk(MB)  vec_RAM(MB)  indexType
 0.443        0.188  60000    10      50       64        250     1 bits      0.00      Infinity             1          189.55       1.000       181.646        2.203       HNSW
 0.629        0.274  60000    10      50       64        250     1 bits      0.00      Infinity             1          189.55       2.000       181.646        2.203       HNSW
 0.730        0.349  60000    10      50       64        250     1 bits      0.00      Infinity             1          189.55       3.000       181.646        2.203       HNSW
 0.792        0.471  60000    10      50       64        250     1 bits      0.00      Infinity             1          189.55       4.000       181.646        2.203       HNSW
 0.833        0.479  60000    10      50       64        250     1 bits      0.00      Infinity             1          189.55       5.000       181.646        2.203       HNSW
 0.926        0.786  60000    10      50       64        250     1 bits      0.00      Infinity             1          189.55      10.000       181.646        2.203       HNSW

recall  latency(ms)   nDoc  topK  fanout  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample  vec_disk(MB)  vec_RAM(MB)  indexType
 0.444        4.110  60000    10      50     1 bits      0.00      Infinity             1          186.43       1.000       181.646        2.203       FLAT
 0.629        4.383  60000    10      50     1 bits      0.00      Infinity             1          186.43       2.000       181.646        2.203       FLAT
 0.730        4.437  60000    10      50     1 bits      0.00      Infinity             1          186.43       3.000       181.646        2.203       FLAT
 0.792        4.455  60000    10      50     1 bits      0.00      Infinity             1          186.43       4.000       181.646        2.203       FLAT
 0.833        4.445  60000    10      50     1 bits      0.00      Infinity             1          186.43       5.000       181.646        2.203       FLAT
 0.926        4.607  60000    10      50     1 bits      0.00      Infinity             1          186.43      10.000       181.646        2.203       FLAT

Cohere v2 (768 dim):

recall  latency(ms)     nDoc  topK  fanout  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample  vec_disk(MB)  vec_RAM(MB)  indexType
 0.600        0.516  1000000    10      50     1 bits      0.00      Infinity             1         3137.81       1.000      2965.927       36.240       HNSW
 0.790        0.872  1000000    10      50     1 bits      0.00      Infinity             1         3137.81       2.000      2965.927       36.240       HNSW
 0.870        1.255  1000000    10      50     1 bits      0.00      Infinity             1         3137.81       3.000      2965.927       36.240       HNSW
 0.911        1.661  1000000    10      50     1 bits      0.00      Infinity             1         3137.81       4.000      2965.927       36.240       HNSW
 0.935        1.971  1000000    10      50     1 bits      0.00      Infinity             1         3137.81       5.000      2965.927       36.240       HNSW
 0.979        3.518  1000000    10      50     1 bits      0.00      Infinity             1         3137.81      10.000      2965.927       36.240       HNSW

recall  latency(ms)     nDoc  topK  fanout  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample  vec_disk(MB)  vec_RAM(MB)  indexType
 0.611       34.891  1000000    10      50     1 bits      0.00      Infinity             1         3038.48       1.000      2965.927       36.240       FLAT
 0.799       35.253  1000000    10      50     1 bits      0.00      Infinity             1         3038.48       2.000      2965.927       36.240       FLAT
 0.876       35.454  1000000    10      50     1 bits      0.00      Infinity             1         3038.48       3.000      2965.927       36.240       FLAT
 0.914       34.635  1000000    10      50     1 bits      0.00      Infinity             1         3038.48       4.000      2965.927       36.240       FLAT
 0.939       35.473  1000000    10      50     1 bits      0.00      Infinity             1         3038.48       5.000      2965.927       36.240       FLAT
 0.981       36.105  1000000    10      50     1 bits      0.00      Infinity             1         3038.48      10.000      2965.927       36.240       FLAT

Cohere v3 (1024 dim):

recall  latency(ms)     nDoc  topK  fanout  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample  vec_disk(MB)  vec_RAM(MB)  indexType
 0.617        0.621  1000000    10      50     1 bits      0.00      Infinity             1         4184.21       1.000      3950.119       43.869       HNSW
 0.822        1.245  1000000    10      50     1 bits      0.00      Infinity             1         4184.21       2.000      3950.119       43.869       HNSW
 0.898        1.546  1000000    10      50     1 bits      0.00      Infinity             1         4184.21       3.000      3950.119       43.869       HNSW
 0.934        2.006  1000000    10      50     1 bits      0.00      Infinity             1         4184.21       4.000      3950.119       43.869       HNSW
 0.957        2.483  1000000    10      50     1 bits      0.00      Infinity             1         4184.21       5.000      3950.119       43.869       HNSW
 0.991        4.489  1000000    10      50     1 bits      0.00      Infinity             1         4184.21      10.000      3950.119       43.869       HNSW

recall  latency(ms)     nDoc  topK  fanout  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample  vec_disk(MB)  vec_RAM(MB)  indexType 
 0.621       39.390  1000000    10      50     1 bits      0.00      Infinity             1         4045.56       1.000      3950.119       43.869       FLAT
 0.822       39.182  1000000    10      50     1 bits      0.00      Infinity             1         4045.56       2.000      3950.119       43.869       FLAT
 0.899       39.126  1000000    10      50     1 bits      0.00      Infinity             1         4045.56       3.000      3950.119       43.869       FLAT
 0.936       39.304  1000000    10      50     1 bits      0.00      Infinity             1         4045.56       4.000      3950.119       43.869       FLAT
 0.958       39.598  1000000    10      50     1 bits      0.00      Infinity             1         4045.56       5.000      3950.119       43.869       FLAT
 0.991       39.523  1000000    10      50     1 bits      0.00      Infinity             1         4045.56      10.000      3950.119       43.869       FLAT

E5-small-v2 (384 dim):

recall  latency(ms)    nDoc  topK  fanout  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample  vec_disk(MB)  vec_RAM(MB)  indexType
 0.785        0.340  500000    10      50     1 bits      0.00      Infinity             1          810.65       1.000       744.820       12.398       HNSW
 0.948        0.621  500000    10      50     1 bits      0.00      Infinity             1          810.65       2.000       744.820       12.398       HNSW
 0.976        0.871  500000    10      50     1 bits      0.00      Infinity             1          810.65       3.000       744.820       12.398       HNSW
 0.985        1.251  500000    10      50     1 bits      0.00      Infinity             1          810.65       4.000       744.820       12.398       HNSW
 0.990        1.507  500000    10      50     1 bits      0.00      Infinity             1          810.65       5.000       744.820       12.398       HNSW
 0.997        2.831  500000    10      50     1 bits      0.00      Infinity             1          810.65      10.000       744.820       12.398       HNSW

recall  latency(ms)    nDoc  topK  fanout  quantized  index(s)  index_docs/s  num_segments  index_size(MB)  overSample  vec_disk(MB)  vec_RAM(MB)  indexType
 0.786       14.569  500000    10      50     1 bits      0.00      Infinity             1          763.93       1.000       744.820       12.398       FLAT
 0.949       14.642  500000    10      50     1 bits      0.00      Infinity             1          763.93       2.000       744.820       12.398       FLAT
 0.977       14.657  500000    10      50     1 bits      0.00      Infinity             1          763.93       3.000       744.820       12.398       FLAT
 0.986       14.117  500000    10      50     1 bits      0.00      Infinity             1          763.93       4.000       744.820       12.398       FLAT
 0.990       14.298  500000    10      50     1 bits      0.00      Infinity             1          763.93       5.000       744.820       12.398       FLAT
 0.998       14.022  500000    10      50     1 bits      0.00      Infinity             1          763.93      10.000       744.820       12.398       FLAT
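As a reading aid for the `recall` and `overSample` columns, here is a small sketch of oversampled search with rescoring. It assumes `overSample` means "collect topK × overSample candidates using quantized scores, then rescore them with the full-precision vectors"; the tables do not define the column, so treat that as an assumption. The `sign_bits` quantizer is a crude 1-bit stand-in, not the optimized scalar quantizer from this PR, and all function names are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sign_bits(vec):
    """Crude 1-bit stand-in quantizer: keep only the sign of each component."""
    return [1.0 if x >= 0 else -1.0 for x in vec]

def top_k(indices, score, k):
    """The k indices with the highest score, best first."""
    return sorted(indices, key=score, reverse=True)[:k]

def search(query, docs, k, over_sample):
    # Step 1: gather k * over_sample candidates using the quantized vectors.
    n_candidates = min(len(docs), int(k * over_sample))
    candidates = top_k(range(len(docs)),
                       lambda i: dot(query, sign_bits(docs[i])),
                       n_candidates)
    # Step 2: rescore only those candidates with the full-precision vectors.
    return top_k(candidates, lambda i: dot(query, docs[i]), k)

def recall(found, truth):
    """Fraction of the true top-k that the search returned."""
    return len(set(found) & set(truth)) / len(truth)

rng = random.Random(0)  # fixed seed so the run is repeatable
docs = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(200)]
query = [rng.gauss(0, 1) for _ in range(16)]
truth = top_k(range(len(docs)), lambda i: dot(query, docs[i]), 10)

recalls = [recall(search(query, docs, 10, o), truth) for o in (1, 2, 5, 20)]
print(recalls)
```

Under this reading, a larger `overSample` can only add candidates, so recall never drops as it grows, and it reaches 1.0 once every document is a candidate. That matches the shape of the tables, where recall climbs with `overSample`.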

related: #14342

(I am not closing the issue with this PR; I think there are further improvements to be gained by preserving dot-product behavior on these variously distributed vector components.)

Contributor

@john-wagster john-wagster left a comment

lgtm; nice little bump on the scores for the datasets

@benwtrent benwtrent merged commit ab1de59 into apache:main Mar 19, 2025
7 checks passed
@benwtrent benwtrent deleted the osq-bugfix branch March 19, 2025 20:03
benwtrent added a commit that referenced this pull request Mar 19, 2025
* Fixing quantization interval initialization for optimized sq

* adding changes along with original binary quantization change

* adjusting test
jpountz pushed a commit to jpountz/lucene that referenced this pull request Mar 24, 2025
…14374)

* Fixing quantization interval initialization for optimized sq

* adding changes along with original binary quantization change

* adjusting test