add PQ training benchmark #482

sam-herman · 2025-06-25T20:55:53Z

Overview

To gauge how PQ performance varies across architectures, we need a baseline measurement within jVector that captures the relative cost of training.
The number of vectors needs to be an adjustable parameter, which is hard to achieve with a static dataset without skewing the data distribution.

Changes

Added a PQ training benchmark on random vectors with an adjustable vector count
Added a PQ build scorer option to the index construction benchmark, to compare index construction performance with the exact vs. PQ build scorer
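
For illustration only, here is a minimal sketch of why random vectors make the vector count easy to vary: each vector is drawn independently from the same distribution, so changing the count does not change the distribution. This is not the code added in this PR. The class name `RandomVectorSketch`, the method `randomVectors`, the uniform [0, 1) distribution, and the fixed seed are all assumptions, and the sketch uses plain Java rather than JMH.

```java
import java.util.Random;

// Hypothetical sketch, not the benchmark class from this PR.
public class RandomVectorSketch {

    // Generates vectorCount vectors of the given dimension.
    // Each component is drawn i.i.d. from a uniform [0, 1) distribution (an assumption),
    // so the data distribution is the same for any vectorCount.
    public static float[][] randomVectors(int vectorCount, int dimension, long seed) {
        Random rng = new Random(seed); // fixed seed keeps runs reproducible
        float[][] vectors = new float[vectorCount][dimension];
        for (float[] vector : vectors) {
            for (int d = 0; d < dimension; d++) {
                vector[d] = rng.nextFloat();
            }
        }
        return vectors;
    }

    public static void main(String[] args) {
        // Vary the vector count freely; the dimension stays at 768 as in the results.
        for (int count : new int[] {1_000, 10_000}) {
            float[][] data = randomVectors(count, 768, 42L);
            System.out.println(data.length + " vectors of dimension " + data[0].length);
        }
    }
}
```

With a static dataset, the equivalent step would be taking a subset, which can change the distribution depending on how the subset is chosen.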

Testing

PQ training benchmark:

Benchmark                                                                 (M)  (originalDimension)  (vectorCount)  Mode  Cnt     Score     Error  Units
PQTrainingWithRandomVectorsBenchmark.productQuantizationComputeBenchmark   16                  768         100000  avgt    5  2566.605 ± 150.461  ms/op
PQTrainingWithRandomVectorsBenchmark.productQuantizationComputeBenchmark   32                  768         100000  avgt    5  2964.321 ± 124.850  ms/op
PQTrainingWithRandomVectorsBenchmark.productQuantizationComputeBenchmark   64                  768         100000  avgt    5  4028.526 ± 101.389  ms/op

PQ vs FP distance benchmark:

Benchmark                                           (M)  (dimension)  (queryCount)  (vectorCount)  Mode  Cnt       Score       Error  Units
PQDistanceCalculationBenchmark.distanceCalculation    0          768           100          10000  avgt    5  123151.191 ± 42382.755  us/op
PQDistanceCalculationBenchmark.distanceCalculation   16          768           100          10000  avgt    5   11592.736 ±   454.418  us/op
PQDistanceCalculationBenchmark.distanceCalculation   64          768           100          10000  avgt    5   39660.170 ±   315.126  us/op
PQDistanceCalculationBenchmark.distanceCalculation  192          768           100          10000  avgt    5  127094.323 ±  2948.728  us/op
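
One way to read the table above is as ratios against the M = 0 row, assuming that row is the full-precision (FP) baseline, which the heading suggests but the output does not state. The sketch below only does that arithmetic on the reported scores; the class name `PqSpeedupSketch` and the method `speedup` are hypothetical. The M = 0 row carries a wide error margin (± 42382.755 us/op), so the ratios are rough.

```java
// Hypothetical helper for reading the reported scores; not part of this PR.
public class PqSpeedupSketch {

    // Ratio of baseline time to candidate time; values above 1 mean the candidate is faster.
    public static double speedup(double baselineUs, double candidateUs) {
        return baselineUs / candidateUs;
    }

    public static void main(String[] args) {
        double fullPrecisionUs = 123151.191;                // M = 0 row, assumed to be the FP baseline
        int[] subspaces = {16, 64, 192};                    // (M) column
        double[] scoresUs = {11592.736, 39660.170, 127094.323};
        for (int i = 0; i < subspaces.length; i++) {
            System.out.printf("M=%d: %.2fx relative to M=0%n",
                    subspaces[i], speedup(fullPrecisionUs, scoresUs[i]));
        }
    }
}
```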

PQ vs FP Index Construction:

Benchmark                                                    (buildScoreProviderType)  (numBaseVectors)  (originalDimension)  Mode  Cnt      Score      Error  Units
IndexConstructionWithRandomSetBenchmark.buildIndexBenchmark                     Exact            100000                  768  avgt    5  19976.081 ±  804.598  ms/op
IndexConstructionWithRandomSetBenchmark.buildIndexBenchmark                     Exact            100000                 1536  avgt    5  37689.636 ± 1295.435  ms/op
IndexConstructionWithRandomSetBenchmark.buildIndexBenchmark                        PQ            100000                  768  avgt    5  25676.173 ± 2423.729  ms/op
IndexConstructionWithRandomSetBenchmark.buildIndexBenchmark                        PQ            100000                 1536  avgt    5  39913.628 ± 4659.553  ms/op

Signed-off-by: Samuel Herman <[email protected]>

marianotepper

LGTM

Signed-off-by: Samuel Herman <[email protected]>

sam-herman added 4 commits June 25, 2025 13:52

add PQ training benchmark

4a8010a

Signed-off-by: Samuel Herman <[email protected]>

lock concurrency

1e1203f

Signed-off-by: Samuel Herman <[email protected]>

add PQ and without PQ construction

56863c5

Signed-off-by: Samuel Herman <[email protected]>

make explicit iteration level setup

6af53a6

Signed-off-by: Samuel Herman <[email protected]>

tlwillke self-requested a review June 27, 2025 03:04

marianotepper approved these changes Jun 27, 2025


add PQ distance benchmark

03e78f0

Signed-off-by: Samuel Herman <[email protected]>

sam-herman merged commit 41435ea into datastax:main Jun 27, 2025
7 of 8 checks passed

sam-herman deleted the pq-training-benchmark branch June 27, 2025 20:26
