add PQ training benchmark #482

Merged 5 commits on Jun 27, 2025

@sam-herman sam-herman commented Jun 25, 2025

Overview

To gauge how PQ performance varies across architectures, we need a baseline measurement within jVector that captures the relative cost of training.
The number of vectors must be an adjustable parameter, which is hard to achieve with a static dataset without skewing the data distribution.

Changes

  • Added a PQ training benchmark on random vectors with an adjustable vector count
  • Added a PQ build scorer option to the index construction benchmark, to compare index construction performance with the exact vs. the PQ build scorer
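
To illustrate why random vectors make the vector count easy to adjust, here is a minimal sketch. It is not the benchmark code from this PR: the class and method names (`RandomVectorSketch`, `generate`, `mean`) are hypothetical, and the uniform [0, 1) component distribution is an assumption. The point is only that every vector comes from the same seeded generator, so changing the count does not change the distribution.

```java
import java.util.Random;

// Hypothetical illustration, not part of jVector.
public class RandomVectorSketch {

    /** Generates vectorCount vectors with components drawn uniformly from [0, 1) (assumed distribution). */
    static float[][] generate(int vectorCount, int dimension, long seed) {
        Random rng = new Random(seed); // fixed seed keeps runs reproducible
        float[][] vectors = new float[vectorCount][dimension];
        for (float[] vector : vectors) {
            for (int d = 0; d < dimension; d++) {
                vector[d] = rng.nextFloat();
            }
        }
        return vectors;
    }

    /** Mean over all components, used as a crude check that the distribution does not drift with the count. */
    static double mean(float[][] vectors) {
        double sum = 0.0;
        long count = 0;
        for (float[] vector : vectors) {
            for (float component : vector) {
                sum += component;
                count++;
            }
        }
        return sum / count;
    }

    public static void main(String[] args) {
        // The vector count is the only thing that changes between runs.
        for (int vectorCount : new int[] {1_000, 10_000}) {
            float[][] vectors = generate(vectorCount, 768, 42L);
            System.out.printf("vectorCount=%d mean=%.3f%n", vectorCount, mean(vectors));
        }
    }
}
```

Subsampling a static dataset, by contrast, can change the distribution depending on which rows are kept.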

Testing

PQ training benchmark:

Benchmark                                                                 (M)  (originalDimension)  (vectorCount)  Mode  Cnt     Score     Error  Units
PQTrainingWithRandomVectorsBenchmark.productQuantizationComputeBenchmark   16                  768         100000  avgt    5  2566.605 ± 150.461  ms/op
PQTrainingWithRandomVectorsBenchmark.productQuantizationComputeBenchmark   32                  768         100000  avgt    5  2964.321 ± 124.850  ms/op
PQTrainingWithRandomVectorsBenchmark.productQuantizationComputeBenchmark   64                  768         100000  avgt    5  4028.526 ± 101.389  ms/op

PQ vs FP distance benchmark:

Benchmark                                           (M)  (dimension)  (queryCount)  (vectorCount)  Mode  Cnt       Score       Error  Units
PQDistanceCalculationBenchmark.distanceCalculation    0          768           100          10000  avgt    5  123151.191 ± 42382.755  us/op
PQDistanceCalculationBenchmark.distanceCalculation   16          768           100          10000  avgt    5   11592.736 ±   454.418  us/op
PQDistanceCalculationBenchmark.distanceCalculation   64          768           100          10000  avgt    5   39660.170 ±   315.126  us/op
PQDistanceCalculationBenchmark.distanceCalculation  192          768           100          10000  avgt    5  127094.323 ±  2948.728  us/op

PQ vs FP Index Construction:

Benchmark                                                    (buildScoreProviderType)  (numBaseVectors)  (originalDimension)  Mode  Cnt      Score      Error  Units
IndexConstructionWithRandomSetBenchmark.buildIndexBenchmark                     Exact            100000                  768  avgt    5  19976.081 ±  804.598  ms/op
IndexConstructionWithRandomSetBenchmark.buildIndexBenchmark                     Exact            100000                 1536  avgt    5  37689.636 ± 1295.435  ms/op
IndexConstructionWithRandomSetBenchmark.buildIndexBenchmark                        PQ            100000                  768  avgt    5  25676.173 ± 2423.729  ms/op
IndexConstructionWithRandomSetBenchmark.buildIndexBenchmark                        PQ            100000                 1536  avgt    5  39913.628 ± 4659.553  ms/op
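
The results above are easier to compare as ratios. The sketch below only divides scores copied from the tables; the class name `BenchmarkRatios` is hypothetical, and treating the `M = 0` row of the distance benchmark as the full-precision baseline is an assumption based on the "PQ vs FP" heading.

```java
// Hypothetical helper for reading the reported scores; not part of jVector.
public class BenchmarkRatios {

    /** Ratio of a candidate score to a baseline score, both in the same time unit. */
    static double ratio(double candidate, double baseline) {
        return candidate / baseline;
    }

    public static void main(String[] args) {
        // Distance benchmark (us/op). M = 0 is assumed to be the full-precision run.
        double fullPrecision = 123151.191;
        int[] m = {16, 64, 192};
        double[] pq = {11592.736, 39660.170, 127094.323};
        for (int i = 0; i < m.length; i++) {
            System.out.printf("distance M=%d: FP time / PQ time = %.2f%n",
                    m[i], ratio(fullPrecision, pq[i]));
        }

        // Index construction (ms/op), PQ build scorer relative to the exact build scorer.
        System.out.printf("build dim=768:  PQ time / Exact time = %.2f%n",
                ratio(25676.173, 19976.081));
        System.out.printf("build dim=1536: PQ time / Exact time = %.2f%n",
                ratio(39913.628, 37689.636));
    }
}
```

A ratio above 1 in the first loop means PQ distance calculation was faster than full precision; a ratio above 1 in the last two lines means the PQ build scorer was slower than the exact one. The reported error margins are wide for some rows, so small differences should be read with care.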

Signed-off-by: Samuel Herman <[email protected]>
@tlwillke tlwillke self-requested a review June 27, 2025 03:04
@marianotepper marianotepper left a comment

LGTM

Signed-off-by: Samuel Herman <[email protected]>
@sam-herman sam-herman merged commit 41435ea into datastax:main Jun 27, 2025
7 of 8 checks passed
@sam-herman sam-herman deleted the pq-training-benchmark branch June 27, 2025 20:26