Skip to content

Summary sketch #177

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Jun 25, 2025
Merged

Summary sketch #177

merged 21 commits into from
Jun 25, 2025

Conversation

NelsonVides
Copy link
Member

@NelsonVides NelsonVides commented Mar 24, 2025

This uses now ddskerl, an implementation of the DDSketch algorithm, instead of quantile_estimator, which is in turn an implementation of Cormode's biased algorithms.

The most important difference is that the biased algorithms aren't mathematically mergeable, hence keeping a per-scheduler summary would never work, and keeping a single global one would never scale (and not mentioning race conditions on inserts as the data-structure is not possible to treat atomically in an ets table). Instead, the DDSketch algorithm is fully mergeable, and the data structure implementation has an ETS backend fully tested for all sorts of race conditions.

As a possibility that is out of the scope of this PR, a custom exporter might provide the Sketch data structure to datadog, so that the quantile merges can be done on the server, and that way the metrics server can aggregate quantiles across many hosts in a meaningful way.

@NelsonVides NelsonVides self-assigned this Mar 24, 2025
Copy link

codecov bot commented Mar 24, 2025

Codecov Report

Attention: Patch coverage is 99.38650% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...eunit/metric/prometheus_quantile_summary_tests.erl 97.91% 1 Missing ⚠️
Files with missing lines Coverage Δ
src/metrics/prometheus_quantile_summary.erl 100.00% <100.00%> (+7.63%) ⬆️
src/prometheus_buckets.erl 100.00% <100.00%> (ø)
src/prometheus_metric_spec.erl 100.00% <ø> (ø)
src/prometheus_sup.erl 77.41% <ø> (ø)
src/prometheus_time.erl 97.43% <100.00%> (ø)
.../eunit/format/prometheus_protobuf_format_tests.erl 100.00% <100.00%> (ø)
test/eunit/format/prometheus_text_format_tests.erl 100.00% <100.00%> (ø)
test/eunit/metric/prometheus_histogram_tests.erl 98.30% <100.00%> (+0.02%) ⬆️
test/eunit/prometheus_buckets_tests.erl 100.00% <100.00%> (ø)
test/eunit/prometheus_time_tests.erl 100.00% <100.00%> (ø)
... and 1 more

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@lhoguin
Copy link

lhoguin commented Mar 24, 2025

What problem are you trying to solve in this PR?

@NelsonVides
Copy link
Member Author

What problem are you trying to solve in this PR?

Having quantile summaries entirely. The previous ones were known to be broken (see #170, #146, #159). My fix at #170 was just to stop it from breaking, but they were still not correct due to data-loss during race conditions, and surely not performant as they were doing lots of lookup-inserts of big data structures, when this PR does only update_counters.

@NelsonVides
Copy link
Member Author

I've pushed a couple of changes to the default quantiles, in order not create any backwards incompatible change. As such, previously created quantiles using targets and compres_limit keys would not fail, but simply such keys would be ignored.

@NelsonVides NelsonVides mentioned this pull request May 23, 2025
@NelsonVides
Copy link
Member Author

@paulo-ferraz-oliveira @lhoguin so, this has been in prod for a month already and it's working wonderfully 🔥

Maybe time to review and get it merged?

PS: for the record, the current history is a mess because as there might be applications depending on the branch, force-pushing might break builds when the hash disappeared, so I've had to keep the history untouched. When merging, I'll need to squash-merge, and then ensure this branch doesn't disappear for at least a few months :)

@mkuratczyk, when we merge this, we can release a new version together with all the latest versions of maps you've contributed recently too. v6 I suppose, as this is a breaking change? 🤔

@NelsonVides
Copy link
Member Author

Hello there, any available reviews here? 🙏🏽

@mkuratczyk
Copy link
Contributor

maps support was not breaking (I hope :) ). I can try to review this PR in a few hours

@NelsonVides
Copy link
Member Author

maps support was not breaking (I hope :) ). I can try to review this PR in a few hours

Well, probably. Going to review what we had so far and if I don't see anything breaking then will release 5.1.0 👌🏽

Thanks! :)

@NelsonVides NelsonVides changed the title [TESTING] Summary sketch Summary sketch Jun 24, 2025
@paulo-ferraz-oliveira
Copy link

I was keeping tabs on what was changing, and nothing surprised me, so I'm approving.

mkuratczyk
mkuratczyk previously approved these changes Jun 24, 2025
@NelsonVides
Copy link
Member Author

Thanks for the approvals! Merged the latest master and fixed a documentation typo, will bypass stale reviews and merge, hopefully nobody has a problem with that :)

@NelsonVides NelsonVides merged commit 3bcc996 into master Jun 25, 2025
7 checks passed
@NelsonVides NelsonVides deleted the summary_sketch branch June 25, 2025 06:45
@NelsonVides NelsonVides restored the summary_sketch branch June 25, 2025 06:45
@NelsonVides NelsonVides mentioned this pull request Jun 25, 2025
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants