Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instrument the Ruler Notifier #2786

Merged
merged 5 commits into from
Jun 26, 2020

Conversation

gotjosh
Copy link
Contributor

@gotjosh gotjosh commented Jun 24, 2020

What this PR does:

The ruler notifier is not instrumented at the moment. This commit wraps
around the Ruler registry with the userID as a label and cortex_ as a
prefix and passes it to the notifier options.

The result? We have metrics.

Which issue(s) this PR fixes:
Fixes #2193

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copy link
Contributor

@jtlisi jtlisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern here is that we are adding a histogram metric with a user label. We already are reporting a lot of metrics for each user in the ruler and we should attempt to be a bit more judicious about what metrics we expose. When we encountered this situation previously we solved it by creating a shim metrics reporter that registers the registry we pass to the vendored Prometheus code. We can likely use the same pattern here.

Look at this file for reference as to what I'm referring to:
https://github.com/cortexproject/cortex/blob/master/pkg/alertmanager/alertmanager_metrics.go

@gouthamve
Copy link
Contributor

LGTM pending rebase. Slightly concerning but should be okay. prometheus_notifications_latency_seconds is a summary adding 5 metrics per AM. And given people typically run 3 AMs, it will be an additional 15 metrics per user. I think it should be fine. If its not, we can revert later.

gotjosh added 3 commits June 25, 2020 10:45
The ruler notifier is not instrumented at the moment. This commit wraps
around the ruler registry with the userID as a label and `cortex_` as a
prefix and passes it to the notifier options.

Signed-off-by: gotjosh <[email protected]>
Signed-off-by: gotjosh <[email protected]>
Signed-off-by: gotjosh <[email protected]>
@gotjosh gotjosh force-pushed the instrument-notifer branch from 577d8cb to 04147c5 Compare June 25, 2020 09:47
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (modulo a couple of nits)

Signed-off-by: gotjosh <[email protected]>
@gotjosh gotjosh force-pushed the instrument-notifer branch from f3b805f to 0e6b71c Compare June 25, 2020 09:59
Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are just reexporting metrics from Prometheus without any other processing or aggregation, I don't think we should prefix this. That said, we already have other exported Prometheus metrics with prefix, so approving in the current state.

@gotjosh
Copy link
Contributor Author

gotjosh commented Jun 26, 2020

I can see that the changelog is having conflicts again, just let me know if we're ready to merge and I'll fix it 😄

@pracucci
Copy link
Contributor

@gotjosh I took care of CHANGELOG.

@pracucci pracucci merged commit 51f4a35 into cortexproject:master Jun 26, 2020
@gotjosh
Copy link
Contributor Author

gotjosh commented Jun 26, 2020

Thanks @pracucci !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Instrument Ruler Notifier
6 participants