Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[kube-prometheus-stack] add default values for sums of apiserver_request:availability30d #5415

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Mx7ca
Copy link

@Mx7ca Mx7ca commented Mar 13, 2025

What this PR does / why we need it

Sums can only be calculated if none of the summands are null (can be tested with Prometheus-Queries 1 + foobar vs 1 + foobar or vector(0)).
In the case of the Availability Panels in the Dashboard "Kubernetes / API server", the sums cannot be calculated as long as there are no Slow-Queries for the API-Server (i.e. le=~"5(\\.0)?").
Some Queries have already defaults set via or vector(0), but not all. That breaks some Panels as they cannot show any data.

Which issue this PR fixes

(optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged)

Special notes for your reviewer

  • In the PR, I have limited myself on the Prometheus-Rules for apiserver_request:availability30d, because adding defaults for all Rules, Panels etc. would have become a rabbit hole that would blow up this PR enormously.

  • I also had to decide for the format, because both options were already in use:

    (
      sum(...)
      or
      vector(0)
    )
    

    vs.
    sum(... or vector(0))

    I chose the single line approach, because in my opinion the default does not need to be too visible.

  • for the defaults, I've used either 0 or 1, depending on the usecase.

Checklist

  • DCO signed
  • Chart Version bumped
  • Title of the PR starts with chart name (e.g. [prometheus-couchdb-exporter])

+
sum by ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="cluster",le=~"30(\\.0)?"})
sum by ({{ range $.Values.defaultRules.additionalAggregationLabels }}{{ . }},{{ end }}cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="cluster",le=~"30(\\.0)?"} or vector(0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check line 3 of this file.

Copy link
Author

@Mx7ca Mx7ca Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay, thanks. So if I understand it correctly the changes need to be done in https://github.com/prometheus-operator/kube-prometheus/blob/baf3c7a71ec9f889644231f677f8708791d38293/manifests/kubernetesControlPlane-prometheusRule.yaml#L802-L875 and then I need to sync it with the kube-prometheus-stack hacks.

Update: Or rather I need to update https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/rules/kube_apiserver-availability.libsonnet#L57-L141 which I then need to sync into the above mentioned file and then it can be synced here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. that how it works. Its a bit around the edge 2 times.

@Mx7ca
Copy link
Author

Mx7ca commented Mar 31, 2025

status update: The changes have been applied to the upstream repo in kubernetes-monitoring/kubernetes-mixin#1038
But unfortunately, in the intermediate repo, the sync currently fails, see https://github.com/prometheus-operator/kube-prometheus/actions/runs/14166522364/job/39680719802#step:5:210
So there's nothing I could sync into this PR yet.

@jkroepke
Copy link
Member

kubernetes-monitoring/kubernetes-mixin#888 It seems like this upstream change break something in kube-prometheus. I guess it needs to be fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants