kube_pod_status_reason is 0 for all reasons #2612

Open
dshackith opened this issue Feb 18, 2025 · 11 comments · May be fixed by #2644
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@dshackith

dshackith commented Feb 18, 2025

What happened:
The metric kube_pod_status_reason shows 0 for all reasons, even when some reasons should have a value of 1.

What you expected to happen:
We use Karpenter in our clusters and expect to be able to see when pods change status as a result of actions Karpenter takes. In particular, we expect the Evicted, NodeLost, and Shutdown reasons to show a value of 1 in clusters where consolidation is happening all the time (consolidateAfter value is 5m0s). Our Karpenter metrics show that at any given time some pod is being moved, so it should show up with a kube_pod_status_reason of Evicted with a value of 1.

How to reproduce it (as minimally and precisely as possible):
This Prometheus query: sum(kube_pod_status_reason) by (reason) shows 0 for every reason, and when charted, those values remain the same over any time interval.

Anything else we need to know?:
The kube_pod_status_phase metric does not give us the information we need (specific reasons for status), and no other metric claims to provide this.

Environment:
Running KSM v2.13 managed via Helm chart
EKS v1.32.2
Karpenter v1.2.0

@dshackith dshackith added the kind/bug Categorizes issue or PR as related to a bug. label Feb 18, 2025
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Feb 18, 2025
@dshackith
Author

See also these issues where it was raised, but not resolved:
#2116
#1843

@konstantindobroliubov

Makes sense to mention the version of kube-state-metrics that you used.
I face the same with one of the recent versions. Upgrading to the latest version to be 100% sure.

@dshackith
Author

> Makes sense to mention the version of kube-state-metrics that you used.

Running KSM v2.13 managed via Helm chart

@konstantindobroliubov

konstantindobroliubov commented Feb 19, 2025

> Running KSM v2.13 managed via Helm chart

Sorry, I'm blind. Didn't correlate the KSM followed by EKS with "kube-state-metrics".
Tried with 2.14. The same result.
Manually evicted a few Pods by draining the Node where they were placed. There's an Event about eviction. Metric kube_pod_status_reason{} always returns 0 for all Pods.

@mrueg
Member

mrueg commented Feb 19, 2025

If it's 0 for all reasons, then the comparison at https://github.com/kubernetes/kube-state-metrics/blob/main/internal/store/pod.go#L1547 might not be correct.
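
To make the suspicion concrete, here is a minimal Python sketch of how a per-reason gauge could end up 0 everywhere. It assumes the metric is derived only from the pod's top-level status.reason, compared against a fixed list of reasons. The function name and the reason list (taken from the reasons named in this issue) are illustrative assumptions, not a copy of what pod.go does.

```python
# Hypothetical sketch: the names and the reason list are assumptions for illustration.
KNOWN_REASONS = ["Evicted", "NodeLost", "Shutdown"]


def status_reason_samples(pod: dict) -> dict:
    """Return one 0/1 sample per known reason, based only on status.reason."""
    top_level_reason = pod.get("status", {}).get("reason", "")
    return {reason: int(top_level_reason == reason) for reason in KNOWN_REASONS}


# A pod whose eviction is recorded only as a condition; status.reason is absent.
pod_with_condition_only = {
    "status": {
        "conditions": [
            {"type": "DisruptionTarget", "reason": "EvictionByEvictionAPI"}
        ]
    }
}

# A pod where the top-level status.reason was set.
pod_with_top_level_reason = {"status": {"reason": "Evicted"}}

print(status_reason_samples(pod_with_condition_only))
print(status_reason_samples(pod_with_top_level_reason))
```

Under these assumptions, the first pod yields 0 for every reason even though it is being evicted, because the comparison never looks at status.conditions.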

@dshackith
Author

dshackith commented Feb 19, 2025

kubectl get pods  -o json | jq -r '.items[] | select(.status.conditions[]?.type == "DisruptionTarget") | "\(.metadata.name)\t\(.status.conditions[] | select(.type == "DisruptionTarget") | .type)\t\(.status.conditions[] | select(.type == "DisruptionTarget") | .reason)\t\(.status.conditions[] | select(.type == "DisruptionTarget") | .message)"'

art-aa-service-6fd747848f-4vczd	DisruptionTarget	EvictionByEvictionAPI	Eviction API: evicting

In the spec for the pod I don't see something like pod.status.terminated.reason or pod.status.reason. I do see an array of items in pod.status.conditions which includes a .type, .reason, and .message, and I do see pod.status.containerStatuses[].state.terminated.reason.
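
The three locations mentioned above can be inspected together. The following is a rough Python sketch that walks a pod object (for example, the JSON from kubectl get pod <name> -o json) and lists every reason it finds along with where it came from. The helper name collect_reasons, the container name "app", and the terminated reason "Error" are hypothetical and used only for illustration.

```python
def collect_reasons(pod: dict) -> list:
    """Gather (source, reason) pairs from each place a pod status can carry a reason."""
    status = pod.get("status", {})
    found = []

    # Top-level status.reason (absent in the pod shown above).
    if status.get("reason"):
        found.append(("status.reason", status["reason"]))

    # Per-condition reasons, e.g. the DisruptionTarget condition.
    for cond in status.get("conditions") or []:
        if cond.get("reason"):
            found.append((f"conditions[{cond.get('type')}]", cond["reason"]))

    # Per-container terminated reasons.
    for cs in status.get("containerStatuses") or []:
        terminated = cs.get("state", {}).get("terminated") or {}
        if terminated.get("reason"):
            found.append(
                (f"containerStatuses[{cs.get('name')}].terminated", terminated["reason"])
            )
    return found


# Sample pod: the condition matches the output above; the container entry is made up.
sample_pod = {
    "status": {
        "conditions": [
            {
                "type": "DisruptionTarget",
                "reason": "EvictionByEvictionAPI",
                "message": "Eviction API: evicting",
            }
        ],
        "containerStatuses": [
            {"name": "app", "state": {"terminated": {"reason": "Error"}}}
        ],
    }
}

for source, reason in collect_reasons(sample_pod):
    print(source, reason)
```

For a pod like the one above, nothing is reported under status.reason, which would be consistent with a metric that reads only that field staying at 0.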

@richabanker
Contributor

/triage accepted
/assign @mrueg

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 20, 2025
@konstantindobroliubov

It has been more than a month since this was accepted at triage. Any updates on this?

@mrueg
Member

mrueg commented Mar 31, 2025

I've pretty much described where setting it to 0 is coming from, feel free to take a look into this and come up with a solution: #2612 (comment)

/help

@k8s-ci-robot
Contributor

@mrueg:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

> I've pretty much described what's needed to change here, feel free to take a look into this and come up with a solution: #2612 (comment)

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Mar 31, 2025
@carlosmorenokm1

/assign

6 participants