| 1 | +# Kube-State-Metrics - Timeseries best practices |
| 2 | + |
| 3 | +--- |
| 4 | + |
| 5 | +Author: Manuel Rüger ( <[email protected]>) |
| 6 | + |
| 7 | +Date: October 17th 2024 |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## Introduction |
| 12 | + |
| 13 | +Kube-State-Metrics' goal is to provide insights into the state of Kubernetes objects by exposing them as metrics. |
| 14 | +This document provides guidelines intended to create a good user experience when using these metrics. |
| 15 | + |
| 16 | +Please be aware that this document was introduced at a later stage of the project, so there might be metrics that do not follow these best practices. |
| 17 | +You are encouraged to report such metrics and to open a pull request to improve them. |
| 18 | + |
| 19 | +## General best practices |
| 20 | + |
| 21 | +We follow [Prometheus](https://prometheus.io/docs/practices/naming/) best practices in terms of naming and labeling. |
| 22 | + |
| 23 | +## Best practices for kube-state-metrics |
| 24 | + |
| 25 | +### Avoid pre-computation |
| 26 | + |
| 27 | +kube-state-metrics should expose metrics on an individual object level and avoid any sort of pre-computation unless it is required, for example because of high object cardinality. |
| 28 | +We prefer not to add metrics that can be derived from existing raw metrics. For example, we would not want to expose a metric called `kube_pod_total` as it can be computed with `count(kube_pod_info)`. |
| 29 | +This way, kube-state-metrics gives users full control over how they use the metrics and the flexibility to perform their own computations. |
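
To illustrate why per-object series are enough, the sketch below mimics in plain Python what a query such as `count by (namespace) (kube_pod_info)` does on the consumer side. The sample label sets and the helper name `count_by` are made up for illustration; this is not how Prometheus or kube-state-metrics is implemented.

```python
from collections import Counter

# Hypothetical per-object series: one label set per pod, as an `_info`-style
# metric would expose them (each series carries the value 1).
pod_info_series = [
    {"namespace": "default", "pod": "web-0"},
    {"namespace": "default", "pod": "web-1"},
    {"namespace": "kube-system", "pod": "coredns-abc"},
]


def count_by(series, label):
    """Count series per value of `label`, similar in spirit to PromQL `count by`."""
    return Counter(s[label] for s in series)


# Analogous to count(kube_pod_info): no pre-computed total is needed.
total_pods = len(pod_info_series)
# Analogous to count by (namespace) (kube_pod_info).
pods_per_namespace = count_by(pod_info_series, "namespace")
```

Because the raw series stay available, the same data can be grouped by any other label without a new metric.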
| 30 | + |
| 31 | +### Static object properties |
| 32 | + |
| 33 | +An object usually has a stable set of properties that do not change during its lifecycle in Kubernetes. |
| 34 | +This includes properties like name, namespace, and UID, which have a 1:1 relationship with the object. |
| 35 | +It is a good practice to group those together into an `_info` metric. |
| 36 | +If there is a 1:n relationship (e.g. a list of ports), it should be in a separate metric to avoid generating too many metrics. |
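
A minimal sketch of this split, assuming a made-up object with a list of ports. The metric names `kube_example_info` and `kube_example_port_info`, the `example` label, and the `render` helper are hypothetical and only show the shape of the output; label values are not escaped here.

```python
def render(name, labels, value=1):
    """Format one sample in the Prometheus text exposition format (no escaping)."""
    label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
    return f"{name}{{{label_str}}} {value}"


# Hypothetical object with static 1:1 properties and a 1:n list of ports.
obj = {"namespace": "default", "name": "demo", "uid": "1234", "ports": [80, 443]}

# Labels that identify the object are repeated on every metric.
identity = {"namespace": obj["namespace"], "example": obj["name"]}

# 1:1 properties are grouped into a single `_info` series.
info_line = render("kube_example_info", {**identity, "uid": obj["uid"]})

# The 1:n property gets its own metric, one series per list entry.
port_lines = [
    render("kube_example_port_info", {**identity, "port": port})
    for port in obj["ports"]
]
```

Keeping the port list out of the `_info` metric means the 1:1 labels are not duplicated once per port.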
| 37 | + |
| 38 | +### Dynamic object properties |
| 39 | + |
| 40 | +An object can also have a dynamic set of properties, which are usually part of the status field. |
| 41 | +These change during the lifecycle of the object. |
| 42 | +For example, a pod can be in different states like "Pending", "Running", etc. |
| 43 | +These should be part of a "State Set" that includes labels that identify the object as well as the dynamic property. |
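
The following sketch shows one way such a state set could be generated for a pod phase: one series per possible state, with the value 1 for the current state and 0 for all others. The function name `phase_state_set` and the sample pod are assumptions made for illustration.

```python
# The pod phases defined by the Kubernetes API.
PHASES = ["Pending", "Running", "Succeeded", "Failed", "Unknown"]


def phase_state_set(namespace, pod, current_phase):
    """Return (labels, value) pairs: one series per phase, 1 only for the current one."""
    return [
        (
            {"namespace": namespace, "pod": pod, "phase": phase},
            1 if phase == current_phase else 0,
        )
        for phase in PHASES
    ]


samples = phase_state_set("default", "web-0", "Running")
```

Each series carries the identifying labels (`namespace`, `pod`) plus the dynamic property (`phase`), so a state change only flips values instead of creating new label sets.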
| 44 | + |
| 45 | +### Linked properties |
| 46 | + |
| 47 | +If an object contains a substructure that links multiple properties together (e.g. endpoint address and port), those should be reported in the same metric. |
| 48 | + |
| 49 | +### Optional properties |
| 50 | + |
| 51 | +Some Kubernetes objects have optional fields. If an optional field is not set, the label should still be exposed, ideally with an empty string as its value. |
| 52 | + |
| 53 | +### Timestamps |
| 54 | + |
| 55 | +Timestamps like creation time or modification time should be exposed as the metric value. The metric name should end with `_timestamp_seconds`. The value is represented in [UNIX epoch seconds](https://en.wikipedia.org/wiki/Unix_time). |
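
As a small illustration of the conversion, the sketch below turns an RFC 3339 timestamp string, the form in which Kubernetes serializes times, into Unix epoch seconds. The helper name `to_epoch_seconds` is hypothetical.

```python
from datetime import datetime


def to_epoch_seconds(rfc3339):
    """Convert an RFC 3339 timestamp such as "2024-10-17T00:00:00Z" to epoch seconds."""
    # Older Python versions do not accept a trailing "Z", so normalize it
    # to an explicit UTC offset before parsing.
    parsed = datetime.fromisoformat(rfc3339.replace("Z", "+00:00"))
    return int(parsed.timestamp())


created = to_epoch_seconds("2024-10-17T00:00:00Z")  # 1729123200
```

The resulting number would be the sample value of a metric whose name ends in `_timestamp_seconds`.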
| 56 | + |
| 57 | +### Cardinality |
| 58 | + |
| 59 | +Some object properties can cause cardinality issues if they can take many different values or are combined with other properties that also change frequently. |
| 60 | +In this case, it is better to limit the number of values that kube-state-metrics can expose by allowing only a few of them and using a default for all others. |
| 61 | +If, for example, a Kubernetes object has a status field containing an error message that can change a lot, it would be better to expose a boolean `error="true"` label when there is an error. |
| 62 | +If some error messages are worth exposing, those could be exposed as they are, and a default value could be provided for any other message. |
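
A minimal sketch of both approaches, assuming a made-up allow-list of messages. The names `ALLOWED_REASONS`, `bounded_reason`, and `error_label`, as well as the default value `"other"`, are illustrative choices rather than anything kube-state-metrics prescribes.

```python
# Hypothetical allow-list: only these values may appear as label values.
ALLOWED_REASONS = {"ImagePullBackOff", "CrashLoopBackOff"}


def bounded_reason(message):
    """Map an arbitrary message to an allow-listed value or to a default."""
    return message if message in ALLOWED_REASONS else "other"


def error_label(message):
    """Collapse any non-empty error message into a boolean label value."""
    return "true" if message else "false"


# At most len(ALLOWED_REASONS) + 1 distinct label values can ever be produced.
reason = bounded_reason("pull failed: registry.example.com timed out after 31s")
```

With this mapping the number of series is bounded by the size of the allow-list, regardless of how many distinct messages the API returns.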
| 63 | + |
| 64 | +## Stability |
| 65 | + |
| 66 | +We follow the stability framework derived from Kubernetes, in which we expose experimental and stable metrics. |
| 67 | +Experimental metrics are recently introduced or expose alpha/beta resources in the Kubernetes API. |
| 68 | +They can change anytime and should be used with caution. |
| 69 | +They can be promoted to stable metrics once the object has stabilized in the Kubernetes API, or once they have been part of two consecutive releases without any changes. |
| 70 | + |
| 71 | +Stable metrics are considered frozen with the exception of new labels being added. |
| 72 | +A stable metric or a label on a stable metric can be deprecated in release Major.Minor and will be removed in release Major.Minor+2 at the earliest. |