Commit 32e7727

Merge pull request #2528 from mrueg/metrics-best-practices
docs: Add best practices for metrics
2 parents 9652811 + dcfaae9 commit 32e7727

1 file changed

+72
-0
lines changed

docs/design/metrics-best-practices.md

@@ -0,0 +1,72 @@
1+
# Kube-State-Metrics - Timeseries best practices
2+
3+
---
4+
5+
Author: Manuel Rüger (<[email protected]>)
6+
7+
Date: October 17th 2024
8+
9+
---
10+
11+
## Introduction
12+
13+
Kube-State-Metrics' goal is to provide insights into the state of Kubernetes objects by exposing them as metrics.
14+
This document provides guidelines intended to create a good user experience when using these metrics.
15+
16+
Please be aware that this document was introduced at a later stage of the project, so some existing metrics might not follow these best practices.
17+
Feel free to report such metrics and to open a pull request that improves them.
18+
19+
## General best practices
20+
21+
We follow [Prometheus](https://prometheus.io/docs/practices/naming/) best practices in terms of naming and labeling.
22+
23+
## Best practices for kube-state-metrics
24+
25+
### Avoid pre-computation
26+
27+
kube-state-metrics should expose metrics on an individual object level and avoid any sort of pre-computation unless it is required, for example because of high cardinality on objects.
28+
We prefer not to add metrics that can be derived from existing raw metrics. For example, we would not want to expose a metric called `kube_pod_total` as it can be computed with `count(kube_pod_info)`.
29+
This way kube-state-metrics gives users full control over how they use the metrics and the flexibility to do their own computation.
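
As a rough illustration of why `kube_pod_total` would be redundant, the Python sketch below counts `kube_pod_info` series in a made-up scrape output, which is what `count(kube_pod_info)` does at query time. The sample data and the helper names `count_series` and `count_by_label` are assumptions for illustration and are not part of kube-state-metrics.

```python
import re
from collections import Counter

# Hypothetical scrape output: one kube_pod_info series per pod.
SCRAPE = """\
kube_pod_info{namespace="default",pod="web-0"} 1
kube_pod_info{namespace="default",pod="web-1"} 1
kube_pod_info{namespace="kube-system",pod="dns-0"} 1
"""


def count_series(text: str, metric: str) -> int:
    """Count the series of `metric`, like count(<metric>) in PromQL."""
    return sum(1 for line in text.splitlines() if line.startswith(metric + "{"))


def count_by_label(text: str, metric: str, label: str) -> Counter:
    """Count series per label value, like count by (<label>) (<metric>)."""
    pattern = re.compile(rf'^{re.escape(metric)}\{{.*?\b{re.escape(label)}="([^"]*)"')
    matches = (pattern.match(line) for line in text.splitlines())
    return Counter(m.group(1) for m in matches if m)


pod_total = count_series(SCRAPE, "kube_pod_info")
pods_per_namespace = count_by_label(SCRAPE, "kube_pod_info", "namespace")
```

Because the raw per-object series are kept, the same data answers both the total and the per-namespace question without a dedicated metric for either.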
30+
31+
### Static object properties
32+
33+
An object usually has a stable set of properties that do not change during its lifecycle in Kubernetes.
34+
This includes properties like name, namespace, uid etc. that have a 1:1 relationship with the object.
35+
It is a good practice to group those together into an `_info` metric.
36+
If there is a 1:n relationship (e.g. a list of ports), it should be in a separate metric to avoid generating too many metrics.
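
The split between 1:1 and 1:n properties could look roughly like the following Python sketch. The metric names `example_service_info` and `example_service_port_info`, the object fields, and the `sample` helper are hypothetical and chosen only to show the shape of the output.

```python
def sample(name: str, labels: dict, value: float = 1) -> str:
    """Render one sample in Prometheus text format (no label-value escaping)."""
    body = ",".join(f'{key}="{val}"' for key, val in labels.items())
    return f"{name}{{{body}}} {value}"


# Hypothetical object with 1:1 properties and a 1:n list of ports.
service = {
    "namespace": "default",
    "name": "web",
    "uid": "1234",
    "cluster_ip": "10.0.0.1",
    "ports": [80, 443],
}

# Labels that identify the object and are repeated on every series.
identity = {
    "namespace": service["namespace"],
    "service": service["name"],
    "uid": service["uid"],
}

# 1:1 properties are grouped into a single _info series.
info_lines = [
    sample("example_service_info", {**identity, "cluster_ip": service["cluster_ip"]})
]

# The 1:n list gets its own metric: one series per port.
port_lines = [
    sample("example_service_port_info", {**identity, "port": port})
    for port in service["ports"]
]
```

Keeping the port list out of the `_info` metric means the `_info` metric stays at one series per object regardless of how many ports it has.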
37+
38+
### Dynamic object properties
39+
40+
An object can also have a dynamic set of properties, which are usually part of the status field.
41+
These change during the lifecycle of the object.
42+
For example, a pod can be in different states such as "Pending" or "Running".
43+
These should be part of a "State Set" that includes labels that identify the object as well as the dynamic property.
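
A state set can be sketched as one series per possible state, where the current state has the value 1 and all others have the value 0. The Python snippet below assumes a hypothetical metric name `example_pod_status_phase` and a simplified label set; it is an illustration, not the kube-state-metrics implementation.

```python
# Pod phases as defined by the Kubernetes API.
POD_PHASES = ("Pending", "Running", "Succeeded", "Failed", "Unknown")


def phase_state_set(namespace: str, pod: str, current_phase: str) -> list:
    """Return one sample per phase; only the current phase has value 1."""
    lines = []
    for phase in POD_PHASES:
        value = 1 if phase == current_phase else 0
        lines.append(
            f'example_pod_status_phase{{namespace="{namespace}",pod="{pod}",'
            f'phase="{phase}"}} {value}'
        )
    return lines


lines = phase_state_set("default", "web-0", "Running")
```

Exposing every state, including the inactive ones, keeps the set of series stable while the object moves between states.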
44+
45+
### Linked properties
46+
47+
If an object contains a substructure that links multiple properties together (e.g. endpoint address and port), those should be reported in the same metric.
48+
49+
### Optional properties
50+
51+
Some Kubernetes objects have optional fields. If an optional field is not set, its label should still be exposed, ideally as an empty string.
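
One way to keep the label set constant is to build it from a fixed list of keys and fall back to an empty string. In this Python sketch, `LABEL_KEYS`, `build_labels`, and the `priority_class` field are illustrative assumptions.

```python
# Hypothetical fixed label set; priority_class is the optional field.
LABEL_KEYS = ("namespace", "pod", "priority_class")


def build_labels(obj: dict) -> dict:
    """Return every label in LABEL_KEYS, using "" when the field is unset."""
    return {key: obj.get(key) or "" for key in LABEL_KEYS}


with_field = build_labels(
    {"namespace": "default", "pod": "web-0", "priority_class": "high"}
)
without_field = build_labels({"namespace": "default", "pod": "web-1"})
```

Both results carry the same label names, so queries that group or join on the optional label behave consistently for all objects.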
52+
53+
### Timestamps
54+
55+
Timestamps such as creation or modification time should be exposed as the metric value. The metric name should end with `_timestamp_seconds`, and the value is expressed in [UNIX epoch seconds](https://en.wikipedia.org/wiki/Unix_time).
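
A minimal Python sketch of the conversion, assuming the timestamp arrives as an RFC 3339 UTC string with a trailing `Z` (the form Kubernetes typically serializes). The metric name `example_object_created_timestamp_seconds` and the helper `to_epoch_seconds` are hypothetical.

```python
from datetime import datetime, timezone


def to_epoch_seconds(rfc3339: str) -> int:
    """Convert e.g. "2024-01-01T00:00:00Z" to UNIX epoch seconds."""
    parsed = datetime.strptime(rfc3339, "%Y-%m-%dT%H:%M:%SZ")
    # strptime returns a naive datetime; mark it as UTC before converting.
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


created = to_epoch_seconds("2024-01-01T00:00:00Z")
line = f'example_object_created_timestamp_seconds{{name="web"}} {created}'
```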
56+
57+
### Cardinality
58+
59+
Some object properties can cause cardinality issues if they can take many different values or are combined with other properties that also change frequently.
60+
In this case it is better to limit the number of values that kube-state-metrics can expose by allowing only a few of them and using a default for the others.
61+
If, for example, a Kubernetes object has a status field with an error message that varies widely, it is better to expose a boolean `error="true"` label when an error is present.
62+
If some error messages are worth exposing, those can be exposed individually, with a default value provided for any other message.
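
The allowlist-plus-default approach might be sketched as follows. The allowed values, the default `"Other"`, and the `reason` label are assumptions made up for this example.

```python
# Hypothetical allowlist of messages considered worth exposing verbatim.
ALLOWED_REASONS = frozenset({"ImagePullBackOff", "CrashLoopBackOff"})
DEFAULT_REASON = "Other"


def error_labels(message: str) -> dict:
    """Map a free-form error message to a bounded set of label values."""
    if not message:
        return {"error": "false", "reason": ""}
    reason = message if message in ALLOWED_REASONS else DEFAULT_REASON
    return {"error": "true", "reason": reason}


known = error_labels("CrashLoopBackOff")
unknown = error_labels("dial tcp 10.0.0.7:443: i/o timeout")
healthy = error_labels("")
```

With this mapping the `reason` label can take at most `len(ALLOWED_REASONS) + 2` distinct values, no matter how varied the underlying messages are.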
63+
64+
## Stability
65+
66+
We follow the stability framework derived from Kubernetes, in which we expose experimental and stable metrics.
67+
Experimental metrics are recently introduced or expose alpha/beta resources in the Kubernetes API.
68+
They can change anytime and should be used with caution.
69+
They can be promoted to stable metrics once the object has stabilized in the Kubernetes API, or once they have been part of two consecutive releases without any changes.
70+
71+
Stable metrics are considered frozen with the exception of new labels being added.
72+
A stable metric, or a label on a stable metric, can be deprecated in release Major.Minor and will be removed in release Major.Minor+2 at the earliest.
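
The deprecation window is simple arithmetic on the minor version. A small Python sketch, with `earliest_removal` as a hypothetical helper and the version numbers chosen only as an example:

```python
def earliest_removal(deprecated_in: str) -> str:
    """Given the "Major.Minor" release of a deprecation, return Major.Minor+2."""
    major, minor = (int(part) for part in deprecated_in.lstrip("v").split("."))
    return f"{major}.{minor + 2}"


removal = earliest_removal("2.14")
```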
