Component(s)
receiver/kubeletstats
What happened?
Description
The pod.memory.limit.utilization metric appears to calculate the percentage of the limit used based on the memory.usage metric. As a result, the reported utilization percentage differs from the one shown by tools such as k9s, which use the working_set as the basis for the calculation.
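To illustrate the discrepancy, here is a minimal sketch of the two calculations. The byte values are hypothetical, and the function and variable names are mine rather than anything from the receiver's code. It only assumes that working set is smaller than usage whenever some of the pod's memory is reclaimable, for example page cache.

```python
MIB = 1024 * 1024


def limit_utilization(used_bytes: int, limit_bytes: int) -> float:
    """Return used_bytes / limit_bytes as a fraction of the memory limit."""
    if limit_bytes <= 0:
        raise ValueError("memory limit must be positive")
    return used_bytes / limit_bytes


# Hypothetical readings for one pod (not taken from a real cluster).
limit_bytes = 512 * MIB        # resources.limits.memory
usage_bytes = 448 * MIB        # total usage, including reclaimable memory
working_set_bytes = 320 * MIB  # memory that cannot be freed under pressure

usage_based = limit_utilization(usage_bytes, limit_bytes)
working_set_based = limit_utilization(working_set_bytes, limit_bytes)

print(f"usage-based utilization:       {usage_based:.1%}")        # 87.5%
print(f"working-set-based utilization: {working_set_based:.1%}")  # 62.5%
```

With these numbers, an alert at 80% would fire on the usage-based value but not on the working-set-based one.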
According to several sources, the working_set is considered a better basis for this calculation (and therefore for alerts). For example:
https://last9.io/blog/pod-memory-usage/
Working Set Memory: The subset of memory that can't be reclaimed without application impact – the most important metric for pod health
https://www.redhat.com/en/blog/using-oc-adm-top-to-monitor-memory-usage
In Kubernetes documentation, Measuring resource usage - Memory, the working set is the amount of memory in use that cannot be freed under memory pressure.
In other words, working set is the appropriate metric for monitoring OOM limitations if you set up a resources.limits.memory limitation in pods.
Steps to Reproduce
N/A
Expected Result
N/A
Actual Result
Collector version
v0.109.0
Environment information
Deployment on Azure AKS
OpenTelemetry Collector configuration
Log output
Additional context
I am curious to know whether the utilization calculation was done this way for a good reason, whether it should be changed to use the working_set as its basis, or whether we could have a separate metric that gives the limit utilization based on the working set.