
Memory Limit Utilization metrics do not use the Working Set #40444

Open
@DevOpsFu


Component(s)

receiver/kubeletstats

What happened?

Description

The pod.memory.limit.utilization metric appears to calculate the percentage of the limit used from the memory.usage metric. As a result, the reported utilization percentage differs from the one shown by tools such as k9s, which base the calculation on working_set.
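To make the discrepancy concrete, here is a minimal sketch comparing the two ratios. The byte values are hypothetical and the helper `limit_utilization` is made up for illustration; it is not the receiver's code.

```python
def limit_utilization(used_bytes: int, limit_bytes: int) -> float:
    """Return used/limit as a fraction of the configured memory limit."""
    if limit_bytes <= 0:
        raise ValueError("memory limit must be a positive number of bytes")
    return used_bytes / limit_bytes


MIB = 1024 * 1024

# Hypothetical sample values for a single pod (not taken from a real cluster).
limit = 512 * MIB        # resources.limits.memory
usage = 480 * MIB        # memory.usage, which can include reclaimable page cache
working_set = 300 * MIB  # memory.working_set

usage_based = limit_utilization(usage, limit)
working_set_based = limit_utilization(working_set, limit)

print(f"usage-based utilization:       {usage_based:.1%}")
print(f"working-set-based utilization: {working_set_based:.1%}")
```

With numbers like these, an alert threshold of 90% would fire on the usage-based figure but not on the working-set-based one, which is the kind of mismatch described above.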

According to several sources, working_set is a better basis for this calculation, and therefore for alerts. For example:

https://last9.io/blog/pod-memory-usage/

Working Set Memory: The subset of memory that can't be reclaimed without application impact – the most important metric for pod health

https://www.redhat.com/en/blog/using-oc-adm-top-to-monitor-memory-usage

In Kubernetes documentation, Measuring resource usage - Memory, the working set is the amount of memory in use that cannot be freed under memory pressure.

In other words, working set is the appropriate metric for monitoring OOM limitations if you set up a resources.limits.memory limitation in pods.
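As a rough illustration of why the two values diverge, the sketch below assumes the commonly described cAdvisor-style derivation, in which the working set is total usage minus inactive file-backed (page cache) memory, floored at zero. The function name and sample numbers are hypothetical, and the exact derivation may vary by runtime and cgroup version.

```python
def working_set_bytes(usage_bytes: int, inactive_file_bytes: int) -> int:
    """Approximate the working set as usage minus inactive file cache.

    Assumption: this mirrors the cAdvisor-style calculation; it is a sketch,
    not the kubelet's or the receiver's actual code.
    """
    return max(usage_bytes - inactive_file_bytes, 0)


MIB = 1024 * 1024

# Hypothetical cgroup readings for one container.
sample_usage = 480 * MIB
sample_inactive_file = 180 * MIB

sample_working_set = working_set_bytes(sample_usage, sample_inactive_file)
print(sample_working_set // MIB, "MiB working set")
```

Under this assumption, a pod with a large page cache reports high usage while its working set, the part that cannot be reclaimed under memory pressure, stays well below the limit.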

Steps to Reproduce

N/A

Expected Result

N/A

Actual Result

Collector version

v0.109.0

Environment information

Deployment on Azure AKS

OpenTelemetry Collector configuration

Log output

Additional context

I would like to know whether the utilization calculation was implemented this way deliberately, whether it should be changed to use working_set as its basis, or whether a separate metric could report limit utilization based on the working set.

Metadata

Assignees

No one assigned

    Labels

    bug, discussion needed, never stale, receiver/kubeletstats
