
Memory regression in opentelemetry prometheus exporter v0.57.0 with Go 1.24 #6788


Open
ns-jvillarfernandez opened this issue May 16, 2025 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@ns-jvillarfernandez

ns-jvillarfernandez commented May 16, 2025

Description

We've detected a memory leak / regression in one of our containers. The culprit seems to be go.opentelemetry.io/otel/exporters/prometheus v0.57.0, or the combination of upgrading go.opentelemetry.io/otel/exporters/prometheus to v0.57.0 and upgrading Go from 1.23 to 1.24.

  1. Our containers started to OOM after several hours of running.
  2. We verified that the traffic pattern hadn't changed.
  3. Attaching to a running pod and using go tool pprof, we saw that the allocations were tied to go.opentelemetry.io/otel/exporters/prometheus v0.57.0, specifically math/rand.newSource. Please see the commands and screenshot below.
$ kubectl port-forward pod/XXXXXXX 8086:8085
Forwarding from 127.0.0.1:8086 -> 8085
Forwarding from [::1]:8086 -> 8085
Handling connection for 8086

$ go tool pprof http://localhost:8086/debug/pprof/heap
Fetching profile over HTTP from http://localhost:8086/debug/pprof/heap
Saved profile in /Users/XXXXXXX/pprof/pprof.forward.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz
File: XXXXXXX
Build ID: 944b0f39f5443eb2ef822291ecd1bb226a3c768b
Type: inuse_space
Time: 2025-05-16 10:49:15 CEST
Entering interactive mode (type "help" for commands, "o" for options)

(pprof) top
Showing nodes accounting for 286.11MB, 80.53% of 355.27MB total
Dropped 159 nodes (cum <= 1.78MB)
Showing top 10 nodes out of 151
      flat  flat%   sum%        cum   cum%
  123.63MB 34.80% 34.80%   123.63MB 34.80%  math/rand.newSource (inline)
   61.10MB 17.20% 52.00%    61.10MB 17.20%  go.opentelemetry.io/otel/sdk/metric/exemplar.newStorage (inline)
   31.04MB  8.74% 60.73%    31.04MB  8.74%  go.opentelemetry.io/otel/sdk/metric/internal/aggregate.reset[go.shape.struct { FilteredAttributes []go.opentelemetry.io/otel/attribute.KeyValue; Time time.Time; Value go.shape.int64; SpanID []uint8 "json:\",omitempty\""; TraceID []uint8 "json:\",omitempty\"" }]
   25.04MB  7.05% 67.78%    25.04MB  7.05%  go.opentelemetry.io/otel/sdk/metric/internal/aggregate.reset[go.shape.struct { FilteredAttributes []go.opentelemetry.io/otel/attribute.KeyValue; Time time.Time; Value go.shape.float64; SpanID []uint8 "json:\",omitempty\""; TraceID []uint8 "json:\",omitempty\"" }]
   14.56MB  4.10% 71.88%    14.56MB  4.10%  bufio.NewWriterSize
    7.03MB  1.98% 73.86%     7.03MB  1.98%  bufio.NewReaderSize
       7MB  1.97% 75.83%    17.50MB  4.93%  go.opentelemetry.io/otel/exporters/prometheus.addExemplars[go.shape.int64]
    6.19MB  1.74% 77.57%    12.71MB  3.58%  io.copyBuffer
    5.52MB  1.55% 79.12%     5.52MB  1.55%  bytes.growSlice
       5MB  1.41% 80.53%        5MB  1.41%  go.opentelemetry.io/otel/attribute.computeDistinctFixed

Using the pprof web command:

[screenshot: pprof web call graph]
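For reference, the heap profile above is served by Go's standard net/http/pprof handlers. A minimal sketch of how such an endpoint is typically exposed is shown below; the :8085 port matches the pod port targeted by the kubectl port-forward above, while the rest (a dedicated goroutine serving http.DefaultServeMux) is illustrative and not taken from the affected service.

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Serve the debug endpoints on the port that kubectl port-forward targets (assumed :8085).
	go func() {
		log.Println(http.ListenAndServe(":8085", nil))
	}()

	// ... rest of the application ...
	select {}
}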

Environment

  • OS: Linux
  • Architecture: x86_64
  • Go Version: 1.24
  • opentelemetry prometheus exporter version: v0.57.0

Steps To Reproduce

  1. Use Go 1.24 and go.opentelemetry.io/otel/exporters/prometheus v0.57.0 (see the reproduction sketch below).
  2. Leave the container running with traffic for several hours while Prometheus scrapes its metrics.
  3. Memory usage shows a clear increase over time until it reaches the limit and the container gets OOM-killed.

[screenshot: container memory usage growing over time until the OOM limit]
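For context, here is a minimal sketch of the kind of setup described in the steps above, assuming the default SDK configuration. The meter name, counter name, attribute, port, and traffic loop are illustrative and not taken from the affected service.

package main

import (
	"context"
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus/promhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/prometheus"
	api "go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	// The Prometheus exporter acts as a pull-based reader for the SDK meter provider.
	exporter, err := prometheus.New()
	if err != nil {
		log.Fatal(err)
	}
	provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter))
	otel.SetMeterProvider(provider)

	meter := otel.Meter("repro")
	counter, err := meter.Int64Counter("requests_total")
	if err != nil {
		log.Fatal(err)
	}

	// Simulate steady traffic so data points (and their exemplar storage) are exercised
	// on every scrape cycle.
	go func() {
		ctx := context.Background()
		for {
			counter.Add(ctx, 1, api.WithAttributes(attribute.String("path", "/demo")))
			time.Sleep(10 * time.Millisecond)
		}
	}()

	// Expose /metrics for Prometheus to scrape.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8085", nil))
}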

Expected behavior

No memory leak

@ns-jvillarfernandez ns-jvillarfernandez added the bug Something isn't working label May 16, 2025
@ns-jvillarfernandez
Author

Could be related to #6732.

@dmathieu
Member

#6732 hasn't been released yet; it's not included in v0.57.0.

@ns-obaro

> #6732 hasn't been released yet; it's not included in v0.57.0.

When is the next release scheduled? Thanks!

@dmathieu
Member

See #6793

@pree-dew
Contributor

@dmathieu Can I pick up this task?
