Thanos Receive - High memory usage #2810

Closed
mxmorin opened this issue Jun 26, 2020 · 5 comments

Comments

mxmorin commented Jun 26, 2020

Thanos, Prometheus and Golang version used:
thanos, version 0.13.0 (branch: HEAD, revision: adf6fac)
build user: root@ee9c796b3048
build date: 20200622-09:49:32
go version: go1.14.2

Object Storage Provider:
S3

What happened:
We have two load-balanced Thanos Receive instances with 16 GB of memory each.

Memory usage increases continuously until it reaches 100% of RAM. Stopping Receive frees the memory, but the problem then recurs.

[screenshot: memory usage graph showing continuous growth]

We recently upgraded from 0.12.2 to 0.13.0.
We did not have this problem before, although some metrics have also been added since then.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Full logs to relevant components:
When memory is full, the logs print a lot of internal server errors with no further explanation:

level=error ts=2020-06-26T07:01:54.991169587Z caller=handler.go:282 component=receive component=receive-handler err="context canceled" msg="internal server error"

Anything else we need to know:
Thanos Receive command line:
/usr/bin/thanos receive --http-address 0.0.0.0:19904 --grpc-address 0.0.0.0:19903 --remote-write.address 0.0.0.0:19291 --label=thanos_replica="q1thanos01" --tsdb.path=/projet/data/thanos/receive --tsdb.retention=1d --objstore.config-file=/etc/thanos/objstore.yaml

Object store definition:
type: S3
config:
  bucket: "thanos-qual"
  endpoint: "storagegrid.xxxxxx:8082"
  region: "yyyyy"
  access_key: "zzzz"
  secret_key: "******"
  insecure: false
  signature_version2: true
  encrypt_sse: false
  put_user_metadata: {}
  http_config:
    idle_conn_timeout: 90s
    response_header_timeout: 2m
    insecure_skip_verify: true
  trace:
    enable: false
  part_size: 134217728

brancz (Member) commented Jun 26, 2020

We have also seen some problems with blocks being cut; @krasi-georgiev is investigating this. For us, restarting the process fixes the problem for now, but you may need to do this every couple of days until the problem is resolved.

mxmorin (Author) commented Jun 26, 2020

Is this a 0.13.0 issue?
Can I downgrade to an earlier version?

diemus commented Jul 10, 2020

I have this problem too. @mxmorin, did you downgrade to 0.12.2?

mxmorin (Author) commented Jul 10, 2020

No, I stayed on 0.13.0 because restarting with 0.12.2 caused OOMs.
As a workaround, I kill the process every day via crontab.
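
For illustration, a minimal sketch of such a cron entry; the file name, the thanos-receive.service unit name, and the 03:00 schedule are assumptions, not taken from this issue:

# /etc/cron.d/thanos-receive-restart (hypothetical file name)
# Restart Thanos Receive once a day to release the leaked memory.
# Assumes Receive runs as a systemd unit called thanos-receive.service.
0 3 * * * root /usr/bin/systemctl restart thanos-receive.service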

bwplotka (Member) commented

Hi, we fixed major leaks in storeAPI.Series that also impacted the write and compact paths and resulted in high memory use. Fix: #2866

See also a related issue: #2823

We also improved our test suite so it will be easier to catch such leaks in our PRs 🤗

I would recommend giving master-2020-07-09-60ede4c1 a try.
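
For anyone wanting to test that build, a minimal sketch using Docker and the flags from the command line above; the thanosio/thanos image repository is an assumption, so check the Thanos documentation for the exact registry and tag:

# Pull the suggested master build (image repository is an assumption).
docker pull thanosio/thanos:master-2020-07-09-60ede4c1

# Run Receive with the same flags as the deployment above, only swapping
# the local binary for the container image.
docker run -d --name thanos-receive \
  -p 19291:19291 -p 19903:19903 -p 19904:19904 \
  -v /projet/data/thanos/receive:/projet/data/thanos/receive \
  -v /etc/thanos/objstore.yaml:/etc/thanos/objstore.yaml:ro \
  thanosio/thanos:master-2020-07-09-60ede4c1 receive \
  --http-address 0.0.0.0:19904 \
  --grpc-address 0.0.0.0:19903 \
  --remote-write.address 0.0.0.0:19291 \
  --label=thanos_replica="q1thanos01" \
  --tsdb.path=/projet/data/thanos/receive \
  --tsdb.retention=1d \
  --objstore.config-file=/etc/thanos/objstore.yaml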

We just rolled this out to production, and the goroutine leaks are starting to look better so far:
[screenshot: goroutine metrics after deploying the fix]

I am marking this as done 🤗 If it's still not fixed in prod, let's reopen. Thanks for reporting!
