container usage does not decrease after buffer is emptied #7234

dpajin · 2020-03-26T10:38:52Z

Relevant telegraf.conf:

[agent]
  interval = "1m"
  round_interval = true
  flush_buffer_when_full = true
  metric_buffer_limit = 3000000
  metric_batch_size = 100000
  flush_interval = "10s"
  flush_jitter = "0s"
  debug = true
  omit_hostname = true

  
# Output database for telegraf internal statistics
[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = 
  username = 
  password = 
  precision = "s"
  retention_policy = "default"
  timeout = "5s"
  skip_database_creation = true
  #metric_buffer_limit = 10000
  #metric_batch_size = 1000
  #flush_interval = "30s"
  namepass = ["internal_*"]

# Telegraf internal stats collection
[[inputs.internal]]
  ## If true, collect telegraf memory stats.
  interval = "1m"
  collect_memstats = false


# Output databases for each node
[[outputs.influxdb]]
  urls = ["http://db_1:8086"]
  database = 
  username = 
  password = 
  precision = "s"
  retention_policy = "default"
  timeout = "5s"
  skip_database_creation = true
  namedrop = ["internal_*"]

[[outputs.influxdb]]
  urls = ["http://db_2:8086"]
  database = 
  username = 
  password = 
  precision = "s"
  retention_policy = "default"
  timeout = "5s"
  skip_database_creation = true
  namedrop = ["internal_*"]

[[outputs.influxdb]]
  urls = ["http://db_3:8086"]
  database = 
  username = 
  password = 
  precision = "s"
  retention_policy = "default"
  timeout = "5s"
  skip_database_creation = true
  namedrop = ["internal_*"]

  
[[inputs.influxdb_listener]]
  ## Address and port to host HTTP listener on
  service_address = ":9096"

  ## maximum duration before timing out read of the request
  read_timeout = "5s"
  ## maximum duration before timing out write of the response
  write_timeout = "5s"

  ## Maximum allowed HTTP request body size in bytes.
  ## 0 means to use the default of 32MiB.
  max_body_size = 0

System info:

Telegraf version 1.13.4, running in a container (image from Docker Hub)

Linux RMIMH03S 5.3.0-40-generic #32~18.04.1-Ubuntu SMP

Docker

Docker 19.03.7

Steps to reproduce:

I use Telegraf with the influxdb_listener input plugin and several influxdb outputs to write data into multiple InfluxDB databases. metric_buffer_limit is set to 3M metrics.

1. One database goes down, so Telegraf stores its metrics in the buffer.
2. The buffer fills up.
3. The database becomes available again and the buffered metrics are written to it.
4. The memory used for buffering is not released after the buffer is emptied.

Expected behavior:

I would expect the memory consumption of the Telegraf container to decrease after the buffer is emptied.

Actual behavior:

Memory consumption stays high even after the buffer is emptied.

Additional info:

The image below shows the memory usage of the Telegraf Docker container. Buffering started around 19:40, and the buffer was full around 23:20. Around 00:00 the database became available and the metrics were written to it. At that time, container memory usage increased slightly, from 2.29 GB to 2.35 GB.

Memory usage was still the same 12 hours later.
When the database became unavailable again, the new round of buffering did not increase memory further (at 40% buffer usage, memory consumption had not increased).

Is this expected behavior?
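
For scale, the numbers above allow a rough per-metric estimate. The sketch below is only back-of-the-envelope arithmetic: it assumes the graph reports decimal gigabytes and attributes the whole 2.29 GB to the 3,000,000 buffered metrics, so the result is an upper bound rather than a measurement. The function name `bytesPerMetric` is hypothetical.

```go
package main

import "fmt"

// bytesPerMetric returns the average memory attributed to one buffered
// metric, given the total memory in bytes and the number of metrics.
func bytesPerMetric(totalBytes float64, metrics int) float64 {
	return totalBytes / float64(metrics)
}

func main() {
	const gb = 1e9 // assumption: the graph uses decimal gigabytes

	// 2.29 GB at the point where the 3M-metric buffer was full.
	// Assumption: baseline memory of the process is ignored, so this
	// overestimates the real per-metric cost.
	fmt.Printf("%.0f bytes per metric (upper bound)\n",
		bytesPerMetric(2.29*gb, 3000000))
}
```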


danielnelson · 2020-03-26T17:36:01Z

Definitely not expecting this behavior. We have had issues in the past where not all metric references were cleared, so the metrics couldn't be garbage collected.

There are a couple of follow-up test variations that we should perform:

- Same test as above, but without Docker.
- Compare results when using a single output vs. multiple outputs.

If by chance you could run either of these tests, it would be very helpful. Can you also share how you are calculating memory usage, including any input plugins used and the underlying queries?
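
Since Telegraf is written in Go, one way to tell lingering references apart from memory that was freed but not yet returned to the operating system is to compare the Go runtime's heap counters before and after the buffer drains. The following is a minimal standalone sketch, not Telegraf code: `heapStats`, `readHeap`, `fillBuffer`, and `measure` are hypothetical names, and a slice of byte slices stands in for the metric buffer.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// heapStats holds the runtime heap counters relevant to this question.
type heapStats struct {
	InUse    uint64 // bytes in spans that hold live or not-yet-swept objects
	Idle     uint64 // bytes in spans with no objects
	Released uint64 // idle bytes already returned to the OS
}

// Retained estimates idle memory the runtime still holds on to.
func (h heapStats) Retained() uint64 {
	if h.Released > h.Idle {
		return 0
	}
	return h.Idle - h.Released
}

// readHeap takes a snapshot of the current heap counters.
func readHeap() heapStats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return heapStats{InUse: m.HeapInuse, Idle: m.HeapIdle, Released: m.HeapReleased}
}

// fillBuffer allocates n payloads of size bytes each (stand-in for metrics).
func fillBuffer(n, size int) [][]byte {
	buf := make([][]byte, n)
	for i := range buf {
		buf[i] = make([]byte, size)
	}
	return buf
}

// measure snapshots the heap before filling, while full, and after the
// buffer has been dropped and the runtime was asked to release memory.
func measure(n, size int) (before, full, after heapStats) {
	before = readHeap()
	buf := fillBuffer(n, size)
	full = readHeap()
	runtime.KeepAlive(buf) // keep buf reachable until the "full" snapshot
	buf = nil              // drop the only reference to the buffer
	debug.FreeOSMemory()   // force a GC and return as much memory as possible
	after = readHeap()
	return before, full, after
}

func main() {
	before, full, after := measure(100000, 512)
	for _, s := range []struct {
		name string
		h    heapStats
	}{{"before", before}, {"full", full}, {"after", after}} {
		fmt.Printf("%-6s inuse=%d KiB idle=%d KiB released=%d KiB retained=%d KiB\n",
			s.name, s.h.InUse>>10, s.h.Idle>>10, s.h.Released>>10, s.h.Retained()>>10)
	}
}
```

If the in-use figure stays high in the last snapshot, something still references the data, which would match the earlier reference-leak issues. If it drops while the retained figure stays high, the runtime is holding freed memory it has not yet handed back. For a running Telegraf, setting collect_memstats = true on the internal input should expose comparable runtime counters, although this thread does not confirm which of the two cases applies here.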

dpajin · 2020-03-27T11:21:58Z

Memory usage is collected by another instance of Telegraf running directly on the host, using the docker input plugin. The data is again exported to InfluxDB, and the graph is drawn in Grafana with the following query:

SELECT mean("usage") FROM "docker_container_mem" WHERE "node" =~ /^$node$/ AND $timeFilter GROUP BY time($__interval), "container_name" fill(null)

Okay, I will try to run those tests as suggested and come back with the results.

ssoroka · 2021-03-08T18:59:28Z

@dpajin is this still an issue?

sjwang90 · 2021-06-28T21:13:17Z

@dpajin Closing this issue. Feel free to re-open if it still persists.

danielnelson added the bug (unexpected problem or unintended behavior) and ready labels Mar 26, 2020

danielnelson added this to the planned milestone Mar 27, 2020

sjwang90 removed the ready label Jan 29, 2021

sjwang90 removed this from the Planned milestone Jan 29, 2021

helenosheaa added the area/influxdb label Jan 29, 2021

sjwang90 closed this as completed Jun 28, 2021
