Clickhouse exporter dropping metrics under pressure #38739

Closed
@ksharinarayanan

Description

Component(s)

exporter/clickhouse

What happened?

Description

Hi,

We are currently pushing only metrics to the clickhouse exporter, which inserts into ClickHouse Cloud every 10 seconds or every 100k records. The exporter at the source exports every 10 seconds.

This works fine while we generate up to 7K metric data points per second from the source, but once the rate reaches around 8.7K data points per second, almost 30-40% of the metrics are dropped. The source is managed AWS Flink, which unfortunately does not give us logs from the OpenTelemetry SDK, so I have no visibility into why they are being dropped. The reason I conclude that the collector is dropping them is that I expect 6 inserts per minute, but under this load one insert per minute is consistently skipped, which I can see through the ClickHouse Cloud query insights. Looking at the attached image below, you can see that an insert should have occurred at 12:23:51 but is missing.

[Image: ClickHouse Cloud query insights showing the missing insert at 12:23:51]

I am just thinking that I might be doing something wrong with the collector config. I've tried a bunch of different configurations related to the sending_queue and batch processor but nothing seems to be working.
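For what it's worth, a quick back-of-the-envelope check of the numbers above (a rough sketch assuming a steady datapoint rate): at 7K data points per second, a 10-second interval accumulates 70K points, which fits within `send_batch_size: 80000`, but at 8.7K it accumulates 87K points, which exceeds it and changes when the batch processor flushes.

```python
# Rough sanity check: datapoints accumulated per 10 s export interval
# vs. the batch processor's send_batch_size (80000), using the rates
# reported above. Steady load is an assumption.

SEND_BATCH_SIZE = 80_000
EXPORT_INTERVAL_S = 10

for rate in (7_000, 8_700):  # datapoints per second
    per_interval = rate * EXPORT_INTERVAL_S
    verdict = "exceeds" if per_interval > SEND_BATCH_SIZE else "fits within"
    print(f"{rate}/s -> {per_interval} points per {EXPORT_INTERVAL_S}s "
          f"interval ({verdict} send_batch_size)")
```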

Collector version

v0.120.0

Environment information

Environment

OS: Amazon linux

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    send_batch_size: 80000
    timeout: 10s
    send_batch_max_size: 100000
  filter:
    error_mode: ignore
    metrics:
      datapoint:
        - 'attributes["http.request.method"] == "GET"'

exporters:
  clickhouse:
    # credentials 
    metrics_tables:
      sum: 
        name: "otel_metrics_sum_null"
    async_insert: true
    compress: zstd
    timeout: 60s
    database: otel
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 1000000

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [filter, batch]
      exporters: [clickhouse]
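A sketch of what I could add to this config to diagnose the drops (the address and level values below are illustrative, assuming the collector's self-telemetry support in this version): enabling internal metrics should expose counters such as enqueue failures and failed sends, which would confirm whether the sending queue or the exporter is dropping the data.

```yaml
# Sketch: expose the collector's own metrics so queue overflows and
# failed sends become visible. Endpoint and level are assumptions.
service:
  telemetry:
    metrics:
      level: detailed
      address: 0.0.0.0:8888
```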

Log output

The collector produced no logs while the metrics were being dropped.

Additional context

No response
