
[Connector/Servicegraph] (err-mimir-sample-duplicate-timestamp) #39460

Closed
@hrmstockdale

Description

Component(s)

connector/servicegraph

What happened?

Description

Context: I have an OpenTelemetry pipeline composed of multiple collector services, e.g. loadbalancer, sampler, and exporter collector services. These services are configured to produce internal traces that detail and time their suboperations, and to export those traces to a dedicated internal collector. The internal collector receives these traces and exports them to Tempo. I recently added two connectors to it, servicegraph and spanmetrics. Since adding the servicegraph connector and deriving metrics from the received traces, I am seeing errors (err-mimir-sample-duplicate-timestamp) when exporting to Mimir.

Checking my Mimir pod logs, I can see lines such as:

ts=2025-04-16T14:23:39.769551145Z caller=push.go:221 level=error user=anonymous msg="detected an error while ingesting Prometheus remote-write request (the request may have been partially ingested)" httpCode=400 err="send data to ingesters: failed pushing to ingester mimir-ingester-zone-c-0: user=anonymous: the sample has been rejected because another sample with the same timestamp, but a different value, has already been ingested (err-mimir-sample-duplicate-timestamp). The affected sample has timestamp 2025-04-16T14:23:39.011Z and is from series traces_service_graph_request_total{client=\"loadbalancer\", failed=\"false\", server=\"exporter\"} (sampled 1/10)" insight=true

and in the internal collector exporting to Mimir (debug exporter configured) I see lines such as:

2025-04-16T14:31:09.001Z    info    Metrics    {"otelcol.component.id": "debug", "otelcol.component.kind": "Exporter", "otelcol.signal": "metrics", "resource metrics": 1, "metrics": 3, "data points": 12}
2025-04-16T14:31:09.486Z    error    internal/queue_sender.go:46    Exporting failed. Dropping data.    {"otelcol.component.id": "prometheusremotewrite", "otelcol.component.kind": "Exporter", "otelcol.signal": "metrics", "error": "Permanent error: Permanent error: Permanent error: remote write returned HTTP status 400 Bad Request; err = %!w(<nil>): send data to ingesters: failed pushing to ingester mimir-ingester-zone-b-0: user=anonymous: the sample has been rejected because another sample with the same timestamp, but a different value, has already been ingested (err-mimir-sample-duplicate-timestamp)", "dropped_items": 12}

I notice other issues have been raised, such as #34169, reporting similar behaviour with the servicegraph connector.
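
For clarity on what the error means: Mimir rejects a sample when it has already ingested a sample for the same series (here traces_service_graph_request_total{client="loadbalancer", failed="false", server="exporter"}) at the same timestamp but with a different value. My assumption is that this can happen when more than one collector replica (or pipeline) emits the same servicegraph series. A minimal, untested sketch of one way to rule that out is to add a per-replica label via the prometheusremotewrite exporter's external_labels so otherwise-identical series no longer collide; POD_NAME is a hypothetical env var injected via the Kubernetes downward API:

  exporters:
    prometheusremotewrite:
      endpoint: ${MIMIR_PROM_ENDPOINT}
      tls:
        insecure: true
      # Sketch only: a label that differs per collector replica, so identical
      # servicegraph series emitted by different replicas do not collide on
      # the same timestamp. POD_NAME is assumed to come from the downward API.
      external_labels:
        collector_replica: ${POD_NAME}

This is only a diagnostic/workaround idea (it adds one extra label per replica to the series); it does not explain why conflicting values are emitted in the first place.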

Steps to Reproduce

My WIP is closed source. I'll endeavor to add and reference something to help folks set up and recreate this in due course.

Expected Result

No errors when exporting the derived metrics to Mimir.

Actual Result

Many HTTP 400 responses when exporting to Mimir: err-mimir-sample-duplicate-timestamp.

Collector version

0.121.0

Environment information

Environment

OS: Linux (Kubernetes pods, using the contrib Helm chart)

OpenTelemetry Collector configuration

config:
  receivers:
    zipkin: null
    jaeger: null
    otlp:
      protocols:
        http: null
        grpc:
          endpoint: 0.0.0.0:4317

  connectors:
    spanmetrics:
      namespace: traces.spanmetrics # adheres to https://grafana.com/docs/tempo/latest/metrics-generator/span_metrics/#metrics
      histogram:
        explicit:
      dimensions:
        - name: http.method
        - name: http.target
        - name: http.status_code
      exclude_dimensions: ['status.code']
      exemplars:
        enabled: true
      resource_metrics_key_attributes:
        - service.name

    servicegraph:
      latency_histogram_buckets:
      - 100ms
      - 250ms
      - 500ms
      - 1s
      - 5s
      - 10s
      dimensions:
        - http.method
        - http.target
      virtual_node_extra_label: true
      store:
        max_items: 10
        ttl: 2s

  processors:
    batch:
      send_batch_max_size: 50
      send_batch_size: 10
      timeout: 2s
    tail_sampling:
      policies:
        - name: include-traces-generated-by-external-telemetry-logs-traces
          type: string_attribute
          string_attribute:
            key: "data_type"
            values: ["logs", "traces"]
        - name: exclude-traces-generated-by-internal-scraped-prometheus-metrics
          type: and
          and:
            and_sub_policy:
              - name: data-type-metrics
                type: string_attribute
                string_attribute:
                  key: "data_type"
                  values: ["metrics"]
              - name: items-failed-greater-than-zero
                type: numeric_attribute
                numeric_attribute:
                  key: "items.failed"
                  min_value: 1
                  max_value: 2147483647 # (max int32)

  exporters:
    debug:
      verbosity: basic
    prometheusremotewrite:
      endpoint: ${MIMIR_PROM_ENDPOINT}
      tls:
        insecure: true
    otlp/tempo:
      endpoint: ${TEMPO_OTLP_ENDPOINT}
      tls:
        insecure: true

  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [tail_sampling, batch]
        exporters: [otlp/tempo, servicegraph, spanmetrics] # tempo (with servicegraph and spanmetrics)
      metrics/spanmetrics:
        receivers: [spanmetrics]
        processors: [batch]
        exporters: [debug, prometheusremotewrite]
      metrics/servicegraph:
        receivers: [servicegraph]
        processors: [batch]
        exporters: [debug, prometheusremotewrite]
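
One note on how I've been inspecting this: bumping the debug exporter in the servicegraph metrics pipeline to detailed verbosity prints each data point with its labels, timestamp, and value, which should make it possible to confirm whether the collector itself emits two points for the same series at the same timestamp before they reach Mimir. A minimal sketch, assuming no changes beyond the config above:

  exporters:
    debug:
      # Sketch for troubleshooting only: 'detailed' logs every data point
      # (labels, timestamp, value) instead of the summary line shown in the
      # log output below, at the cost of very verbose output.
      verbosity: detailed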

Log output

2025-04-16T14:31:09.001Z    info    Metrics    {"otelcol.component.id": "debug", "otelcol.component.kind": "Exporter", "otelcol.signal": "metrics", "resource metrics": 1, "metrics": 3, "data points": 12}
2025-04-16T14:31:09.486Z    error    internal/queue_sender.go:46    Exporting failed. Dropping data.    {"otelcol.component.id": "prometheusremotewrite", "otelcol.component.kind": "Exporter", "otelcol.signal": "metrics", "error": "Permanent error: Permanent error: Permanent error: remote write returned HTTP status 400 Bad Request; err = %!w(<nil>): send data to ingesters: failed pushing to ingester mimir-ingester-zone-b-0: user=anonymous: the sample has been rejected because another sample with the same timestamp, but a different value, has already been ingested (err-mimir-sample-duplicate-timestamp)", "dropped_items": 12}

Additional context

No response
