Component(s)
connector/servicegraph
What happened?
Description
Context: I have an OpenTelemetry pipeline composed of multiple collector services, e.g. `loadbalancer`, `sampler`, and `exporter` collector services. These services have been configured to produce internal traces detailing and timing sub-operations, and to export those traces to a dedicated `internal` collector. The `internal` collector receives these traces and exports them to Tempo. I have recently also added two connectors, the `servicegraph` and `spanmetrics` connectors respectively. Since adding the `servicegraph` connector and deriving metrics from the received traces, I am experiencing errors (`err-mimir-sample-duplicate-timestamp`) when exporting to Mimir.
Checking my Mimir pod logs, I can see lines such as:
ts=2025-04-16T14:23:39.769551145Z caller=push.go:221 level=error user=anonymous msg="detected an error while ingesting Prometheus remote-write request (the request may have been partially ingested)" httpCode=400 err="send data to ingesters: failed pushing to ingester mimir-ingester-zone-c-0: user=anonymous: the sample has been rejected because another sample with the same timestamp, but a different value, has already been ingested (err-mimir-sample-duplicate-timestamp). The affected sample has timestamp 2025-04-16T14:23:39.011Z and is from series traces_service_graph_request_total{client=\"loadbalancer\", failed=\"false\", server=\"exporter\"} (sampled 1/10)" insight=true
and in the `internal` collector exporting to Mimir (debug exporter configured) I see lines such as:
2025-04-16T14:31:09.001Z info Metrics {"otelcol.component.id": "debug", "otelcol.component.kind": "Exporter", "otelcol.signal": "metrics", "resource metrics": 1, "metrics": 3, "data points": 12}
2025-04-16T14:31:09.486Z error internal/queue_sender.go:46 Exporting failed. Dropping data. {"otelcol.component.id": "prometheusremotewrite", "otelcol.component.kind": "Exporter", "otelcol.signal": "metrics", "error": "Permanent error: Permanent error: Permanent error: remote write returned HTTP status 400 Bad Request; err = %!w(<nil>): send data to ingesters: failed pushing to ingester mimir-ingester-zone-b-0: user=anonymous: the sample has been rejected because another sample with the same timestamp, but a different value, has already been ingested (err-mimir-sample-duplicate-timestamp)", "dropped_items": 12}
I notice other issues have been raised, such as #34169, reporting similar behaviour in relation to the `servicegraph` connector.
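For anyone trying to see exactly which data points collide, a minimal sketch (using only components already present in my config below; the `debug/servicegraph` exporter name is just illustrative) is to raise the debug exporter to detailed verbosity on the service graph pipeline, so each exported data point and its timestamp appears in the collector logs:

exporters:
  debug/servicegraph:
    # detailed verbosity prints every metric data point with its timestamp,
    # which makes the colliding samples visible in the collector logs
    verbosity: detailed

service:
  pipelines:
    metrics/servicegraph:
      receivers: [servicegraph]
      processors: [batch]
      exporters: [debug/servicegraph, prometheusremotewrite]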
Steps to Reproduce
My WIP is closed source. I'll endeavor to add and reference something to help folks set up and recreate this in due course.
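In the meantime, a minimal sketch of a configuration that should exercise the same path (OTLP traces in, `servicegraph` connector deriving metrics, Prometheus remote write out; the remote-write endpoint is a placeholder, and the store settings mirror the full config below):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
connectors:
  servicegraph:
    store:
      max_items: 10
      ttl: 2s
exporters:
  prometheusremotewrite:
    endpoint: http://mimir-gateway:8080/api/v1/push  # placeholder; point at any Mimir/Prometheus remote-write endpoint
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [servicegraph]
    metrics/servicegraph:
      receivers: [servicegraph]
      exporters: [prometheusremotewrite]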
Expected Result
No errors; service graph metrics exported to Mimir should be accepted.
Actual Result
Many HTTP 400 responses when exporting to Mimir: `err-mimir-sample-duplicate-timestamp`.
Collector version
0.121.0
Environment information
Environment
OS: Linux (Kubernetes pods, deployed using the contrib Helm chart)
OpenTelemetry Collector configuration
config:
  receivers:
    zipkin: null
    jaeger: null
    otlp:
      protocols:
        http: null
        grpc:
          endpoint: 0.0.0.0:4317
  connectors:
    spanmetrics:
      namespace: traces.spanmetrics # adheres to https://grafana.com/docs/tempo/latest/metrics-generator/span_metrics/#metrics
      histogram:
        explicit:
      dimensions:
        - name: http.method
        - name: http.target
        - name: http.status_code
      exclude_dimensions: ['status.code']
      exemplars:
        enabled: true
      resource_metrics_key_attributes:
        - service.name
    servicegraph:
      latency_histogram_buckets:
        - 100ms
        - 250ms
        - 500ms
        - 1s
        - 5s
        - 10s
      dimensions:
        - http.method
        - http.target
      virtual_node_extra_label: true
      store:
        max_items: 10
        ttl: 2s
  processors:
    batch:
      send_batch_max_size: 50
      send_batch_size: 10
      timeout: 2s
    tail_sampling:
      policies:
        - name: include-traces-generated-by-external-telemetry-logs-traces
          type: string_attribute
          string_attribute:
            key: "data_type"
            values: ["logs", "traces"]
        - name: exclude-traces-generated-by-internal-scraped-prometheus-metrics
          type: and
          and:
            and_sub_policy:
              - name: data-type-metrics
                type: string_attribute
                string_attribute:
                  key: "data_type"
                  values: ["metrics"]
              - name: items-failed-greater-than-zero
                type: numeric_attribute
                numeric_attribute:
                  key: "items.failed"
                  min_value: 1
                  max_value: 2147483647 # (max int32)
  exporters:
    debug:
      verbosity: basic
    prometheusremotewrite:
      endpoint: ${MIMIR_PROM_ENDPOINT}
      tls:
        insecure: true
    otlp/tempo:
      endpoint: ${TEMPO_OTLP_ENDPOINT}
      tls:
        insecure: true
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [tail_sampling, batch]
        exporters: [otlp/tempo, servicegraph, spanmetrics] # tempo (with servicegraph and spanmetrics)
      metrics/spanmetrics:
        receivers: [spanmetrics]
        processors: [batch]
        exporters: [debug, prometheusremotewrite]
      metrics/servicegraph:
        receivers: [servicegraph]
        processors: [batch]
        exporters: [debug, prometheusremotewrite]
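A possible mitigation (an assumption on my part, and only relevant if `metrics_flush_interval` is supported by the `servicegraph` connector in this version) would be to flush metrics on a fixed interval rather than on every received trace batch, e.g.:

connectors:
  servicegraph:
    # assumption: if supported in v0.121.0, flushing on an interval rather than
    # per received batch may avoid emitting two samples for the same series
    # with the same timestamp
    metrics_flush_interval: 60s
    latency_histogram_buckets: [100ms, 250ms, 500ms, 1s, 5s, 10s]
    dimensions: [http.method, http.target]
    virtual_node_extra_label: true
    store:
      max_items: 10
      ttl: 2s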
Log output
2025-04-16T14:31:09.001Z info Metrics {"otelcol.component.id": "debug", "otelcol.component.kind": "Exporter", "otelcol.signal": "metrics", "resource metrics": 1, "metrics": 3, "data points": 12}
2025-04-16T14:31:09.486Z error internal/queue_sender.go:46 Exporting failed. Dropping data. {"otelcol.component.id": "prometheusremotewrite", "otelcol.component.kind": "Exporter", "otelcol.signal": "metrics", "error": "Permanent error: Permanent error: Permanent error: remote write returned HTTP status 400 Bad Request; err = %!w(<nil>): send data to ingesters: failed pushing to ingester mimir-ingester-zone-b-0: user=anonymous: the sample has been rejected because another sample with the same timestamp, but a different value, has already been ingested (err-mimir-sample-duplicate-timestamp)", "dropped_items": 12}
Additional context
No response