
Instrumentation scope attributes cause errors in Prometheus exporter #12939


Open · jade-guiton-dd opened this issue Apr 29, 2025 · 0 comments
Labels: area:service, bug, collector-telemetry

Describe the bug

If the Collector emits two internal metric streams that differ only by their instrumentation scope attributes, the Prometheus exporter from the Go SDK (the default way of exposing the Collector's internal metrics) stops working.

Steps to reproduce

Run any version of the core Collector distribution from 0.123.0 to 0.125.0 with the following config file:

receivers:
  nop:
processors:
  batch:
exporters:
  nop:
service:
  pipelines:
    logs:
      receivers: [nop]
      processors: [batch]
      exporters: [nop]
    traces:
      receivers: [nop]
      processors: [batch]
      exporters: [nop]

Additionally, pass --feature-gates +telemetry.newPipelineTelemetry on the command line to enable the new pipeline telemetry feature gate.
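For example, assuming the core distribution binary is named otelcol and the config above is saved as config.yaml:

otelcol --config config.yaml --feature-gates=+telemetry.newPipelineTelemetry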

Then, access http://localhost:8888/metrics in a browser. You should see an error similar to the following:

An error has occurred while serving metrics:

collected metric "otelcol_processor_batch_metadata_cardinality" { label:{name:"processor"  value:"batch"}  [...]  gauge:{value:1}} was collected before with the same name and label values

Disabling the feature gate resolves the error, and shows the usual listing of internal Collector metrics.

Explanation

The batch processor generates a metric named otelcol_processor_batch_metadata_cardinality, with a processor metric attribute containing the component ID. This component ID is the same for the two instances of batch in the above config, which normally causes the metric points generated by both component instances to be aggregated into a single metric stream.

The telemetry.newPipelineTelemetry feature gate injects instrumentation scope attributes in internal metrics based on which component instance emitted the metric. Because the metric points generated by the two batch instances now have differing identifying properties, they are no longer aggregated, and create two different metrics streams. This would normally be a good thing, providing more precise information about the behavior of the two pipelines.

The Prometheus exporter converts OpenTelemetry metric streams into Prometheus time series, then exposes them through a Prometheus HTTP server, served on port 8888 by default. However, it currently does not support instrumentation scope attributes and ignores them during the conversion. As a result, the two metric streams are converted into two time series with identical names and labels, which causes the Prometheus server to error out.
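
This conversion behavior can be demonstrated outside the Collector with a small Go program. The sketch below is a hypothetical minimal reproduction, not the Collector's actual instrumentation: the meter name otelcol/batchprocessor and the otelcol.pipeline.id scope attribute are illustrative stand-ins, and it assumes a recent go.opentelemetry.io/otel release (providing metric.WithInstrumentationAttributes and synchronous Int64Gauge instruments) whose Prometheus exporter still drops scope attributes, as described in this issue.

package main

import (
    "context"
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/exporters/prometheus"
    "go.opentelemetry.io/otel/metric"
    sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
    // The Prometheus exporter registers with the default Prometheus registry
    // and acts as a Reader for the SDK MeterProvider.
    exporter, err := prometheus.New()
    if err != nil {
        log.Fatal(err)
    }
    provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter))

    // Two meters that differ only by their instrumentation scope attributes.
    // The attribute below is an illustrative stand-in for the per-pipeline
    // attributes injected by telemetry.newPipelineTelemetry.
    logsMeter := provider.Meter("otelcol/batchprocessor",
        metric.WithInstrumentationAttributes(attribute.String("otelcol.pipeline.id", "logs")))
    tracesMeter := provider.Meter("otelcol/batchprocessor",
        metric.WithInstrumentationAttributes(attribute.String("otelcol.pipeline.id", "traces")))

    // Same metric name and same point-level attributes on both meters, so the
    // two resulting streams are distinguished only by their scope attributes.
    ctx := context.Background()
    attrs := metric.WithAttributes(attribute.String("processor", "batch"))
    g1, _ := logsMeter.Int64Gauge("otelcol_processor_batch_metadata_cardinality")
    g2, _ := tracesMeter.Int64Gauge("otelcol_processor_batch_metadata_cardinality")
    g1.Record(ctx, 1, attrs)
    g2.Record(ctx, 1, attrs)

    // Since the exporter ignores scope attributes during conversion, both
    // streams map to Prometheus series with identical names and labels, and
    // scraping /metrics should fail with "was collected before with the same
    // name and label values".
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":8888", nil))
}

Dropping the WithInstrumentationAttributes options (or giving both meters the same attributes) should make the scrape succeed again, mirroring the effect of disabling the feature gate in the Collector.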

Relevant issues / PRs

In the Collector:

In the OpenTelemetry specification:

  • The Prometheus compatibility specification currently requires OpenTelemetry to Prometheus converters to add the instrumentation scope name and version, but not the attributes, as labels on metric points. Attributes are instead exposed as a separate otel_scope_info metric.
  • spec#4223 (open) proposes updating the Prometheus compatibility specification to instead add instrumentation scope attributes as labels on metric points, so that they are treated as identifying properties (as in OpenTelemetry) and the aliasing issue at play here is avoided.

In the Go SDK:

  • Currently, scope attributes are not exposed in any way.
  • Issue go#5846 (open) tracks the implementation of support for instrumentation scope attributes.
  • PR go#5947 (draft, unmerged) is a proof of concept for the previous issue; it adds instrumentation scope attributes directly as labels on metric points, as suggested in spec#4223 above. Based on this PR, it seems that metric streams differing only by their scope schema would encounter a similar issue.