Describe the bug
If the Collector emits two internal metric streams which differ only by their instrumentation scope attributes, the Prometheus exporter from the Go SDK (the default way of exposing the Collector's internal metrics) stops working.
Steps to reproduce
Run the core Collector distribution with any version from 0.123.0 to 0.125.0, with the following config file:
receivers:
  nop:
processors:
  batch:
exporters:
  nop:
service:
  pipelines:
    logs:
      receivers: [nop]
      processors: [batch]
      exporters: [nop]
    traces:
      receivers: [nop]
      processors: [batch]
      exporters: [nop]
Additionally, enable the telemetry.newPipelineTelemetry feature gate with the --feature-gates command-line option.
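For example, assuming the core distribution binary is named otelcol and the config above is saved as config.yaml (both names are illustrative, adjust them to your setup):

```sh
otelcol --config config.yaml --feature-gates=+telemetry.newPipelineTelemetry
```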
Then, access http://localhost:8888/metrics in a browser. You should see an error similar to the following:
An error has occurred while serving metrics:
collected metric "otelcol_processor_batch_metadata_cardinality" { label:{name:"processor" value:"batch"} [...] gauge:{value:1}} was collected before with the same name and label values
Disabling the feature gate resolves the error, and shows the usual listing of internal Collector metrics.
Explanation
The batch processor generates a metric named otelcol_processor_batch_metadata_cardinality, with a processor metric attribute containing the component ID. This component ID is the same for the two instances of batch in the above config, which normally causes the metric points generated by both component instances to be aggregated into a single metric stream.
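Without the feature gate, the metrics endpoint therefore exposes a single series for this metric, roughly of the following shape (other labels added by the Collector are omitted here, and the value is illustrative):

```
otelcol_processor_batch_metadata_cardinality{processor="batch"} 1
```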
The telemetry.newPipelineTelemetry feature gate injects instrumentation scope attributes into internal metrics based on which component instance emitted the metric. Because the metric points generated by the two batch instances now have differing identifying properties, they are no longer aggregated, and create two different metric streams. This would normally be a good thing, providing more precise information about the behavior of the two pipelines.
The Prometheus exporter converts OpenTelemetry metric streams into Prometheus time series and exposes them through a Prometheus HTTP server, listening on port 8888 by default. However, it currently does not support instrumentation scope attributes, and ignores them during the conversion. This leads to the two metric streams being converted into two time series with identical names and labels, which causes the Prometheus server to error out.
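To make this concrete, here is a minimal, self-contained sketch (not the Collector's actual telemetry code) that reproduces the same class of error with the Go SDK's Prometheus exporter. It assumes an SDK version that supports synchronous gauges but still drops scope attributes during conversion; the otelcol.pipeline.id scope attribute key and the instrument name are illustrative stand-ins for what the Collector emits.

```go
package main

import (
	"context"
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/prometheus"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	// By default the exporter registers with prometheus.DefaultRegisterer,
	// so promhttp.Handler() below serves whatever it produces.
	exporter, err := prometheus.New()
	if err != nil {
		log.Fatal(err)
	}
	provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter))

	// Two scopes with the same name but different scope attributes,
	// standing in for the two batch processor instances.
	logsMeter := provider.Meter("batch",
		metric.WithInstrumentationAttributes(attribute.String("otelcol.pipeline.id", "logs")))
	tracesMeter := provider.Meter("batch",
		metric.WithInstrumentationAttributes(attribute.String("otelcol.pipeline.id", "traces")))

	ctx := context.Background()
	attrs := metric.WithAttributes(attribute.String("processor", "batch"))

	// Identical instrument names and data point attributes in both scopes.
	g1, _ := logsMeter.Int64Gauge("processor_batch_metadata_cardinality")
	g2, _ := tracesMeter.Int64Gauge("processor_batch_metadata_cardinality")
	g1.Record(ctx, 1, attrs)
	g2.Record(ctx, 1, attrs)

	// Because the conversion drops the scope attributes, the two streams become
	// time series with identical names and labels, and scraping this endpoint
	// yields a "was collected before with the same name and label values" error.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8888", nil))
}
```

Running this sketch and fetching http://localhost:8888/metrics should produce the same kind of error as the Collector does with the feature gate enabled.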
Relevant issues / PRs
In the Collector:
- PR Inject component-identifying scope attributes #12617 introduced the telemetry.newPipelineTelemetry feature gate and the code for injecting component-identifying instrumentation scope attributes.
- Issue Missing attributes in internal Collector logs #12870 was caused by the default "off" state of the gate not injecting any component attributes, even in internal logs, where said attributes had been present for a few versions as regular log attributes.
- PR Permanently enable 'telemetry.newPipelineTelemetry' feature gate #12856 tried to solve this issue by stabilizing the feature gate, turning on the attribute injection unconditionally.
- PR [chore] Revert dc8e2dd #12917 suggested reverting the previous PR after noticing this issue with the Prometheus exporter.
- PR Put component-identifying scope attributes for internal metrics back behind feature gate #12933 instead set the feature gate back to Alpha (off by default), but restricted the gate to only toggle attribute injection for metrics. This means the exporter works in the default configuration, but may still show this error when the gate is explicitly enabled.
In the OpenTelemetry specification:
- The Prometheus compatibility specification currently requires OpenTelemetry-to-Prometheus converters to add the instrumentation scope name and version, but not the attributes, as labels on metric points. Attributes are instead exposed as a separate otel_scope_info metric.
- spec#4223 (open) proposes updating the Prometheus compatibility specification to add instrumentation scope attributes as labels on metric points instead, so that they are treated as identifying properties like in OpenTelemetry, avoiding the aliasing issue at play here.
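As a rough illustration (the label naming is a sketch based on the proposal, not final spec text, and otelcol.pipeline.id is again an illustrative scope attribute): under the current specification the two batch streams map to the same label set and alias each other, whereas exposing scope attributes as labels would keep them distinct.

```
# Current spec: scope attributes are not labels, so the two streams collide
otelcol_processor_batch_metadata_cardinality{processor="batch"} 1
otelcol_processor_batch_metadata_cardinality{processor="batch"} 1

# With spec#4223: scope attributes become identifying labels
otelcol_processor_batch_metadata_cardinality{processor="batch",otel_scope_otelcol_pipeline_id="logs"} 1
otelcol_processor_batch_metadata_cardinality{processor="batch",otel_scope_otelcol_pipeline_id="traces"} 1
```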
In the Go SDK:
- Currently, scope attributes are not exposed in any way.
- Issue go#5846 (open) tracks the implementation of support for instrumentation scope attributes.
- PR go#5947 (draft, unmerged) is a PoC for the previous issue; it adds instrumentation scope attributes directly as labels on the metric points, as suggested in spec#4223 above. Based on this PR, it seems that metric streams differing only by their scope schema URL would encounter a similar issue.