Instrumentation scope attributes cause errors in Prometheus exporter #12939
Labels
area:service
bug
collector-telemetry
Describe the bug
If the Collector emits two internal metric streams that differ only in their instrumentation scope attributes, the Prometheus exporter from the Go SDK (the default way of exposing the Collector's internal metrics) stops working.
Steps to reproduce
Run the core Collector distribution with any version from 0.123.0 to 0.125.0, with the following config file:
Additionally, enable the --feature-gates +telemetry.newPipelineTelemetry command-line option.
Then, access http://localhost:8888/metrics in a browser. You should see an error similar to the following:
Disabling the feature gate resolves the error and shows the usual listing of internal Collector metrics.
Explanation
The batch processor generates a metric named otelcol_processor_batch_metadata_cardinality, with a processor metric attribute containing the component ID. This component ID is the same for the two instances of batch in the above config, which normally causes the metric points generated by both component instances to be aggregated into a single metric stream.
The telemetry.newPipelineTelemetry feature gate injects instrumentation scope attributes into internal metrics based on which component instance emitted the metric. Because the metric points generated by the two batch instances now have differing identifying properties, they are no longer aggregated, and create two different metric streams. This would normally be a good thing, providing more precise information about the behavior of the two pipelines.
The Prometheus exporter converts OpenTelemetry metric streams into Prometheus time series, then exposes them through a Prometheus server, which listens on port 8888 by default. However, it currently does not support instrumentation scope attributes, and ignores them during the conversion. This leads to the two metric streams being converted to two time series with identical labels, which causes the Prometheus server to error out.
Relevant issues / PRs
In the Collector:
telemetry.newPipelineTelemetry feature gate and the code for injecting component-identifying instrumentation scope attributes.
In the OpenTelemetry specification:
otel_scope_info metric.
In the Go SDK: