Skip to content

Put component-identifying scope attributes for internal metrics back behind feature gate #12933

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jade-guiton-dd
Copy link
Contributor

@jade-guiton-dd jade-guiton-dd commented Apr 28, 2025

Context

PR #12617 introduced logic to inject new instrumentation scope attributes in all internal telemetry to identify which Collector component it came from. These attributes had already been added to internal logs as regular log attributes, and this PR switched them to scope attributes for consistency. The new logic was placed behind an Alpha stage feature gate, telemetry.newPipelineTelemetry.

Unfortunately, the default "off" state of the feature gate disabled the injection of component-identifying attributes entirely, which was a regression since they had been present in internal logs in previous releases. See issue #12870 for an in-depth discussion of this issue.

To correct this, PR #12856 was filed, which stabilized the feature gate, making it on by default, with no way to disable it, and removed the logic that the feature gate used to toggle. This was thought to be the simplest way to mitigate the regression in the "off" state, since we planned to stabilize the feature eventually anyways.

Unfortunately, it was found that the "on" state of the feature gate causes a different issue: the Prometheus exporter is the default way of exporting the Collector's internal metrics, accessible at collector:8888/metrics. This exporter does not currently have any support for instrumentation scope attributes, meaning that metric streams differentiated by said attributes but not by any other identifying property will appear as aliases to Prometheus, which causes an error. This completely breaks the export of Collector metrics through Prometheus under some simple configurations, which is a release blocker.

Description

To fix this issue, this PR sets the telemetry.newPipelineTelemetry feature gate back to "Alpha" (off by default), and reintroduces logic to disable the injection of the new instrumentation scope attributes when the gate is off, but only in internal metrics. Note that the new logic is still used unconditionally for logs and traces, to avoid reintroducing the logs issue (#12870).

This should avoid breaking the Collector in its default configuration while we try to get a fix in the Prometheus exporter.

Link to tracking issue

No tracking issue currently, will probably file one later.

Testing

I performed some simple manual testing with a config file like the following:

receivers:
  otlp: [...]
processors:
  batch:
exporters:
  debug: [...]
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
  telemetry:
    metrics:
      level: detailed    
    traces: [...]
    logs: [...]

The two batch processors create aliased metric streams, which are only differentiated by the new component attributes. I checked that:

  1. this config causes an error in the Prometheus exporter on main;
  2. the error is resolved by default after applying this PR;
  3. the error reappears when enabling the feature gate (this is expected)
  4. scope attributes are added on the traces and logs no matter the state of the gate.

@jade-guiton-dd jade-guiton-dd requested a review from a team as a code owner April 28, 2025 16:00
@jade-guiton-dd jade-guiton-dd requested a review from dmitryax April 28, 2025 16:00
@jade-guiton-dd jade-guiton-dd added collector-telemetry healthchecker and other telemetry collection issues release:blocker The issue must be resolved before cutting the next release labels Apr 28, 2025
@songy23 songy23 added the Skip Changelog PRs that do not require a CHANGELOG.md entry label Apr 28, 2025
@songy23
Copy link
Member

songy23 commented Apr 28, 2025

the error reappears when enabling the feature gate (this is expected)

It may be worth adding a changelog about this too

@songy23 songy23 removed the Skip Changelog PRs that do not require a CHANGELOG.md entry label Apr 28, 2025
@jade-guiton-dd
Copy link
Contributor Author

You're right, I'll add a warning in the release note.

@songy23 songy23 mentioned this pull request Apr 28, 2025
Copy link

codecov bot commented Apr 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.66%. Comparing base (f26c6bb) to head (9241f2f).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #12933   +/-   ##
=======================================
  Coverage   91.66%   91.66%           
=======================================
  Files         499      499           
  Lines       27425    27425           
=======================================
  Hits        25140    25140           
  Misses       1806     1806           
  Partials      479      479           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jade-guiton-dd jade-guiton-dd added the Skip Changelog PRs that do not require a CHANGELOG.md entry label Apr 28, 2025
@jade-guiton-dd
Copy link
Contributor Author

jade-guiton-dd commented Apr 28, 2025

(Skipping the changelog check since I'm modifying the release note for #12856 instead of making a new one to avoid confusion)

Copy link
Contributor

@jackgopack4 jackgopack4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, thanks for identifying the problem and proposing a fix

@bogdandrutu bogdandrutu added this pull request to the merge queue Apr 28, 2025
Merged via the queue into open-telemetry:main with commit 8862cb1 Apr 28, 2025
59 of 60 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collector-telemetry healthchecker and other telemetry collection issues release:blocker The issue must be resolved before cutting the next release Skip Changelog PRs that do not require a CHANGELOG.md entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants