
Missing attributes in internal Collector logs #12870


Open
jade-guiton-dd opened this issue Apr 16, 2025 · 3 comments
Labels
area:service, bug, collector-telemetry

Comments

@jade-guiton-dd
Contributor

jade-guiton-dd commented Apr 16, 2025

Description

In versions 0.123.0 and 0.124.0 of the Collector, there has been a regression in the attributes included in internal Collector logs. Specifically, the otelcol. attributes defined in the Pipeline Component Telemetry RFC, which have been included in said logs since version 0.120.0 of the Collector, are now missing by default.

Reproduction

This can be reproduced by starting a Collector at version 0.122.1 with a debug exporter in one of the pipelines.
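For reference, a minimal configuration along these lines should reproduce it (the OTLP receiver is just an illustrative choice; any logs pipeline containing a debug exporter will do):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  debug:
service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [debug]
```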

You should see a log like this:

2025-04-16T11:35:48.035+0200    info    builders/builders.go:26 Development component. May change in the future.        {"otelcol.component.id": "debug", "otelcol.component.kind": "Exporter", "otelcol.signal": "logs"}

By contrast, in version 0.123.0 or 0.124.0, with no feature gates set, you will see the following:

2025-04-16T11:36:08.767+0200    info    builders/builders.go:26 Development component. May change in the future.

In this case, the information needed to identify which component is responsible for the log is missing.

The attributes are missing no matter how the logs are emitted: both in the standard error output shown above, and when exporting the Collector's logs using service::telemetry::logs::processors, as described in the documentation.
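For reference, the latter setup looks roughly like the following sketch (the endpoint and protocol values are placeholders; see the linked documentation for the full schema):

```yaml
service:
  telemetry:
    logs:
      processors:
        - batch:
            exporter:
              otlp:
                protocol: http/protobuf
                endpoint: http://localhost:4318
```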

Cause

This regression was introduced by PR #12617, whose primary goal was to extend these component attributes to internal metrics and traces.

In cases where logs are exported through service::telemetry::logs::processors, it also switched the component attributes in internal logs from datapoint attributes to instrumentation scope attributes, to fit with the other two signals and reduce data redundancy. This is a breaking change, but note that at this time, we make no stability guarantees on the format of the Collector's internal logs, so please avoid relying on them.

The issue is that the changes in this PR were hastily put behind an alpha (off by default) feature gate, telemetry.newPipelineTelemetry. As a result, the "off" behavior omitted the component attributes altogether in all circumstances.

Workarounds

  • If you are gathering the Collector's logs from terminal output / standard error:

    Enabling the telemetry.newPipelineTelemetry feature gate should restore the missing attributes to the log output. This can be done by running the Collector with the --feature-gates telemetry.newPipelineTelemetry command-line argument (see the invocation sketch after this list).

  • If you are exporting the Collector's logs through OTLP using service::telemetry::logs::processors:

    • If you export to an endpoint which supports instrumentation scope attributes:

      Enabling the feature gate will restore the missing attributes, but as instrumentation scope attributes instead of standard log attributes.
      Please check with your observability vendor on whether they support ingestion of instrumentation scope attributes.

    • Otherwise:

      We unfortunately do not have a good workaround for this case. Here are some options:

      • Downgrading the Collector to version 0.122.1;
      • Switching to a different log gathering method, for example collecting logs from standard error, which is a feature of many observability vendors;
      • For some use cases, the file and line of code (the code.filepath, code.function, and code.lineno attributes), which are still present on the logs, may be sufficient to identify which component a log originates from.

      Note that the behavior behind the feature gate, i.e. using instrumentation scope attributes in internal logs to identify components, will eventually become the default, so exporting to an endpoint with no support for scope attributes will become problematic.
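For the first workaround above, the invocation looks like this (the binary name and config path are placeholders for your own setup):

```sh
./otelcol --config=config.yaml --feature-gates=telemetry.newPipelineTelemetry
```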

@jade-guiton-dd added the area:service, bug, and collector-telemetry labels on Apr 16, 2025
@jade-guiton-dd
Contributor Author

jade-guiton-dd commented Apr 16, 2025

Because of this problematic "off" behavior of the feature gate, @djaglowski and I plan to remove it, and the changes to internal telemetry will be enabled permanently; see PR #12856.

@sfc-gh-bdrutu

> The attributes are also missing when exporting the Collector's logs using service::telemetry::logs::processors, as described in the documentation.

Please rephrase this to: attributes are missing all the time.

> Downgrading the Collector to version 0.122.1, or upgrading it once the regression has been fixed;

Upgrading to what?

@jade-guiton-dd
Contributor Author

jade-guiton-dd commented Apr 16, 2025

> Please rephrase this to: attributes are missing all the time.

I tried to make it clearer that the attributes are missing no matter how they are emitted.

> Upgrading to what?

I removed the mention of upgrading, since it seems to have been decided that we will move forward with enabling the feature by default, which will not help with the "export through otlp but no scope attribute support" case.

github-merge-queue bot pushed a commit that referenced this issue Apr 28, 2025
…behind feature gate (#12933)

#### Context

PR #12617 introduced logic to inject new instrumentation scope
attributes into all internal telemetry to identify which Collector
component it came from. These attributes had already been added to
internal logs as regular log attributes, and this PR switched them to
scope attributes for consistency. The new logic was placed behind an
Alpha stage feature gate, `telemetry.newPipelineTelemetry`.

Unfortunately, the default "off" state of the feature gate disabled the
injection of component-identifying attributes entirely, which was a
regression since they had been present in internal logs in previous
releases. See issue #12870 for an in-depth discussion of this issue.

To correct this, PR #12856 was filed; it stabilized the feature gate,
making it on by default with no way to disable it, and removed the
logic that the gate used to toggle. This was thought to be the
simplest way to mitigate the regression in the "off" state, since we
planned to stabilize the feature eventually anyway.

Unfortunately, it was found that the "on" state of the feature gate
causes a different issue: [the Prometheus
exporter](https://github.com/open-telemetry/opentelemetry-go/tree/main/exporters/prometheus)
is the default way of exporting the Collector's internal metrics,
accessible at `collector:8888/metrics`. This exporter does not currently
have any support for instrumentation scope attributes, meaning that
metric streams differentiated by said attributes but not by any other
identifying property will appear as aliases to Prometheus, which causes
an error. This completely breaks the export of Collector metrics through
Prometheus under some simple configurations, which is a release blocker.
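As a rough illustration (the metric name, label, and values here are made up, not actual Collector output), two streams that differ only in scope attributes collapse onto the same series in the Prometheus exposition format, which the exporter rejects as a duplicate:

```
# Two distinct OTel metric streams, but the same Prometheus identity:
otelcol_processor_batch_send_size_count{processor="batch"} 42
otelcol_processor_batch_send_size_count{processor="batch"} 17
```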

#### Description

To fix this issue, this PR sets the `telemetry.newPipelineTelemetry`
feature gate back to "Alpha" (off by default), and reintroduces logic to
disable the injection of the new instrumentation scope attributes when
the gate is off, but only in internal metrics. Note that the new logic
is still used unconditionally for logs and traces, to avoid
reintroducing the logs issue (#12870).

This should avoid breaking the Collector in its default configuration
while we try to get a fix in the Prometheus exporter.

#### Link to tracking issue
No tracking issue currently, will probably file one later.

#### Testing

I performed some simple manual testing with a config file like the
following:

```yaml
receivers:
  otlp: [...]
processors:
  batch:
exporters:
  debug: [...]
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
  telemetry:
    metrics:
      level: detailed    
    traces: [...]
    logs: [...]
```

The two batch processors create aliased metric streams, which are only
differentiated by the new component attributes. I checked that:
1. this config causes an error in the Prometheus exporter on main;
2. the error is resolved by default after applying this PR;
3. the error reappears when enabling the feature gate (this is expected);
4. scope attributes are added to the traces and logs no matter the state of the gate.
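For reference, step 1 can be observed by scraping the default internal metrics endpoint mentioned in the context section above (the host is a placeholder):

```sh
curl http://localhost:8888/metrics
```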