[cmd/opampsupervisor] Supervisor reports last collector STDERR message #39954

dpaasman00 · 2025-05-08T18:38:36Z

Description

If the supervisor receives a "bad" remote config (collector is unable to start or fails shortly after) and starts the collector with it, the supervisor reports a "Failed" RemoteConfigStatus and an error. This error is usually either "Config apply timeout exceeded" or "Agent process PID=1234 exited unexpectedly, exit code=1. Will restart in a bit...".

This error does not say why the collector failed, so finding the root cause requires retrieving the collector's log. Where those logs aren't accessible, debugging becomes very difficult, if not impossible.

This PR changes how the collector process is run so that we can keep track of the last message the collector writes to STDERR. Whenever the collector process fails, we include this last error message in the supervisor's description of the issue.

For example, if the failure is an unrecognized component in the config, this is the error reported to the OpAMP server:

"Config apply timeout exceeded: \nerror decoding 'exporters': unknown type: \"doesntexist\" for id: \"doesntexist\" (valid values: [file opensearch rabbitmq sapm signalfx splunk_hec nop alertmanager alibabacloud_logservice datadog elasticsearch googlecloud googlecloudpubsub sumologic azureblob influxdb sentry syslog zipkin otlphttp dataset stef debug awss3 awsxray azuredataexplorer honeycombmarker kafka logzio opencensus awscloudwatchlogs awsemf azuremonitor bmchelix loki mezmo prometheus pulsar carbon clickhouse tencentcloud_logservice otlp awskinesis doris googlemanagedprometheus loadbalancing logicmonitor otelarrow prometheusremotewrite cassandra coralogix])"

Testing

The E2E test for restarting after a bad config has been updated to check for an error message.

Documentation

iblancasa · 2025-06-17T16:06:44Z

cmd/opampsupervisor/supervisor/commander/commander.go

@@ -79,11 +83,20 @@ func (c *Commander) Start(ctx context.Context) error {
 	c.cmd.Env = common.EnvVarMapToEnvMapSlice(c.cfg.Env)
 	c.cmd.SysProcAttr = sysProcAttrs()

-	// PassthroughLogging changes how collector start up happens
+	// grab cmd pipes


I think it would be nice to have a comment explaining why we need to do this.

dpaasman00 requested review from evan-bradley, atoulme, tigrannajaryan and a team as code owners May 8, 2025 18:38

github-actions bot assigned codeboten May 8, 2025

github-actions bot added the cmd/opampsupervisor label May 8, 2025

atoulme added the waiting-for-code-owners label May 12, 2025

github-actions bot mentioned this pull request May 13, 2025

Weekly Report: 2025-05-06 - 2025-05-13 #40023

Closed

dpaasman00 marked this pull request as draft May 14, 2025 12:14

github-actions bot mentioned this pull request May 20, 2025

Weekly Report: 2025-05-13 - 2025-05-20 #40138

Closed

evan-bradley mentioned this pull request May 20, 2025

[cmd/opampsupervisor] Add exit reason to logs when the OTEL Collector process finish #40174

Open

evan-bradley added waiting for author and removed waiting-for-code-owners labels May 20, 2025

dpaasman00 force-pushed the supervisor-reports-last-collector-stderr branch from 3b19a58 to 4e85493 Compare May 27, 2025 11:52

dpaasman00 added 4 commits June 4, 2025 11:35

report collectors last err from stderr

5bb6d56

update e2e test

b455437

chlog

3b7db9d

cleanup scanner errors & log flushing

6e51851

dpaasman00 force-pushed the supervisor-reports-last-collector-stderr branch from 8df8cc6 to 6e51851 Compare June 4, 2025 15:35

dpaasman00 marked this pull request as ready for review June 4, 2025 15:35

github-actions bot assigned dmitryax Jun 4, 2025

iblancasa approved these changes Jun 17, 2025

iblancasa added waiting-for-code-owners and removed waiting for author labels Jun 26, 2025

github-actions bot mentioned this pull request Jul 1, 2025

Weekly Report: 2025-06-24 - 2025-07-01 #41008

Open
