Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using new otlpfile pipeline for self metrics #1922

Open
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

rafaelwestphal
Copy link
Contributor

Description

Changing implementation of how self_metrics are reported.
This makes use of the enabled_receivers.json and feature_track.json files along with the otlpjsonfile receiver to push metrics instead of the using the diagnostics service.

Complete removal of the diagnostics service will be done in a coming PR.

Related issue

b/396458573

How has this been tested?

Checklist:

  • Unit tests
    • Unit tests do not apply.
    • Unit tests have been added/modified and passed for this PR.
  • Integration tests
    • Integration tests do not apply.
    • Integration tests have been added/modified and passed for this PR.
  • Documentation
    • This PR introduces no user visible changes.
    • This PR introduces user visible changes and the corresponding documentation change has been made.
  • Minor version bump
    • This PR introduces no new features.
    • This PR introduces new features, and there is a separate PR to bump the minor version since the last release already.
    • This PR bumps the version.

@XuechunHou XuechunHou self-requested a review April 1, 2025 21:33
// Wait until the otlpjsonfile receiver writes the metrics with the
// correct labels.
time.Sleep(2 * time.Minute)
testDefaultMetrics(ctx, t, logger, vm, time.Minute)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions :

  • Why not sleeping ~1min instead ?
  • Why do we update this window parameter to time.Minute? I think this would make the query more restrictive to find default metric datapoints and cause the test to be slower and potentially flaky.

@@ -100,6 +100,12 @@ func (uc *UnifiedConfig) GenerateOtelConfig(ctx context.Context) (string, error)
ReceiverPipelineName: "otel",
}

receiverPipelines["enabled_receivers_feature_tracking"] = EnabledReceiversFeatureTrackingMetricsPipeline(ctx, outDir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest a more descriptive name for the pipeline instead of enabled_receivers_feature_tracking. A suggestion is otlp_agent_self_metrics.

Another alternative is to call the pipeline ops_agent since these metrics are Ops Agent self metrics. This following the naming of the others otel and fluentbit which represent self metrics about subagents.

Copy link
Contributor

@ridwanmsharif ridwanmsharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall - thanks for driving this!

)

var (
config = flag.String("config", "/etc/google-cloud-ops-agent/config.yaml", "path to the user specified agent config")
)

func run(ctx context.Context) error {
userUc, mergedUc, err := utils.GetUserAndMergedConfigs(ctx, *config)
_, _, err := utils.GetUserAndMergedConfigs(ctx, *config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method can just return an err now. We don't ever use the first 2 return values I don't think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all will go away on the next PR that removes the diagnostics binary. I'm trying to change the least i can and maintain functionality to make it easier to review.

if err != nil {
return err
}

ctx, cancel := context.WithCancel(ctx)
_, cancel := context.WithCancel(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this anymore then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all will go away on the next PR that removes the diagnostics binary. I'm trying to change the least i can and maintain functionality to make it easier to review.

@@ -59,7 +59,7 @@ func (uc *UnifiedConfig) GenerateFilesFromConfig(ctx context.Context, service, l
}
}
case "otel":
otelConfig, err := uc.GenerateOtelConfig(ctx)
otelConfig, err := uc.GenerateOtelConfig(ctx, outDir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] In the fluent bit case, the Generate Command just returns the files and then we write in this function. For OTel though, it looks like we write the file in side this call. Lets make the behaviour consistent between the two?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do on a follow up

Copy link
Contributor

@franciscovalentecastro franciscovalentecastro Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't write the otel generated configs inside GenerateOtelConfig, they are written in here the same way as the fluent-bit case (see the next 10 lines).

In the PR, @rafaelwestphal is passing outDir to know where did the *_oltp.json files were written too.

Copy link
Contributor

@franciscovalentecastro franciscovalentecastro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants