Skip to content

[PIPE-1021] Propagate parent OTel trace/span from backend if provided #3348

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 20, 2025

Conversation

catkins
Copy link
Contributor

@catkins catkins commented Jun 20, 2025

Description

As part of the OpenTelemetry Tracing Notification Service work, we provide the traceparent from the job acquire / accept endpoints. When present, and the user has opted-in, we set the root span for the job to use the provided trace/parent span id.

This is based on the W3C Tracecontext spec.

Context

PIPE-1021

Changes

Adds --tracing-accept-traceparent flag, that if provided will allow accepting opentelemetry tracing parent span information from the Buildkite control plane.

Testing

  • Tests have run locally (with go test ./...). Buildkite employees may check this if the pipeline has run automatically.
  • Code is formatted (with go fmt ./...)

@catkins catkins marked this pull request as draft June 20, 2025 00:39
@catkins catkins force-pushed the catkins/PIPE-1021/accept-traceparent-from-control-plane branch from c81b4ba to 71d2821 Compare June 20, 2025 01:05
@catkins catkins marked this pull request as ready for review June 20, 2025 01:40
Copy link
Contributor

@wolfeidau wolfeidau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small things/questions.

@@ -617,6 +617,13 @@ func (r *JobRunner) createEnvironment(ctx context.Context) ([]string, error) {
if r.conf.AgentConfiguration.TracingBackend != "" {
env["BUILDKITE_TRACING_BACKEND"] = r.conf.AgentConfiguration.TracingBackend
env["BUILDKITE_TRACING_SERVICE_NAME"] = r.conf.AgentConfiguration.TracingServiceName

if r.conf.Job.TraceParent != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a comment to explain what one mite find in this value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 1b6afee

TracingServiceName string `cli:"tracing-service-name"`
TracingBackend string `cli:"tracing-backend"`
TracingServiceName string `cli:"tracing-service-name"`
TracingAcceptTraceparent bool `cli:"tracing-accept-traceparent"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ther a reason why you wouldn't want to propegate this value? Is propegate a better word?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rationale for making it opt-in is in the comment in internal/job/tracing.go you commented on. Propagate is a better word 🧠 I'll change accept => propagate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the terminology to propagate in d6721c9

@@ -183,6 +186,69 @@ func (e *Executor) startTracingOpenTelemetry(ctx context.Context) (tracetools.Sp
return tracetools.NewOpenTelemetrySpan(span), ctx, stop
}

// accepting traceparent from Buildkite control plane is an opt-in feature as its technically
// a breaking change to the behaviour, and if the server-side tracing isn't set up
// correctly, agent traces may end up without root spans to link to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I see now, is this actualy the case though based on your code it looks like it skips this issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mostly the breaking change thing in behavior. In some tools like Grafana it looks very clunky if you're missing the root span from the trace.

The backend will add a traceparent property in the agent jobs accept / acquire APIs (behind a FF currently) but may not have the opentelemetry tracing notification service set up, so we only want to propagate it when they opt-in for now.


// parse W3C trace context traceparent header format
// https://www.w3.org/TR/trace-context/#traceparent-header
func extractTraceParent(traceParent string) (trace.TraceID, trace.SpanID, trace.TraceFlags, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just use the inbuild parser?

        otel.GetTextMapPropagator().Inject(r.Context(), propagation.HeaderCarrier(w.Header()))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh I was trying to find an implementation of that in the otel sdk but was obviously looking in the wrong spots! I'll have a dig and get rid of this handcrafted logic!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sorted in 4963913

  • I needed use Extract rather than Inject, to decorate the context.Context
  • I couldn't use headers, because (a) its not passed as a header from the backend and (b) the whole buildkite-agent bootstrap dance that passes off the parameters and config to the child process, so I just used a propagation.MapCarrier and stuffed the value where I needed it.

Refs:

@catkins
Copy link
Contributor Author

catkins commented Jun 20, 2025

With the --tracing-propagate-traceparent flag provided, and otel backend pointing to grafana

CleanShot 2025-06-20 at 14 45 58@2x

With otel backend pointing to grafana but without the flag: Maintains the current behaviour where the root span is the job in the buildkite-agent service.

CleanShot 2025-06-20 at 14 46 06@2x

@catkins catkins requested a review from wolfeidau June 20, 2025 05:27
Copy link
Contributor

@wolfeidau wolfeidau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍🏻 from me, be great to try this out 🚀 📈

@catkins catkins merged commit 8792019 into main Jun 20, 2025
1 check passed
@catkins catkins deleted the catkins/PIPE-1021/accept-traceparent-from-control-plane branch June 20, 2025 07:00
@catkins catkins mentioned this pull request Jun 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants