-
Notifications
You must be signed in to change notification settings - Fork 2k
Add OTEL support (carry 4556) #4698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
vendor.mod
Outdated
require ( | ||
github.com/grpc-ecosystem/grpc-gateway/v2 v2.11.3 // indirect | ||
go.opentelemetry.io/otel/exporters/otlp/internal/retry v1.14.0 // indirect | ||
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.14.0 // indirect | ||
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.14.0 // indirect | ||
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.14.0 // indirect | ||
go.opentelemetry.io/otel/sdk v1.14.0 // indirect | ||
go.opentelemetry.io/proto/otlp v0.19.0 // indirect | ||
google.golang.org/genproto v0.0.0-20230711160842-782d3b101e98 // indirect | ||
google.golang.org/genproto/googleapis/api v0.0.0-20230711160842-782d3b101e98 // indirect | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may have to double-check versions of these new ones
cmd/docker/docker.go
Outdated
@@ -15,10 +16,14 @@ import ( | |||
"github.com/docker/cli/cli/version" | |||
"github.com/docker/cli/cmd/docker/internal/appcontext" | |||
"github.com/docker/docker/api/types/versions" | |||
"github.com/moby/buildkit/util/tracing/detect" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- "github.com/moby/buildkit/util/tracing/detect"
- "github.com/moby/buildkit/util/bklog"
the bklog package is used for;
detect.TracerProvider()
https://github.com/moby/buildkit/blob/v0.12.4/util/tracing/detect/detect.go#L123-L125- ->
detect()
https://github.com/moby/buildkit/blob/v0.12.4/util/tracing/detect/detect.go#L97-L98 - ->
bklog. EnableLogWithTraceID()
https://github.com/moby/buildkit/blob/v0.12.4/util/bklog/log.go#L23-L29
For the detect
package; wondering if we can move that separate; that would (potentially) also allow for the detect package to have a matching version of OTEL, most notalbly semconv
; https://github.com/moby/buildkit/blob/v0.12.4/util/tracing/detect/detect.go#L15
semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
Not sure what the best solution is for the bklog
package; the function we're calling sets a non-exported package variable. That variable is used in GetLogger
(as an optimisation?) to return a logger with traces; https://github.com/moby/buildkit/blob/v0.12.4/util/bklog/log.go#L41-L64
func GetLogger(ctx context.Context) (l *logrus.Entry) {
logger := ctx.Value(loggerKey{})
if logger != nil {
l = logger.(*logrus.Entry)
} else if logger := log.GetLogger(ctx); logger != nil {
l = logger
} else {
l = L
}
if logWithTraceID {
if spanContext := trace.SpanFromContext(ctx).SpanContext(); spanContext.IsValid() {
return l.WithFields(logrus.Fields{
"traceID": spanContext.TraceID(),
"spanID": spanContext.SpanID(),
})
}
}
return l
}
❓ 🤔 perhaps flip the logic, and move that to the tracing/detect
package;
bklog.GetLogger()
-> tracing/detect.TraceLoggingEnabled()
I'll move the first commit (punching-through contexts) to a separate PR |
77f2f0f
to
85e51f0
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #4698 +/- ##
==========================================
- Coverage 61.31% 0 -61.32%
==========================================
Files 287 0 -287
Lines 20065 0 -20065
==========================================
- Hits 12303 0 -12303
+ Misses 6869 0 -6869
+ Partials 893 0 -893 |
a6fc3d8
to
e615d3e
Compare
cli/command/cli.go
Outdated
// WithContext sets the base context for the cli | ||
// This is used internally for operations may require a context to propagate tracing. | ||
func (cli *DockerCli) WithContext(ctx context.Context) { | ||
cli.baseCtx = ctx | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed a commit that re-implements this as a CLIOpt
Thanks to @milas ❤️ I pushed a commit that swaps BuildKit detect with the OTEL SDK 🎉 |
I'll cleanup / squash commits a bit after review; just pushing the changes here first |
name := cmd.Name() | ||
for p := cmd.Parent(); p != nil && p != cmd.Root(); p = p.Parent() { | ||
name = p.Name() + " " + name | ||
} | ||
|
||
ctx, _ := otel.Tracer("").Start(cmd.Context(), name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just discussing with @milas that we don't have a good convention yet for passing the command name; compose currently uses this;
ctx, cmdSpan := tracing.Tracer.Start(
ctx,
"cli/"+strings.Join(commandName(cmd), "-"),
)
cmdSpan.SetAttributes(attribute.StringSlice("cli.args", args))
cmdSpan.SetAttributes(attribute.StringSlice("cli.flags", getFlags(cmd.Flags())))
that's what Compose is doing so it's slightly different, e.g. docker compose alpha publish
-> cli/alpha-publish
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep this as-is for now, and look at improving later; I could add the flags and args if we think it's useful to add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://opentelemetry.io/docs/specs/semconv/attributes-registry/process/
OH! That looks great, and from the looks of it that pretty much describes the things we'd want to include (and we wouldn't have to come up with our own conventions, which is a big 💯)
@milas @jsternberg You'll probably be interested in this (if you didn't know about these).
From the looks of it; most of these we could set using a generic helper (if none exists for it). We MAY have to consider if we should sanitise arguments though (i.e. do we want --env MYSQL_ROOT_PASSWORD=<pass>
(or --build-arg
) to end up in traces? 😬)
If we decide to sanitise, perhaps we need an option to opt-out of sanitising (for those that DO want to include all of that in their traces).
Let me add a screenshot to save others from having to follow the link, and to add context to my comment;

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but you will need to add a call to tp.Shutdown()
to ensure the spans get flushed out when the process exits
See:
cmd/docker/docker.go
Outdated
// create a default resource but specify the service name explicitly, or | ||
// it will be `unknown_service:<name_of_exe>` | ||
// | ||
// N.B. a schemaless resource is used with the STABLE semconv value of | ||
// `service.name` to avoid https://github.com/open-telemetry/opentelemetry-go/issues/2341 | ||
res, err := resource.Merge( | ||
resource.Default(), | ||
resource.NewSchemaless( | ||
attribute.String("service.name", "docker-cli"), | ||
), | ||
) | ||
if err != nil { | ||
return fmt.Errorf("merging resource: %w", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized this is actually redundant because we do os.SetEnv("OTEL_SERVICE_NAME", cmd.Name())
if it's not set
You can just res := resource.Default()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, nice! Just pushed a commit (to be squash) with what I think you meant 👍
![]() Using my branch off your PR 🎉 (also notice I snuck in some new attributes for the CLI version metadata 👀) Start a Jaeger instance:
(adapted from https://www.jaegertracing.io/docs/latest/getting-started/#all-in-one) Open web UI @ http://localhost:16686/ make -f docker.Makefile build
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 ./build/docker --debug version |
cmd/docker/docker.go
Outdated
@@ -58,6 +82,10 @@ func newDockerCommand(dockerCli *command.DockerCli) *cli.TopLevelCommand { | |||
cmd.SetOut(dockerCli.Out()) | |||
cmd.SetErr(dockerCli.Err()) | |||
|
|||
// Cobra's context may be nil in some cases, so initialize it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cobra's context should never be nil.
A context should never be nil, ever.
Did something change upstream in cobra?
I remember specifically discussing this when contexts were added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha! This was from your original PR, but we were discussing this as well in #4599 (comment)
I can remove this part 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmd/docker/docker.go
Outdated
// This is the same as on the dockerd side. | ||
// This can be removed after buildkit's detect package is updated. | ||
if os.Getenv("OTEL_EXPORTER_OTLP_TRACES_PROTOCOL") == "" && os.Getenv("OTEL_EXPORTER_OTLP_PROTOCOL") == "" { | ||
os.Setenv("OTEL_EXPORTER_OTLP_TRACES_PROTOCOL", "http/protobuf") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably is not needed since we aren't using detect anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it should go away - I verified things are still good without it
(@thaJeztah I deleted it over in https://github.com/thaJeztah/cli/pull/4/files#diff-3dcd6325b70a5f5db393a3ecc08060b69007fcef371fb4dabf3164da95646fb6L305-L312 )
cmd/docker/docker.go
Outdated
@@ -357,6 +353,39 @@ func runDocker(dockerCli *command.DockerCli) error { | |||
return cmd.Execute() | |||
} | |||
|
|||
func initializeTracing() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we pass the context here that was set on the command?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a context.Context
available, it's probably worth passing through.
That said...I stepped through the source - neither otlphttp
or otlpgrpc
actually use it 🙃
cmd/docker/docker.go
Outdated
cmd.SetContext(ctx) | ||
_ = dockerCli.Apply(command.WithContext(ctx)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I.e. this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs more spans. And I suggest splitting up the instrumentation and propagation work (with the OTEL API go.opentelemetry.io/otel/...
) into separate commits from the exporter work (with the OTEL SDK go.opentelemetry.io/otel/sdk/...
).
@@ -123,6 +125,8 @@ func RunStats(ctx context.Context, dockerCLI command.Cli, options *StatsOptions) | |||
|
|||
showAll := len(options.Containers) == 0 | |||
if showAll { | |||
span := trace.SpanFromContext(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When in doubt, create a nested span. It provides more detailed tracing insights than reusing a single top-level span for the entire lifetime of the process.
Thanks all! I pushed a couple of commits (to be squashed if ok) I'll be calling it a day for now, but if someone wants to push to this branch; feel free to. Pushing additional commits should be fine; we can cleanup / squash later (I didn't bother cleaning up commits yet while we're moving this still) |
@@ -309,6 +348,22 @@ func runDocker(dockerCli *command.DockerCli) error { | |||
return cmd.Execute() | |||
} | |||
|
|||
func initializeTracing(ctx context.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a context here
os.Setenv("OTEL_SERVICE_NAME", cmd.Root().Name()) | ||
} | ||
|
||
if err := initializeTracing(cmd.Context()); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And passed cmd.Context()
here, but double check if that's indeed the correct thing to do, or if it was deliberately a separate context 🙈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs on it are minimal - it appears to be intended for exporter setup, e.g. if it needs to dial a connection up-front or something I guess? [see my other comment though, it's actually unused currently by both the HTTP/gRPC exporters]
HA!
|
// TODO: There doesn't seem to be a way to determine if the command returned an error, so we can set the span status here. | ||
trace.SpanFromContext(cmd.Context()).End() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're fighting an uphill trying to use cobra lifecycle hooks for this. See below.
os.Setenv("OTEL_SERVICE_NAME", cmd.Root().Name()) | ||
} | ||
|
||
if err := initializeTracing(cmd.Context()); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initialize tracing in main()
. Here's too late to be able to export traces from early in the process lifecycle.
otel.SetTracerProvider(sdktrace.NewTracerProvider( | ||
sdktrace.WithSpanProcessor(sdktrace.NewBatchSpanProcessor(exporter)), | ||
sdktrace.WithResource(resource.Default()), | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tracer provider needs to be shut down before the process exits to ensure the recorded spans are flushed to the exporter.
@@ -309,6 +348,22 @@ func runDocker(dockerCli *command.DockerCli) error { | |||
return cmd.Execute() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return cmd.Execute() | |
ctx, span := otel.Tracer("github.com/docker/docker/cli/cmd/docker").Start( | |
cmd.Context(), | |
"runDocker", | |
) | |
defer span.End() | |
err = cmd.ExecuteContext(ctx) | |
if err != nil { | |
span.SetStatus(codes.Error, "command errored") | |
span.RecordError(err) | |
} | |
return err |
It'd be even better to start the span at the top of the function, so the latency before the command is executed gets captured in the traces.
7eba434
to
d0da194
Compare
- Add otel tracing support - switch to use OTEL sdk instead of buildkit - use default resource We already set os.SetEnv("OTEL_SERVICE_NAME", cmd.Name()) if it's not set, so don't have to do the merge here. - pass the cmd.Context() - Cobra's context should never be nil Co-authored-by: Milas Bowman <[email protected]> Co-authored-by: Sebastiaan van Stijn <[email protected]> Signed-off-by: Brian Goff <[email protected]>
d0da194
to
f8850d6
Compare
see docker#4698 (comment) Signed-off-by: Sebastiaan van Stijn <[email protected]>
This is broken into 2 commits:
- What I did
- How I did it
- How to verify it
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)