Skip to content

Add OTEL support (carry 4556) #4698

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

thaJeztah
Copy link
Member

@thaJeztah thaJeztah commented Dec 6, 2023

This is broken into 2 commits:

  • Plumb contexts through commands
  • Add otel tracing as a persistent pre-run to all CLI commands.

- What I did

- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@thaJeztah thaJeztah changed the title Add otel support (carry 4556) Add OTEL support (carry 4556) Dec 6, 2023
vendor.mod Outdated
Comment on lines 95 to 105
require (
github.com/grpc-ecosystem/grpc-gateway/v2 v2.11.3 // indirect
go.opentelemetry.io/otel/exporters/otlp/internal/retry v1.14.0 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.14.0 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.14.0 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.14.0 // indirect
go.opentelemetry.io/otel/sdk v1.14.0 // indirect
go.opentelemetry.io/proto/otlp v0.19.0 // indirect
google.golang.org/genproto v0.0.0-20230711160842-782d3b101e98 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20230711160842-782d3b101e98 // indirect
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may have to double-check versions of these new ones

@@ -15,10 +16,14 @@ import (
"github.com/docker/cli/cli/version"
"github.com/docker/cli/cmd/docker/internal/appcontext"
"github.com/docker/docker/api/types/versions"
"github.com/moby/buildkit/util/tracing/detect"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ This is the new dependency on BuildKit; this brings in;

  • "github.com/moby/buildkit/util/tracing/detect"
  • "github.com/moby/buildkit/util/bklog"

the bklog package is used for;

For the detect package; wondering if we can move that separate; that would (potentially) also allow for the detect package to have a matching version of OTEL, most notalbly semconv; https://github.com/moby/buildkit/blob/v0.12.4/util/tracing/detect/detect.go#L15

	semconv "go.opentelemetry.io/otel/semconv/v1.17.0"

Not sure what the best solution is for the bklog package; the function we're calling sets a non-exported package variable. That variable is used in GetLogger (as an optimisation?) to return a logger with traces; https://github.com/moby/buildkit/blob/v0.12.4/util/bklog/log.go#L41-L64

func GetLogger(ctx context.Context) (l *logrus.Entry) {
	logger := ctx.Value(loggerKey{})

	if logger != nil {
		l = logger.(*logrus.Entry)
	} else if logger := log.GetLogger(ctx); logger != nil {
		l = logger
	} else {
		l = L
	}

	if logWithTraceID {
		if spanContext := trace.SpanFromContext(ctx).SpanContext(); spanContext.IsValid() {
			return l.WithFields(logrus.Fields{
				"traceID": spanContext.TraceID(),
				"spanID":  spanContext.SpanID(),
			})
		}
	}

	return l
}

❓ 🤔 perhaps flip the logic, and move that to the tracing/detect package;

bklog.GetLogger() -> tracing/detect.TraceLoggingEnabled()

@thaJeztah
Copy link
Member Author

I'll move the first commit (punching-through contexts) to a separate PR

@codecov-commenter
Copy link

codecov-commenter commented Dec 12, 2023

Codecov Report

Merging #4698 (d0da194) into master (efb9206) will decrease coverage by 61.32%.
Report is 44 commits behind head on master.
The diff coverage is n/a.

❗ Current head d0da194 differs from pull request most recent head 827bead. Consider uploading reports for the commit 827bead to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #4698       +/-   ##
==========================================
- Coverage   61.31%       0   -61.32%     
==========================================
  Files         287       0      -287     
  Lines       20065       0    -20065     
==========================================
- Hits        12303       0    -12303     
+ Misses       6869       0     -6869     
+ Partials      893       0      -893     

@thaJeztah thaJeztah force-pushed the carry_otel branch 3 times, most recently from a6fc3d8 to e615d3e Compare December 14, 2023 10:06
Comment on lines 95 to 100
// WithContext sets the base context for the cli
// This is used internally for operations may require a context to propagate tracing.
func (cli *DockerCli) WithContext(ctx context.Context) {
cli.baseCtx = ctx
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a commit that re-implements this as a CLIOpt

@thaJeztah
Copy link
Member Author

Thanks to @milas ❤️ I pushed a commit that swaps BuildKit detect with the OTEL SDK 🎉

@thaJeztah
Copy link
Member Author

I'll cleanup / squash commits a bit after review; just pushing the changes here first

Comment on lines +59 to +87
name := cmd.Name()
for p := cmd.Parent(); p != nil && p != cmd.Root(); p = p.Parent() {
name = p.Name() + " " + name
}

ctx, _ := otel.Tracer("").Start(cmd.Context(), name)
Copy link
Member Author

@thaJeztah thaJeztah Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just discussing with @milas that we don't have a good convention yet for passing the command name; compose currently uses this;

ctx, cmdSpan := tracing.Tracer.Start(
	ctx,
	"cli/"+strings.Join(commandName(cmd), "-"),
)
cmdSpan.SetAttributes(attribute.StringSlice("cli.args", args))
cmdSpan.SetAttributes(attribute.StringSlice("cli.flags", getFlags(cmd.Flags())))

that's what Compose is doing so it's slightly different, e.g. docker compose alpha publish -> cli/alpha-publish

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep this as-is for now, and look at improving later; I could add the flags and args if we think it's useful to add.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://opentelemetry.io/docs/specs/semconv/attributes-registry/process/

OH! That looks great, and from the looks of it that pretty much describes the things we'd want to include (and we wouldn't have to come up with our own conventions, which is a big 💯)

@milas @jsternberg You'll probably be interested in this (if you didn't know about these).

From the looks of it; most of these we could set using a generic helper (if none exists for it). We MAY have to consider if we should sanitise arguments though (i.e. do we want --env MYSQL_ROOT_PASSWORD=<pass> (or --build-arg) to end up in traces? 😬)

If we decide to sanitise, perhaps we need an option to opt-out of sanitising (for those that DO want to include all of that in their traces).

Let me add a screenshot to save others from having to follow the link, and to add context to my comment;

Screenshot 2024-01-05 at 12 28 37

Copy link

@milas milas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but you will need to add a call to tp.Shutdown() to ensure the spans get flushed out when the process exits

See:

Comment on lines 365 to 378
// create a default resource but specify the service name explicitly, or
// it will be `unknown_service:<name_of_exe>`
//
// N.B. a schemaless resource is used with the STABLE semconv value of
// `service.name` to avoid https://github.com/open-telemetry/opentelemetry-go/issues/2341
res, err := resource.Merge(
resource.Default(),
resource.NewSchemaless(
attribute.String("service.name", "docker-cli"),
),
)
if err != nil {
return fmt.Errorf("merging resource: %w", err)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized this is actually redundant because we do os.SetEnv("OTEL_SERVICE_NAME", cmd.Name()) if it's not set

You can just res := resource.Default()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice! Just pushed a commit (to be squash) with what I think you meant 👍

@milas
Copy link

milas commented Jan 4, 2024

Screenshot 2024-01-04 at 5 22 20 PM

Using my branch off your PR 🎉

(also notice I snuck in some new attributes for the CLI version metadata 👀)

Start a Jaeger instance:

docker run --rm --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:1.52

(adapted from https://www.jaegertracing.io/docs/latest/getting-started/#all-in-one)

Open web UI @ http://localhost:16686/

make -f docker.Makefile build
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 ./build/docker --debug version

@@ -58,6 +82,10 @@ func newDockerCommand(dockerCli *command.DockerCli) *cli.TopLevelCommand {
cmd.SetOut(dockerCli.Out())
cmd.SetErr(dockerCli.Err())

// Cobra's context may be nil in some cases, so initialize it here.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cobra's context should never be nil.
A context should never be nil, ever.
Did something change upstream in cobra?
I remember specifically discussing this when contexts were added.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha! This was from your original PR, but we were discussing this as well in #4599 (comment)

I can remove this part 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh... maybe I hit a case where it was nil.

2RXV3M
2nselq

// This is the same as on the dockerd side.
// This can be removed after buildkit's detect package is updated.
if os.Getenv("OTEL_EXPORTER_OTLP_TRACES_PROTOCOL") == "" && os.Getenv("OTEL_EXPORTER_OTLP_PROTOCOL") == "" {
os.Setenv("OTEL_EXPORTER_OTLP_TRACES_PROTOCOL", "http/protobuf")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably is not needed since we aren't using detect anymore?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it should go away - I verified things are still good without it

(@thaJeztah I deleted it over in https://github.com/thaJeztah/cli/pull/4/files#diff-3dcd6325b70a5f5db393a3ecc08060b69007fcef371fb4dabf3164da95646fb6L305-L312 )

@@ -357,6 +353,39 @@ func runDocker(dockerCli *command.DockerCli) error {
return cmd.Execute()
}

func initializeTracing() error {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we pass the context here that was set on the command?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a context.Context available, it's probably worth passing through.

That said...I stepped through the source - neither otlphttp or otlpgrpc actually use it 🙃

Comment on lines 64 to 65
cmd.SetContext(ctx)
_ = dockerCli.Apply(command.WithContext(ctx))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. this one

Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs more spans. And I suggest splitting up the instrumentation and propagation work (with the OTEL API go.opentelemetry.io/otel/...) into separate commits from the exporter work (with the OTEL SDK go.opentelemetry.io/otel/sdk/...).

@@ -123,6 +125,8 @@ func RunStats(ctx context.Context, dockerCLI command.Cli, options *StatsOptions)

showAll := len(options.Containers) == 0
if showAll {
span := trace.SpanFromContext(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When in doubt, create a nested span. It provides more detailed tracing insights than reusing a single top-level span for the entire lifetime of the process.

@thaJeztah
Copy link
Member Author

Thanks all! I pushed a couple of commits (to be squashed if ok)

I'll be calling it a day for now, but if someone wants to push to this branch; feel free to.

Pushing additional commits should be fine; we can cleanup / squash later (I didn't bother cleaning up commits yet while we're moving this still)

@@ -309,6 +348,22 @@ func runDocker(dockerCli *command.DockerCli) error {
return cmd.Execute()
}

func initializeTracing(ctx context.Context) error {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a context here

os.Setenv("OTEL_SERVICE_NAME", cmd.Root().Name())
}

if err := initializeTracing(cmd.Context()); err != nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And passed cmd.Context() here, but double check if that's indeed the correct thing to do, or if it was deliberately a separate context 🙈

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs on it are minimal - it appears to be intended for exporter setup, e.g. if it needs to dial a connection up-front or something I guess? [see my other comment though, it's actually unused currently by both the HTTP/gRPC exporters]

@thaJeztah
Copy link
Member Author

HA!

#19 53.89 === FAIL: cmd/docker TestClientDebugEnabled (0.00s)
#19 53.89 panic: cannot create context from nil parent [recovered]
#19 53.89 	panic: cannot create context from nil parent

Comment on lines +69 to +94
// TODO: There doesn't seem to be a way to determine if the command returned an error, so we can set the span status here.
trace.SpanFromContext(cmd.Context()).End()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're fighting an uphill trying to use cobra lifecycle hooks for this. See below.

os.Setenv("OTEL_SERVICE_NAME", cmd.Root().Name())
}

if err := initializeTracing(cmd.Context()); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initialize tracing in main(). Here's too late to be able to export traces from early in the process lifecycle.

Comment on lines +359 to +378
otel.SetTracerProvider(sdktrace.NewTracerProvider(
sdktrace.WithSpanProcessor(sdktrace.NewBatchSpanProcessor(exporter)),
sdktrace.WithResource(resource.Default()),
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tracer provider needs to be shut down before the process exits to ensure the recorded spans are flushed to the exporter.

@@ -309,6 +348,22 @@ func runDocker(dockerCli *command.DockerCli) error {
return cmd.Execute()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return cmd.Execute()
ctx, span := otel.Tracer("github.com/docker/docker/cli/cmd/docker").Start(
cmd.Context(),
"runDocker",
)
defer span.End()
err = cmd.ExecuteContext(ctx)
if err != nil {
span.SetStatus(codes.Error, "command errored")
span.RecordError(err)
}
return err

It'd be even better to start the span at the top of the function, so the latency before the command is executed gets captured in the traces.

- Add otel tracing support
- switch to use OTEL sdk instead of buildkit
- use default resource
  We already set os.SetEnv("OTEL_SERVICE_NAME", cmd.Name()) if
  it's not set, so don't have to do the merge here.
- pass the cmd.Context()
- Cobra's context should never be nil

Co-authored-by: Milas Bowman <[email protected]>
Co-authored-by: Sebastiaan van Stijn <[email protected]>
Signed-off-by: Brian Goff <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants