Skip to content

fix otlp metric export when using http protocol #7595

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Jun 9, 2025

Conversation

garypen
Copy link
Contributor

@garypen garypen commented May 30, 2025

Ensure that the path handling for an OTLP over HTTP metrics endpoint is correct.

In 1.x we didn't need to perform additional processing. I think, although it's difficult to confirm, that this was auto-magically handled by the opentelemetry-otlp crate we use which is at version 0.13. I note that we were explicitly modifying endpoints for traces, so not sure why we had to do it for traces, but not metrics.

In 2.x we updated our version of the opentelemetry-otlp crate to 0.17 and it looks as though the "magic" no longer happens.

In addition to these problems it looks like we had a number of issues using the opentelemetry_otlp crate which:

  • Prevented correct use of default endpoint (by always setting one)
  • Prevented the use of the OTEL ENDPOINT environment variables
  • Ignored configured protocol settings and took the default from the opentelemetry_otlp crate

I've tried to fix as many issues as I can in a broadly compatible way.

Note: I've only fixed the OTEL ENDPOINT environment variables. Others continue to work/not work as they do currently.


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Metrics and logs are added3 and documented
  • Tests added and passing4
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

I've added a new manual test to more rigorously test the various endpoint manipulations that we make.
I manually tested the PR against the major APM providers to ensure that it was still possible to submit O11Y data to them.

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. A lot of (if not most) features benefit from built-in observability and debug-level logs. Please read this guidance on metrics best-practices.

  4. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

garypen added 2 commits May 30, 2025 13:04
We use the rust opentelemetry crate to export metrics and traces.
In 1.x we use v 0.13 of that crate.
In 2.x we use v 0.17 of that crate.

It seems as though in the earlier version, exporting otlp metrics over
http would work without the correct path (/v1/metrics). This does not
work in v0.17. This means that exporting metrics over http has never
worked in router 2.x

I've added code to explicitly force the path to be correct (as we
already do for traces).
I just went through the test configuration and converted
localhost/metrics
to
localhost/v1/metrics

wherever I saw it. All the tests are passing.
@garypen garypen self-assigned this May 30, 2025
@garypen garypen requested a review from a team as a code owner May 30, 2025 15:31

This comment has been minimized.

@svc-apollo-docs
Copy link
Collaborator

svc-apollo-docs commented May 30, 2025

✅ Docs preview ready

The preview is ready to be viewed. View the preview

File Changes

0 new, 19 changed, 0 removed
* graphos/routing/(latest)/cloud/subscriptions.mdx
* graphos/routing/(latest)/configuration/yaml.mdx
* graphos/routing/(latest)/observability/telemetry/metrics-exporters/prometheus.mdx
* graphos/routing/(latest)/observability/telemetry/trace-exporters/overview.mdx
* graphos/routing/(latest)/operations/subscriptions/overview.mdx
* graphos/routing/(latest)/operations/subscriptions/configuration.mdx
* graphos/routing/(latest)/operations/defer.mdx
* graphos/routing/(latest)/operations/file-upload.mdx
* graphos/routing/(latest)/performance/caching/entity.mdx
* graphos/routing/(latest)/performance/caching/in-memory.mdx
* graphos/routing/(latest)/query-planning/native-query-planner.mdx
* graphos/routing/(latest)/security/csrf.mdx
* graphos/routing/(latest)/security/demand-control.mdx
* graphos/routing/(latest)/security/authorization.mdx
* graphos/routing/(latest)/security/subgraph-authentication.mdx
* graphos/routing/(latest)/security/persisted-queries.mdx
* graphos/routing/(latest)/graphos-reporting.mdx
* graphos/routing/(latest)/uplink.mdx
* graphos/routing/(latest)/license.mdx

Build ID: 1346ae499f293fd97dc8e99a

URL: https://www.apollographql.com/docs/deploy-preview/1346ae499f293fd97dc8e99a

garypen added 12 commits May 30, 2025 16:34
I'm not sure if updating the configuration is correct or if the code
still needs more tweaking. This is a good first step to help me figure
that out.
We need to do some work to process endpoints to accomodate decisions we
made in the past.

Let's see what this looks like.
So that existing tests don't need to change. I think this is simpler
than before and it now works for gRPC/HTTP and metrics/traces,
which it didn't do before.
Forget about checking query of url (you can't have an endpoint that
contains a query...), but check the various realistic transforms that we
might make.
Otherwise we get "empty string" errors...
@garypen garypen requested a review from a team as a code owner June 4, 2025 07:02
garypen added 3 commits June 4, 2025 08:18
Add the commit hash for the test changes.
We can't parse our endpoint as a URI unless we have a scheme. If none is
specified, prepend "http://".

Extend our test to include this very common configuration use case.
@BrynCooke
Copy link
Contributor

@garypen Can we do a quick manual test against honeycomb and new relic?
I just want to make sure we can actually connect to these.

We'd like the standard OTLP environment variables for endpoint to work
(I think), or at least our customers would, so let's try to help them
out.
@garypen garypen requested review from a team as code owners June 6, 2025 07:50
After more research into how New Relix processes OTLP traffic, which
verified they do expect `/v1/metrics` and `/v1/traces` when using the
`http` protocol, add a test.
@garypen garypen force-pushed the garypen/fix-otlp-http-metrics branch from dd5fde5 to b37a6b3 Compare June 6, 2025 08:02
@apollo-librarian
Copy link

apollo-librarian bot commented Jun 9, 2025

✅ Docs preview has no changes

The preview was not built because there were no changes.

Build ID: 3bbf65816a42074898251b72

@apollo-librarian
Copy link

🛠️ Docs preview building...

The preview is currently being built.

Build ID: 604162c29c8252e9523f3cd4

1 similar comment
@apollo-librarian
Copy link

🛠️ Docs preview building...

The preview is currently being built.

Build ID: 604162c29c8252e9523f3cd4

@garypen garypen merged commit c8ae92e into dev Jun 9, 2025
14 checks passed
@garypen garypen deleted the garypen/fix-otlp-http-metrics branch June 9, 2025 07:50
garypen added a commit that referenced this pull request Jun 9, 2025
When I merged #7595, I squashed it and so my .gitleaks.toml updates were
no longer correct.

This fixes things.
garypen added a commit that referenced this pull request Jun 9, 2025
When I merged #7595, I squashed it and so my .gitleaks.toml updates were no longer correct.

This fixes things.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants