Skip to content

Refactor otel_sampler to use callbacks for setup, should_sample, and decription #261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Aug 19, 2021

Conversation

dvic
Copy link
Contributor

@dvic dvic commented Aug 4, 2021

This PR refactors the otel_sampler to a fully fledged behavior that introduces callbacks setup/1, should_sample/7, and description/1 for the otel_sampler behavior. Furthermore, the built-in samplers now support passing in decision result attributes.

@codecov
Copy link

codecov bot commented Aug 4, 2021

Codecov Report

Merging #261 (cd4ae4d) into main (24804a7) will increase coverage by 0.13%.
The diff coverage is 87.30%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #261      +/-   ##
==========================================
+ Coverage   36.39%   36.52%   +0.13%     
==========================================
  Files          37       41       +4     
  Lines        3168     3170       +2     
==========================================
+ Hits         1153     1158       +5     
+ Misses       2015     2012       -3     
Flag Coverage Δ
api 63.06% <100.00%> (+0.16%) ⬆️
elixir 15.98% <33.33%> (-0.07%) ⬇️
erlang 36.53% <87.30%> (+0.13%) ⬆️
exporter 19.60% <ø> (ø)
sdk 79.46% <86.66%> (+0.53%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
apps/opentelemetry/src/otel_configuration.erl 78.57% <42.85%> (ø)
...elemetry/src/otel_sampler_trace_id_ratio_based.erl 71.42% <71.42%> (ø)
apps/opentelemetry/src/otel_sampler.erl 100.00% <100.00%> (+12.00%) ⬆️
apps/opentelemetry/src/otel_sampler_always_off.erl 100.00% <100.00%> (ø)
apps/opentelemetry/src/otel_sampler_always_on.erl 100.00% <100.00%> (ø)
...ps/opentelemetry/src/otel_sampler_parent_based.erl 100.00% <100.00%> (ø)
apps/opentelemetry/src/otel_span_utils.erl 96.55% <100.00%> (ø)
apps/opentelemetry/src/otel_tracer_server.erl 81.81% <100.00%> (ø)
apps/opentelemetry_api/src/otel_span.erl 74.07% <100.00%> (+2.07%) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 24804a7...cd4ae4d. Read the comment docs.

@dvic dvic marked this pull request as ready for review August 4, 2021 19:21
@dvic dvic requested a review from a team August 4, 2021 19:21
@dvic dvic mentioned this pull request Aug 6, 2021
@dvic
Copy link
Contributor Author

dvic commented Aug 7, 2021

I have implemented your comments, have a look at the added/changed docs and let me know what you think.

@@ -8,7 +8,7 @@
stdlib,
opentelemetry_api
]},
{env, [{sampler, {parent_based, #{root => {always_on, #{}}}}}, % default sampler
{env, [{sampler, {otel_sampler_parent_based, #{root => {otel_sampler_always_on, #{}}}}}, % default sampler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, now I see the use of the modules being named like always_on.

Hm... We definitely can't have the module names be stuff like always_on but maybe the configuration should special case the builtin samplers instead of requiring the full module name?

I'm not sure it is worth it if it risks confusion and it is "write once" basically so not a big deal... @bryannaegele do you have any thoughts on something like this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see it both ways. One reason to keep it atoms would be to keep the naming consistent across languages. It would probably be unnecessarily confusing to SREs to have to configure a whole module name in the env var vs using the same setting across all services.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be worthwhile to do a survey of the other SDKs and see if there is variance already. If it's all over the place, then we don't need to constrain ourselves.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agh, that shows me that now there is "jaeger_remote", never even heard o that... and some added propagators.

@tsloughter
Copy link
Member

Looks like there are some elixir test failures now @dvic

@dvic
Copy link
Contributor Author

dvic commented Aug 12, 2021

Looks like there are some elixir test failures now @dvic

Fixed :)!

should_sample(Ctx, _TraceId, _Links, _SpanName, _SpanKind, _Attributes, _Opts) ->
{?DROP, [], tracestate(Ctx)}.

tracestate(Ctx) ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, was about to merge this but giving it one last look over and noticed this function is duplicated in every sampler. I think it can just use otel_span:tracestate(Ctx) in the call above instead of having its own function for this in the sampler.

My guess as to why this was the case before is simply because the span_ctx record defaults to undefined for the tracestate field. This will have to be changed to default to [] and I can't think of any harm in that.

I guess a tiny put of extra memory will be used per-span_ctx...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but shouldn't it be

SpanCtx = otel_tracer:current_span_ctx(Ctx),
otel_span:tracestate(SpanCtx)

The next problem is that otel_tracer:current_span_ctx can return undefined and I assume Erlang does not have a || operator 🙂?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. And yea, should update the otel_span function to match on undefined (important to never crash so functions that might get called with an undefined by a user need to work.

Question is... should it return [] or undefined for tracestate when called without a span context... I want to say [], but could see an argument for undefined.

But I'd say just go with returning [] unless someone has a good argument against that :).

So in otel_span:

tracestate(#span_ctx{tracestate=Tracestate}) ->
    Tracestate;
tracestate(_) ->
    [].

_ instead of undefined just to be extra cautious about not crashing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed a commit. It currently now also checks tracestate(#span_ctx{tracestate=undefined}) -> because if I update span_ctx to have [] as default for the tracestate field, other tests start to fail. Is this also an acceptable solution?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, thats good, thanks.

I'll check on the tests that fail if that is changed after we get this merged.

@tsloughter tsloughter merged commit 8710301 into open-telemetry:main Aug 19, 2021
@dvic dvic deleted the dvic-refactor-sampler branch August 19, 2021 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants