Adds concept of "local root" used to partition spans by entry point #801

codefromthecrypt · 2018-10-01T04:02:02Z

Subgraphs are often "squashed" when processing dependency links.
Usually, we have to skip data server-side to achieve this, for example,
"skipping up" the tree until we find the root-most span in that process.

Moreover, several conversations led to a desire for "skeletal spans"
which contain no intermediate info between inbound and outbound
requests. This allows for 100% sampling in edge cases such as surges or
very large amounts of traffic.

Finally, some APM systems require reporting that groups together entry
points for reasons of squashing or post-processing a trace. For example,
Amazon X-Ray has a type Segment which is only for exit spans, reserving
SubSegment for local ones. Having some means to partition data allows
post-processing such as this, for example bundling.

"Local root" addresses this problem and similar ones. By adding a
property, localRootId, to the trace context, we can track spans by
entry point. This means we can rewrite parents to squash intermediates.
We can also expose this in logging contexts to accommodate correlation.
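
A minimal sketch of the idea, not Brave's actual implementation: `SketchContext`, `newLocalRoot`, `newChild` and `squashedParentId` are hypothetical names invented for illustration. Only `localRootId` and `isLocalRoot()` echo names from this change. It assumes the local root ID is assigned once at the entry point and inherited unchanged by every child, so an outbound span can be re-parented onto its local root.

```java
/** Hypothetical, simplified stand-in for a trace context; not Brave's TraceContext. */
final class SketchContext {
  final long traceId;
  final Long parentId; // null when the span is the root of the whole trace
  final long spanId;
  final long localRootId;

  SketchContext(long traceId, Long parentId, long spanId, long localRootId) {
    this.traceId = traceId;
    this.parentId = parentId;
    this.spanId = spanId;
    this.localRootId = localRootId;
  }

  /** First span visible to this tracer: it becomes its own local root. */
  static SketchContext newLocalRoot(long traceId, Long remoteParentId, long spanId) {
    return new SketchContext(traceId, remoteParentId, spanId, spanId);
  }

  /** Children inherit the local root ID unchanged (assumption of this sketch). */
  SketchContext newChild(long childSpanId) {
    return new SketchContext(traceId, spanId, childSpanId, localRootId);
  }

  boolean isLocalRoot() {
    return localRootId == spanId;
  }
}

final class SquashSketch {
  /** Returns the parent ID to report when intermediate local spans are squashed. */
  static Long squashedParentId(SketchContext context) {
    // A local root keeps its (possibly remote) parent; everything else hangs off the local root.
    return context.isLocalRoot() ? context.parentId : Long.valueOf(context.localRootId);
  }

  public static void main(String[] args) {
    SketchContext server = SketchContext.newLocalRoot(1L, 9L, 2L); // inbound request, remote parent 9
    SketchContext controller = server.newChild(3L);                // intermediate
    SketchContext repository = controller.newChild(4L);            // intermediate
    SketchContext client = repository.newChild(5L);                // outbound request

    System.out.println("original parent: " + client.parentId);          // 4
    System.out.println("squashed parent: " + squashedParentId(client)); // 2
  }
}
```

Because the ID travels with the context, the rewrite needs no lookup of other spans, which is what allows it to happen in-process rather than by "skipping up" the tree server-side.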

codefromthecrypt · 2018-10-01T04:07:02Z

cc'ing folks who may have related works @ivantopo @wu-sheng @felixbarny @tylerbenson @abhiksingh @narayaruna @drolando @cwensel

also @williewheeler who is likely to have some near real time graph aggregations in a bit. This sort of handling can send connectable spans to such a pipeline https://github.com/ExpediaDotCom/haystack-adaptive-alerting

codefromthecrypt · 2018-10-01T04:23:45Z

brave/src/main/java/brave/propagation/TraceContext.java

+    return localRootId;
+  }
+
+  public boolean isLocalRoot() {


todo: doc me

felixbarny · 2018-10-01T06:10:46Z

FWIW, in Elastic APM, we have a dedicated domain object for entry spans - they are called Transactions. In the Java agent, the TraceContext also has a field for the transactionId so that each span knows which transaction it belongs to.

codefromthecrypt · 2018-10-01T06:37:06Z

> the TraceContext also has a field for the transactionId <https://github.com/elastic/apm-agent-java/blob/d8b583f30c6b406ba8f81e63f0ce71af22dc2469/apm-agent-core/src/main/java/co/elastic/apm/impl/transaction/TraceContext.java#L57> so that each span knows which transaction it belongs to.

interesting.. in what case would this be different than if you re-used the span ID at the entry point of the transaction as the transaction id?

felixbarny · 2018-10-01T07:00:39Z

Not sure I understand your question.

Entry spans are transactions in our data model. Transaction extends AbstractSpan and Span extends AbstractSpan. I.e. Transactions are a special kind of span which represents an entry in a service.

codefromthecrypt · 2018-10-01T07:05:34Z

right sorry. I mean in the link you pasted, the transactionId is generated at initialization and there is also a separate id field. was just curious if there would have been impact if the same id value was shared.

felixbarny · 2018-10-01T07:09:21Z

> I mean in the link you pasted, the transactionId is generated at initialization

It's only initialized with zeros in the field declaration. The reason is that the TraceContext object is Recyclable. The actual ID is generated in the asRootSpan method and copied from the parent id in the asChildOf method.
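
A rough sketch of the lifecycle described above, under stated assumptions: `RecyclableContextSketch` and `resetState` are hypothetical, and the method bodies are one reading of the comment rather than the Elastic APM agent's actual code. In particular, reusing the span ID as the transaction ID at the root and copying the parent's transaction ID into the child are assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical recyclable context: fields start zeroed and are filled in when the object is used. */
final class RecyclableContextSketch {
  long id;            // zero means "unset"
  long parentId;
  long transactionId;

  /** Starts a new trace: generates an ID and (assumption) reuses it as the transaction ID. */
  void asRootSpan() {
    id = nextNonZeroId();
    parentId = 0L;
    transactionId = id;
  }

  /** Continues a trace: generates a span ID and (assumption) copies the parent's transaction ID. */
  void asChildOf(RecyclableContextSketch parent) {
    id = nextNonZeroId();
    parentId = parent.id;
    transactionId = parent.transactionId;
  }

  /** Zeroes the fields so the instance can be returned to a pool and reused. */
  void resetState() {
    id = 0L;
    parentId = 0L;
    transactionId = 0L;
  }

  private static long nextNonZeroId() {
    long next;
    do {
      next = ThreadLocalRandom.current().nextLong();
    } while (next == 0L);
    return next;
  }

  public static void main(String[] args) {
    RecyclableContextSketch root = new RecyclableContextSketch();
    root.asRootSpan();
    RecyclableContextSketch child = new RecyclableContextSketch();
    child.asChildOf(root);
    System.out.println(child.transactionId == root.transactionId); // true
  }
}
```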

codefromthecrypt · 2018-10-01T07:11:32Z

ah ok now I understand! thanks felix

felixbarny · 2018-10-01T10:03:41Z

Just as a suggestion, an alternative naming for localRootId could be entrySpanId. This is just based on the fact that I read of entry spans before but never about local roots. There is no standard or de-facto-standard name for that concept, but if you also like entry span it would make sense to align. That would kind of make this term a de-facto standard. Choose whatever you feel fits best but be aware that this might have a big impact in the tracing community/terminology in general which already has quite a lot of jargon.

codefromthecrypt · 2018-10-01T10:23:44Z

> Just as a suggestion, an alternative naming for localRootId could be entrySpanId. This is just based on the fact that I read of entry spans before but never about local roots. There is no standard or de-facto-standard name for that concept, but if you also like entry span it would make sense to align. That would kind of make this term a de-facto standard. Choose whatever you feel fits best but be aware that this might have a big impact in the tracing community/terminology in general which already has quite a lot of jargon.

I agree finding the name is important. For example, in brave we have localServiceName etc. to identify the tracer-local data. One thing is that this isn't public API per se. I mean technically it is, but I wouldn't expect users to have to care about this, which is more data routing and management local to the tracer and not exported outside... I do like the idea of hinting something about it being local, but we can also solve that with docs.

Entry span is often used to describe when something remote enters a process. What's currently named localRootId is the place in the trace tree that is local to this tracer instance. It doesn't matter if it is a root (locally originated trace) or where a remote entry occurs. In other words, it indicates the root partition of a trace local to this tracer.

I admit I thought about entry span (hence docs mentioning it), but the "every square is a rectangle, but not every rectangle is a square" problem made me hesitate. FWIW partitionId and eldestSpanId are others I thought about and dismissed (kind of thought a little about Amazon's segment name too). I felt closest to partitionId but then "localRootId" made me feel a little better, but I'm actually not that big of friends with it either.

Any other thoughts? (cc also @jcchavezs @basvanbeek who helped with a similar naming debacle recently)

codefromthecrypt · 2018-10-01T10:36:53Z

one other way could be to remove the local root span condition from this property (ex by redoing the code to see if heuristically we can accomplish the same by looking for no parent id). In that case entrySpanId would fit perfectly.

codefromthecrypt · 2018-10-01T10:46:02Z

ps it won't work (constraining towards the term entry span by eliminating root). we need the partition id to always be present and inherited. root is a frequent case, and you have no other way of knowing the root span ID, as order isn't guaranteed and trace id is not guaranteed to be a function of root span id... so we are back to either conflating terminology even though it doesn't match, or picking something different.
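
To illustrate why the ID needs to be present on every span, here is a small sketch with hypothetical types (`FinishedSpanSketch` and `PartitionSketch` are invented here, not Brave classes). Spans arrive in finish order, so children show up before their parents. Since each span carries its own local root ID, they can still be grouped without waiting for the root or walking parent links.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical finished span carrying the inherited local root ID. */
final class FinishedSpanSketch {
  final String name;
  final long spanId;
  final long localRootId;

  FinishedSpanSketch(String name, long spanId, long localRootId) {
    this.name = name;
    this.spanId = spanId;
    this.localRootId = localRootId;
  }
}

final class PartitionSketch {
  /** Groups span names by local root ID, in arrival order, without looking at parent IDs. */
  static Map<Long, List<String>> byLocalRoot(List<FinishedSpanSketch> spans) {
    Map<Long, List<String>> partitions = new LinkedHashMap<>();
    for (FinishedSpanSketch span : spans) {
      partitions.computeIfAbsent(span.localRootId, key -> new ArrayList<>()).add(span.name);
    }
    return partitions;
  }

  public static void main(String[] args) {
    // Children finish before their local root, and two partitions interleave.
    List<FinishedSpanSketch> finished = Arrays.asList(
        new FinishedSpanSketch("jdbc-query", 4L, 2L),
        new FinishedSpanSketch("nightly-report", 7L, 7L), // scheduled task: locally originated root
        new FinishedSpanSketch("render", 3L, 2L),
        new FinishedSpanSketch("get /users", 2L, 2L));    // server span: remote entry

    System.out.println(byLocalRoot(finished));
    // prints {2=[jdbc-query, render, get /users], 7=[nightly-report]}
  }
}
```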

basvanbeek · 2018-10-01T10:46:36Z

Tricky one. I would only prefer entrySpanId if it indeed would exclude locally originated root spans. However I'm not sure we want to exclude those from the logic.

I have no objection to localRoot which in case of locally originated root spans would mean localRoot == root span. Because they happen to be both.

felixbarny · 2018-10-01T11:03:11Z

What are some of the use cases you have encountered where the local root is not an entry span? Does it even make sense to track those? I guess it probably depends...

codefromthecrypt · 2018-10-01T14:15:55Z

> What are some of the use cases you have encountered where the local root is not an entry span? Does it even make sense to track those? I guess it probably depends...

scheduled tasks start traces for example and so aren't entry spans.

codefromthecrypt · 2018-10-01T14:18:42Z

another is lambda invocations. you can't always tell what triggered the function, even though you can read an env variable if there is an Amazon trace in progress.

felixbarny · 2018-10-01T14:58:51Z

Another one which comes to my mind is JDBC calls during container startup, although you'd probably not want to monitor these unless you create a span which wraps the whole startup.

> scheduled tasks start traces for example and so aren't entry spans.

Depending on how you define it, these could still be considered entry spans, even though there is no network interaction. You could argue that the scheduling of the task is the entry.

codefromthecrypt · 2018-10-02T01:26:58Z

Good points, though yeah, it does depend on how we want to define entry and if it hurts more than helps. Conflation can actually cause confusion if too loose. Other examples are triggers, watches, binary executions, things like this. For example, at Twitter a couple of years back we instrumented the git client. Also, zookeeper watches can set off a traced operation. Build pipelines and startup can have states that result in large ops (like your JDBC example).

About the local keyword, it may be helpful to provide some history. I think I may be able to convince you that, if anything, "local" is not new jargon but rather has several years of history. Entry span, even if it fits, is new jargon, at least in zipkin. Consider that the only users of this field are literally authors of brave integrations, who have had a concept of local and clear naming conventions for it for years.

Local span is the term for intermediate spans in zipkin. For example, we originally had in brave a type named local tracer, and javascript has Trace.local to trace something that isn't remote. We had to introduce the "local component" tag back in 2015 to retrofit spans with knowledge that they aren't remote. Other libraries like zipkin4net have annotations named localoperationstart as well. In v2 we had localEndpoint, again qualifying in the model but without needing a special tag. More recently we added sampledLocal to indicate a tracer-local recording decision that ignores remote (entry span) info.

localRoot or similar seems to fall in line with history more than entry does. Entry doesn't really fit, as I think we would agree it is more than a stretch to use the term entry to describe any origin, even same process. Question is though, do you agree?

felixbarny · 2018-10-02T17:08:29Z

Thanks for the history lesson :)
Seems like localRoot fits right in the Zipkin terminology. But it also seems to have slightly different semantics than the Transaction concept we have in Elastic APM. We also have the concept of what you called skeleton traces (I like that name), as we do report non-sampled transactions (metadata like tags are removed on those).

drolando · 2018-10-02T22:09:47Z

In my opinion both localRoot and entrySpanId are a bit ambiguous and could be confusing for users. Between the 2 I'd choose localRoot as it seems to fit better with brave terminology.

codefromthecrypt · 2018-10-02T22:46:02Z

PS I have been trying to wrap my brain around entrySpanId, to figure out a way to have it not be confusing. We'd have to define it like this:

The entry span is the first span in a branch of a trace that is visible to an instance of a tracer. IOW, it is an entrypoint into a trace, possibly its initial entry.

It could be a root span that originated locally, like:

* an android client application creating a root span in response to a gesture
* a watcher that tripped due to a file change, invoking a workflow
* a CLI application like git pull
* a scheduled task, either process scoped or internal to one
* a lambda function invoked from an unknown source

More typically, it could be a branch in an existing span, like:

* a server side RPC invocation (even if it reuses the same ID)
* a message consumption event (regardless of bulk or not, or if reprocessing)

If we are ok with defining entry span to describe all of the above, I'm ok with calling it that. I still don't like the word entry, but anyway, let's all sleep on it?

Notes about how we could possibly be tentative..

One possible way out is to hide the field completely on the trace context and make it a special function to access it for its only call site (a finished span handler). Then it won't be a public method, therefore not even visible to users, so can't confuse them. As I mentioned before, users don't have to know about this at all unless it is added to a logging context. Ex InternalPropagation.localRootId(context) would access the field. Only issue is that it is a bit weird and I don't want to encourage people to use internal methods, even non-users aka power users.

Another way is to make this still named localRootId, but make it a function of the FinishedSpanHandler (ex FinishedSpanHandler.localRootId(context) ). I'm not sure though if it would be needed later in internal propagation logic... and I'd hate to move it twice. We've never done that with any field before.

So, maybe what's best is to let folks sleep on it a few days. I'm taking the weekend off anyway whah hah hah.
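
The first "way out" above could look roughly like the following. This is only a sketch of the hide-the-field idea: `HiddenFieldContext` and `InternalPropagationSketch` are hypothetical names, and a nested class stands in for whatever internal package mechanism a real library would use.

```java
/** Hypothetical context that keeps localRootId off its public surface. */
final class HiddenFieldContext {
  private final long spanId;
  private final long localRootId; // deliberately has no public getter

  HiddenFieldContext(long spanId, long localRootId) {
    this.spanId = spanId;
    this.localRootId = localRootId;
  }

  public long spanId() {
    return spanId;
  }

  /** Internal accessor meant only for tracer plumbing, such as a finished span handler. */
  static final class InternalPropagationSketch {
    private InternalPropagationSketch() {}

    static long localRootId(HiddenFieldContext context) {
      // A nested class may read the enclosing class's private field.
      return context.localRootId;
    }
  }

  public static void main(String[] args) {
    HiddenFieldContext context = new HiddenFieldContext(5L, 2L);
    System.out.println(InternalPropagationSketch.localRootId(context)); // 2
  }
}
```

The trade-off is the one already noted: the field stays out of the user-facing API, at the cost of an accessor that integrators may still be tempted to call.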

codefromthecrypt · 2018-10-02T22:55:08Z

ps added this thread in case others outside our ecosystem have anything to add. It will be input, not a democratic vote though :) as I don't think everyone in the world are brave developers (sniff sniff), so can't necessarily comment on how it fits here.

https://groups.google.com/forum/#!topic/distributed-tracing/HWOD3zdWD3s

wu-sheng · 2018-10-04T13:53:19Z

@adriancole I am on vacation now, so I will catch you up after the days off.


zeagord approved these changes Oct 1, 2018

codefromthecrypt mentioned this pull request Oct 5, 2018

Exclude certain spans from creating a new trace census-instrumentation/opencensus-specs#160

Open

codefromthecrypt mentioned this pull request Dec 9, 2018

Migrate to stackdriver trace API v2. openzipkin/zipkin-gcp#112

Merged

llinder approved these changes Dec 11, 2018

Adrian Cole added 2 commits December 11, 2018 14:52

drift

39d0a19

Shows how to add tags only once

77a671b

codefromthecrypt force-pushed the localRoot branch from 4673cae to 77a671b Compare December 11, 2018 07:01

This was referenced Dec 11, 2018

can i find some way to add annotations into span? openzipkin/zipkin-reporter-java#33

Closed

Resurrect "default tags" feature #357

Open

codefromthecrypt merged commit 667f6b4 into master Dec 11, 2018

codefromthecrypt deleted the localRoot branch December 11, 2018 07:33

axw mentioned this pull request Jan 21, 2019

Proposal: record first local span's ID in SpanContext census-instrumentation/opencensus-specs#229

Open
