Adds concept of "local root" used to partition spans by entry point #801

Merged Dec 11, 2018 (3 commits)

codefromthecrypt
Member

Subgraphs are often "squashed" when processing dependency links.
Usually, we have to skip data server-side to achieve this, for example,
"skipping up" the tree until we find the root-most span in that process.

Moreover, several conversations led to a desire for "skeletal spans"
which contain no intermediate info between inbound and outbound
requests. This allows for 100% sampling in edge cases such as surges or
very large amounts of traffic.

Finally, some APM systems require reporting that groups together entry
points for reasons of squashing or post-processing a trace. For example,
Amazon X-Ray has a type Segment which is only for exit spans, reserving
SubSegment for local ones. Having some means to partition data allows
post-processing such as this, for example bundling.

"local root" is the solution to this and similar problems. By adding a
property, localRootId, to the trace context, we can track spans by
entry point. This means we can re-write parents to squash intermediates.
We can also expose this in logging contexts to accommodate correlation.
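
A minimal sketch of the "re-write parents to squash intermediates" idea, under stated assumptions. `SketchSpan`, its `exit` flag, and `SkeletalSquasher` are hypothetical names invented for illustration; they are not Brave or Zipkin types. The sketch keeps each local root and each outbound span, and re-parents outbound spans directly onto their local root so that intermediate local spans can be dropped.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified span used only for this sketch. */
final class SketchSpan {
  final long id;
  final Long parentId;    // null for the root of the trace
  final long localRootId; // ID of the first span of this trace in this process
  final boolean exit;     // assumption: true for an outbound (client or producer) span

  SketchSpan(long id, Long parentId, long localRootId, boolean exit) {
    this.id = id;
    this.parentId = parentId;
    this.localRootId = localRootId;
    this.exit = exit;
  }

  /** A span is the local root when it is the first span recorded in this process. */
  boolean isLocalRoot() {
    return id == localRootId;
  }
}

final class SkeletalSquasher {
  /** Returns only local roots and outbound spans, with outbound spans re-parented. */
  static List<SketchSpan> squash(List<SketchSpan> spans) {
    List<SketchSpan> skeleton = new ArrayList<>();
    for (SketchSpan span : spans) {
      if (span.isLocalRoot()) {
        skeleton.add(span); // entry point: keep unchanged
      } else if (span.exit) {
        // Re-parent the outbound span directly onto its local root.
        skeleton.add(new SketchSpan(span.id, span.localRootId, span.localRootId, true));
      }
      // Anything else is intermediate local work and is dropped.
    }
    return skeleton;
  }
}
```

Because every span already carries `localRootId`, the re-parenting needs no tree walk: the new parent is read straight off the span being rewritten.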

@codefromthecrypt
Member Author

codefromthecrypt commented Oct 1, 2018

cc'ing folks who may have related works @ivantopo @wu-sheng @felixbarny @tylerbenson @abhiksingh @narayaruna @drolando @cwensel

Also @williewheeler, who is likely to have some near-real-time graph aggregations in a bit. This sort of handling can send connectable spans to such a pipeline: https://github.com/ExpediaDotCom/haystack-adaptive-alerting

return localRootId;
}

public boolean isLocalRoot() {
Member Author

todo: doc me

@felixbarny

FWIW, in Elastic APM, we have a dedicated domain object for entry spans - they are called Transactions. In the Java agent, the TraceContext also has a field for the transactionId so that each span knows which transaction it belongs to.

@codefromthecrypt
Member Author

codefromthecrypt commented Oct 1, 2018 via email

@felixbarny

Not sure I understand your question.

Entry spans are transactions in our data model. Transaction extends AbstractSpan and Span extends AbstractSpan. I.e. Transactions are a special kind of span which represents an entry in a service.

@codefromthecrypt
Member Author

codefromthecrypt commented Oct 1, 2018 via email

@felixbarny

felixbarny commented Oct 1, 2018

> I mean in the link you pasted, the transactionId is generated at initialization

It's only initialized with zeros in the field declaration. The reason is that the TraceContext object is Recyclable. The actual ID is generated in the asRootSpan method and copied from the parent id in the asChildOf method.
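
A rough sketch of the lifecycle described above, under stated assumptions. `RecyclableContextSketch` is a hypothetical class, not Elastic APM's actual source: only the method names `asRootSpan` and `asChildOf` come from the comment. It assumes that a root span uses its own ID as the transaction ID, that a child copies the transaction ID from its parent's context, and that recycling resets fields to zero.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical recyclable trace context; fields are zero until initialized. */
final class RecyclableContextSketch {
  private long id;            // zero in the field declaration, as in a pooled object
  private long parentId;
  private long transactionId;

  /** Initializes this context as a root: the transaction ID is generated here. */
  void asRootSpan() {
    id = nextNonZeroId();
    parentId = 0L;
    transactionId = id; // assumption: a root span is its own transaction
  }

  /** Initializes this context as a child: the transaction ID is copied, not generated. */
  void asChildOf(RecyclableContextSketch parent) {
    id = nextNonZeroId();
    parentId = parent.id;
    transactionId = parent.transactionId; // assumption: inherited from the parent context
  }

  /** Clears state so the object can be returned to a pool and reused. */
  void resetState() {
    id = 0L;
    parentId = 0L;
    transactionId = 0L;
  }

  long id() {
    return id;
  }

  long transactionId() {
    return transactionId;
  }

  private static long nextNonZeroId() {
    long next;
    do {
      next = ThreadLocalRandom.current().nextLong();
    } while (next == 0L); // zero is reserved to mean "not initialized"
    return next;
  }
}
```

The point of the zero default is that object construction and ID assignment are separate steps, so a pooled instance can be re-initialized without allocating.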

@codefromthecrypt
Member Author

codefromthecrypt commented Oct 1, 2018 via email

@felixbarny

Just as a suggestion, an alternative naming for localRootId could be entrySpanId. This is just based on the fact that I read of entry spans before but never about local roots. There is no standard or de-facto-standard name for that concept, but if you also like entry span it would make sense to align. That would kind of make this term a de-facto standard. Choose whatever you feel fits best but be aware that this might have a big impact in the tracing community/terminology in general which already has quite a lot of jargon.

@codefromthecrypt
Member Author

codefromthecrypt commented Oct 1, 2018 via email

@codefromthecrypt
Member Author

codefromthecrypt commented Oct 1, 2018 via email

@codefromthecrypt
Member Author

codefromthecrypt commented Oct 1, 2018 via email

@basvanbeek
Member

Tricky one. I would only prefer entrySpanId if it would indeed exclude locally originated root spans. However, I'm not sure we want to exclude those from the logic.

I have no objection to localRoot, which in the case of locally originated root spans would mean localRoot == root span, because they happen to be both.
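
To make the distinction concrete, here is a small sketch under stated assumptions. `ContextSketch` and its factory methods are hypothetical and are not Brave's API. It shows three cases: a locally originated root (for example a scheduled task), where the span is both the trace root and the local root; an entry span continuing a remote parent, which is a local root but not the trace root; and a local child, which inherits its parent's `localRootId`.

```java
/** Hypothetical trace context illustrating how a local root ID could be assigned. */
final class ContextSketch {
  final long spanId;
  final Long parentId;    // null when this span starts the trace
  final long localRootId; // first span of this trace in the current process

  private ContextSketch(long spanId, Long parentId, long localRootId) {
    this.spanId = spanId;
    this.parentId = parentId;
    this.localRootId = localRootId;
  }

  /** Locally originated trace, e.g. a scheduled task: root and local root coincide. */
  static ContextSketch newRoot(long spanId) {
    return new ContextSketch(spanId, null, spanId);
  }

  /** Entry span whose parent lives in another process: a local root, not a trace root. */
  static ContextSketch continueRemote(long remoteParentId, long spanId) {
    return new ContextSketch(spanId, remoteParentId, spanId);
  }

  /** In-process child: inherits the local root of its parent. */
  static ContextSketch newLocalChild(ContextSketch parent, long spanId) {
    return new ContextSketch(spanId, parent.spanId, parent.localRootId);
  }

  boolean isRoot() {
    return parentId == null;
  }

  boolean isLocalRoot() {
    return spanId == localRootId;
  }
}
```

Under this reading, an "entry span" in the strict network sense is the `continueRemote` case only, while "local root" covers both it and `newRoot`.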

@felixbarny

What are some of the use cases you have encountered where the local root is not an entry span? Does it even make sense to track those? I guess it probably depends...

@codefromthecrypt
Member Author

codefromthecrypt commented Oct 1, 2018 via email

@codefromthecrypt
Member Author

codefromthecrypt commented Oct 1, 2018 via email

@felixbarny

Another one which comes to my mind is JDBC calls during container startup, although you'd probably not want to monitor these unless you create a span which wraps the whole startup.

> scheduled tasks start traces for example and so aren't entry spans.

Depending on how you define it, these could still be considered entry spans, even though there is no network interaction. You could argue that the scheduling of the task is the entry.

@codefromthecrypt
Member Author

codefromthecrypt commented Oct 2, 2018 via email

@felixbarny

Thanks for the history lesson :)
Seems like localRoot fits right in the Zipkin terminology. But it also seems to have slightly different semantics than the Transaction concept we have in Elastic APM. We also have the concept of what you called skeleton traces (I like that name), as we do report non-sampled transactions (metadata like tags are removed on those).

@drolando

drolando commented Oct 2, 2018

In my opinion both localRoot and entrySpanId are a bit ambiguous and could be confusing for users. Between the 2 I'd choose localRoot as it seems to fit better with brave terminology.

@codefromthecrypt
Member Author

codefromthecrypt commented Oct 2, 2018 via email

@codefromthecrypt
Member Author

ps added this thread in case others outside our ecosystem have anything to add. It will be input, not a democratic vote though :) as I don't think everyone in the world are brave developers (sniff sniff), so can't necessarily comment on how it fits here.

https://groups.google.com/forum/#!topic/distributed-tracing/HWOD3zdWD3s

@wu-sheng
Member

wu-sheng commented Oct 4, 2018

@adriancole I am on vacation now, so I will catch you up after the days off.

Adrian Cole added 2 commits December 11, 2018 14:52