Adds concept of "local root" used to partition spans by entry point #801
cc'ing folks who may have related works @ivantopo @wu-sheng @felixbarny @tylerbenson @abhiksingh @narayaruna @drolando @cwensel also @williewheeler who is likely to have some near real time graph aggregations in a bit. This sort of handling can send connectable spans to such a pipeline https://github.com/ExpediaDotCom/haystack-adaptive-alerting |
    return localRootId;
  }

  public boolean isLocalRoot() {
todo: doc me
FWIW, in Elastic APM, we have a dedicated domain object for entry spans - they are called |
the TraceContext also has a field for the transactionId
<https://github.com/elastic/apm-agent-java/blob/d8b583f30c6b406ba8f81e63f0ce71af22dc2469/apm-agent-core/src/main/java/co/elastic/apm/impl/transaction/TraceContext.java#L57>
so that each span knows which transaction it belongs to.
interesting.. in what case would this be different than if you re-used the
span ID at the entry point of the transaction as the transaction id?
Not sure I understand your question. Entry spans are transactions in our data model. Transaction extends AbstractSpan and Span extends AbstractSpan. I.e. Transactions are a special kind of span which represents an entry in a service. |
right sorry. I mean in the link you pasted, the transactionId is generated
at initialization and there is also a separate id field. was just curious
if there would have been impact if the same id value was shared.
|
It's only initialized with zeros in the field declaration. The reason is that the TraceContext object is Recyclable. The actual ID is generated in asRootSpan method and copied from the parent id in the asChildOf or method. |
ah ok now I understand! thanks felix
|
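For readers following along, here is a minimal sketch of the pattern described above: a recyclable context whose IDs are zero until it is initialized as a root or a child. The class `PooledTraceContext` and its method bodies are hypothetical and are not the Elastic APM agent's code. In particular, reusing the entry span's ID as the transaction ID is an assumption made for illustration.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical pooled context; names are illustrative, not the Elastic APM agent's actual code. */
final class PooledTraceContext {
  // Zero means "unset". Pooled instances start in, and are reset to, this state.
  private long id;
  private long transactionId;

  /** Entry span: generate a fresh ID and (assumption) reuse it as the transaction ID. */
  void asRootSpan() {
    id = nextNonZeroId();
    transactionId = id;
  }

  /** Child span: fresh span ID, transaction ID copied from the parent. */
  void asChildOf(PooledTraceContext parent) {
    id = nextNonZeroId();
    transactionId = parent.transactionId;
  }

  /** Called before the instance goes back to the pool. */
  void resetState() {
    id = 0L;
    transactionId = 0L;
  }

  long id() {
    return id;
  }

  long transactionId() {
    return transactionId;
  }

  private static long nextNonZeroId() {
    long next;
    do {
      next = ThreadLocalRandom.current().nextLong();
    } while (next == 0L);
    return next;
  }
}
```

Under these assumptions, every span created via `asChildOf` reports the same `transactionId()` as the entry span it descends from, which is what lets each span know which transaction it belongs to.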
Just as a suggestion, an alternative naming for localRootId could be
entrySpanId. This is just based on the fact that I read of entry spans
before but never about local roots. There is no standard or
de-facto-standard name for that concept, but if you also like entry span it
would make sense to align. That would kind of make this term a de-facto
standard. Choose whatever you feel fits best but be aware that this might
have a big impact in the tracing community/terminology in general which
already has quite a lot of jargon.
I agree finding the name is important. For example, in brave we have
localServiceName etc to identify the tracer-local data. One thing is that
this isn't public api per-se I mean technically it is, but I wouldn't
expect users to have to care about this which is more data routing and
management local to the tracer and not exported outside... I do like the
idea of hinting something about it being local but we can also solve that
with docs.
Entry span is often used to describe when something remote enters a
process. What's currently named localRootId is the place in the trace tree
that is local to this tracer instance. It doesn't matter if it is a root
(locally originated trace) or where a remote entry occurs. In other words,
it indicates the root partition of a trace local to this tracer.
I admit I thought about entry span (hence docs mentioning it), but the
square-is-a-rectangle but not all rectangles-are-a-square problem made me
hesitate. FWIW partitionId and eldestSpanId are others I thought about and
dismissed (kind of thought a little about amazon's segment name too). I felt
closest to partitionId but then "localRootId" made me feel a little better,
but I'm actually not that big of friends with it either.
Any other thoughts? (cc also @jcchavezs @basvanbeek who helped with a
similar naming debacle recently)
one other way could be to remove the local root span condition from this
property (ex by redoing the code to see if heuristically we can accomplish
the same by looking for no parent id).
In that case entrySpanId would fit perfectly.
|
ps it won't work (constraining towards the term entry span by eliminating
root). we need the partition id to always be present and inherited. root is
a frequent case and you have no other way of knowing the root span ID as
order isn't guaranteed and trace id is not guaranteed to be a function of
root span id...
so we are back to either conflating terminology even though it doesn't match,
or picking something different.
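To make "always present and inherited" concrete, here is a minimal sketch. `SketchContext` is a hypothetical, simplified class and is not brave's actual `TraceContext`. The factory names and the `localRootId == spanId` check inside `isLocalRoot()` are assumptions for illustration.

```java
/** Hypothetical, simplified context; not brave's actual TraceContext. */
final class SketchContext {
  final long traceId;
  final Long parentId; // null when this span is the root of the whole trace
  final long spanId;
  final long localRootId; // always set, so no heuristic on parentId is needed

  private SketchContext(long traceId, Long parentId, long spanId, long localRootId) {
    this.traceId = traceId;
    this.parentId = parentId;
    this.spanId = spanId;
    this.localRootId = localRootId;
  }

  /** Locally originated trace: the span is both the trace root and the local root. */
  static SketchContext newTrace(long traceId, long spanId) {
    return new SketchContext(traceId, null, spanId, spanId);
  }

  /** Remote entry: the first span seen by this tracer becomes the local root. */
  static SketchContext joinRemote(long traceId, long remoteParentId, long spanId) {
    return new SketchContext(traceId, remoteParentId, spanId, spanId);
  }

  /** Children get their own span ID but inherit the local root unchanged. */
  SketchContext newChild(long childSpanId) {
    return new SketchContext(traceId, spanId, childSpanId, localRootId);
  }

  /** Assumption: a span is the local root when its own ID is the local root ID. */
  boolean isLocalRoot() {
    return localRootId == spanId;
  }
}
```

In this sketch a scheduled task would go through `newTrace` and a server request through `joinRemote`. Both produce a local root, which is why the field cannot be restricted to remote entries.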
|
Tricky one. I would only prefer I have no objection to localRoot which in case of locally originated root spans would mean localRoot == root span. Because they happen to be both. |
What are some of the use cases you have encountered where the local root is not an entry span? Does it even make sense to track those? I guess it probably depends... |
scheduled tasks start traces for example and so aren't entry spans.
another is lambda invocations. you can't always tell what triggered the
function even though you can read an env variable if there is an Amazon
trace in progress.
|
Another one which comes to my mind is JDBC calls during container startup, although you'd probably not want to monitor these unless you create a span which wraps the whole startup.
Depending on how you define it, these could still be considered entry spans, even though there is no network interaction. You could argue that the scheduling of the task is the entry. |
Good points though yeah it does depend on how we want to define entry and
if it hurts more than helps. conflation can actually cause confusion if too
loose.
other examples are triggers, watches, binary executions, things like this.
ex at twitter a couple years back we instrumented git client. also
zookeeper watches can set off a traced operation. build pipelines and
startup can have states that result in large ops (like your jdbc example)
about the local keyword, maybe it's helpful to provide some history. I think I
may be able to convince you that if anything "local" is not new jargon, but
rather has several years of history. entry span, even if it fits, is new jargon at
least in zipkin. consider that the only users of this field are literally
authors of brave integrations, who have had a concept of local and clear
naming conventions for it for years.
local span is the term for intermediate spans in zipkin. for example we
originally had in brave a type named local tracer and javascript has
Trace.local to trace something that isn't remote. we had to introduce the "local
component" tag back in 2015 to retrofit spans with knowledge that they
aren't remote. other libraries like zipkin4net have annotations named
localoperationstart as well.
in v2 we had localEndpoint again qualifying in the model but without
needing a special tag. more recently we added sampledLocal to indicate a
tracer local recording decision that ignores remote (entry span) info.
localRoot or similar seems to fall in line with history more than entry
does. Entry doesn't fit really as I think we would agree it is more than a
stretch to use the term entry to describe any origin even same process.
question is though do you agree?
|
Thanks for the history lesson :) |
In my opinion both |
PS I have been trying to wrap brain around entrySpanId, to figure a
way to have it not be confusing. We'd have to define it like this:
The entry span is the first span in a branch of a trace that is
visible to an instance of a tracer. IOTW, it is an entrypoint into a
trace, possibly its initial entry.
It could be a root span, that originated locally like:
* an android client application creating a root span in response to a gesture
* a watcher that tripped due to a file change, invoking a workflow
* a CLI application like git pull
* a scheduled task, either process scoped or internal to one
* a lambda function invoked from an unknown source
More typically, it could be a branch in an existing span, like:
* a server side RPC invocation (even if it reuses the same ID)
* a message consumption event (regardless of bulk or not, or if reprocessing)
If we are ok with defining entry span to describe all of the above,
I'm ok with calling it that. I still don't like the word entry, but
anyway, let's all sleep on it?
Notes about how we could possibly be tentative..
One possible way out is to hide the field completely on the trace
context and make it a special function to access it for its only call
site (a finished span handler).
Then it won't be a public method, therefore not even visible to users,
so can't confuse them. As I mentioned before, users don't have to know
about this at all unless it is added to a logging context. Ex
InternalPropagation.localRootId(context) would access the field.
Only issue is that it is a bit weird and I don't want to encourage
people to use internal methods, even non-users aka power users.
Another way is to make this still named localRootId, but make it a
function of the FinishedSpanHandler (ex
FinishedSpanHandler.localRootId(context) ). I'm not sure though if it
would be needed later in internal propagation logic... and I'd hate to
move it twice. We've never done that with any field before.
So, maybe what's best is to let folks sleep on it a few days. I'm
taking the weekend off anyway whah hah hah.
|
ps added this thread in case others outside our ecosystem have anything to add. It will be input, not a democratic vote though :) as I don't think everyone in the world are brave developers (sniff sniff), so can't necessarily comment on how it fits here. https://groups.google.com/forum/#!topic/distributed-tracing/HWOD3zdWD3s |
@adriancole I am on vacation now, so I will catch you up after the days off. |
Subgraphs are often "squashed" when processing dependency links. Usually, we have to skip data server-side to achieve this, for example, "skipping up" the tree until we find the root-most span in that process.
Moreover, several conversations led to a desire for "skeletal spans" which contain no intermediate info between inbound and outbound requests. This allows for 100% sampling in edge cases such as surges or very large amounts of traffic.
Finally, some APM systems require reporting that groups together entry points for reasons of squashing or post-processing a trace. For example, Amazon X-Ray has a type Segment which is only for exit spans, reserving SubSegment for local ones. Having some means to partition data allows post-processing such as this, for example bundling.
"local root" is the solution to this problem and similar ones. By adding a property, `localRootId`, to the trace context, we can track spans by entry point. This means we can re-write parents to squash intermediates. We can also expose this in logging contexts to accommodate correlation.
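As a rough illustration of "re-write parents to squash intermediates", the sketch below keeps only local roots and exit spans, and re-parents each kept exit span directly to its local root. `SquashSketch`, `SpanRecord` and the `exit` flag are hypothetical names for this example and are not part of brave. A real handler would work on the tracer's own span type.

```java
import java.util.ArrayList;
import java.util.List;

final class SquashSketch {
  /** Hypothetical minimal span record, just enough to show re-parenting. */
  static final class SpanRecord {
    final long id;
    final Long parentId; // null for the root of the whole trace
    final long localRootId;
    final boolean exit; // true when the span represents an outbound request

    SpanRecord(long id, Long parentId, long localRootId, boolean exit) {
      this.id = id;
      this.parentId = parentId;
      this.localRootId = localRootId;
      this.exit = exit;
    }
  }

  /** Returns a "skeletal" view: local roots plus exit spans parented to their local root. */
  static List<SpanRecord> squash(List<SpanRecord> finished) {
    List<SpanRecord> result = new ArrayList<>();
    for (SpanRecord span : finished) {
      if (span.id == span.localRootId) {
        result.add(span); // the local root is kept untouched
      } else if (span.exit) {
        // Skip over any intermediates by pointing straight at the local root.
        result.add(new SpanRecord(span.id, span.localRootId, span.localRootId, true));
      }
      // Intermediate (local, non-exit) spans are dropped.
    }
    return result;
  }
}
```

Because every span carries `localRootId`, this works span by span without "skipping up" the tree or waiting for the parent to arrive first.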