
Distributed Tracing for Entities (Isolated) #1198


Open: wants to merge 34 commits into base branch stevosyan/distributed-tracing-for-entities


@sophiatev (Contributor) commented Mar 28, 2025

This PR adds support for distributed tracing for entities in the .NET isolated framework. This repo is where the trace Activities are actually created for signaling and calling entities and for entities starting orchestrations from an isolated app.

  • RequestMessage, OperationRequest, OperationResult, SendSignalOperationAction, and StartNewOrchestrationOperationAction were extended to carry extra information from the durabletask-dotnet repo, where the entity requests are generated and also executed. This information includes the request time, the execution end time, any error messages, and the parent trace contexts.
    • It's worth noting that the CreateTrace field added to RequestMessage is only used in the isolated case to indicate that we want to make an entity-specific trace for this request (it is set to true by the appropriate signal/call methods in durabletask-dotnet). In the in-process case, all of the traces are created in the WebJobs repo, so this field is not populated (and will be false by default).
  • TraceHelper, Schema, and TraceActivityConstants were updated to support instantiating the entity-specific trace Activities.
  • The ClientEntityHelpers and OrchestrationEntityContext methods that generate the EntityMessageEvents used by the durabletask-dotnet repo were updated to attach the additional information described above to the message events.
  • TaskEntityDispatcher, where all entity requests end up (orchestrations calling/signaling entities, clients signaling entities, entities signaling other entities, and entities starting orchestrations) and where the entities are actually invoked to fulfill those requests, was updated to instantiate the corresponding traces. The one exception is clients signaling entities via gRPC (i.e., when the DurableEntityClient is a GrpcDurableEntityClient), which is handled in the WebJobs repo, where the call ultimately reaches the LocalGrpcListener. The PR for the WebJobs repo is linked below.
    • TaskEntityDispatcher.StartTraceActivityForSignalingEntity creates the Activity when a client signals an entity (via ShimDurableEntityClient in the dotnet repo, since the gRPC client call is handled by WebJobs) or when an orchestration signals an entity. Where the parent trace context comes from depends on the case:
      • Client signaling an entity: ShimDurableEntityClient has access to the correct parent trace context via Activity.Current.Context, so it attaches this context to the request message itself. StartTraceActivityForSignalingEntity then parses this context and uses it as the parent of the Activity for signaling the entity.
      • Orchestration signaling an entity: neither the dotnet repo nor TaskEntityDispatcher has access to the orchestration trace context. Instead, the parent trace context is attached in TaskOrchestrationDispatcher.ProcessSendEvent, where Activity.Current.Context holds the orchestration context. This method only has access to the associated EventRaisedEvent, so it attaches the context there, and StartTraceActivityForSignalingEntity eventually parses and uses it.
      • Orchestration calling an entity: the Activity is only created at the very end, once the call has completed. At that point the code only has access to the RequestMessage, so StartTraceActivityForSignalingEntity copies the parent trace context from the EventRaisedEvent onto the RequestMessage so that it can later be used when creating the Activity for the call.
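The three parent-context paths above can be modeled roughly as follows. This is a hypothetical Python sketch, not the C# in DurableTask.Core: `RequestMessage` and `EventRaisedEvent` are simplified stand-ins that keep only the tracing-related fields, `resolve_parent_context` is an illustrative helper playing the role of StartTraceActivityForSignalingEntity, and it assumes the parent context is serialized as a W3C Trace Context `traceparent` string.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: the parent context is a W3C "traceparent" string of the form
# version-traceid-parentid-flags (2, 32, 16 and 2 lowercase hex digits).
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class TraceContext:
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(value: Optional[str]) -> Optional[TraceContext]:
    """Return the parsed context, or None if the value is missing or invalid."""
    if not value:
        return None
    match = _TRACEPARENT.match(value)
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version "ff" and all-zero ids are invalid under the W3C spec.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    # Bit 0 of the flags byte is the "sampled" flag.
    return TraceContext(trace_id, span_id, sampled=bool(int(flags, 16) & 0x01))


@dataclass
class RequestMessage:
    """Hypothetical stand-in keeping only the tracing-related fields."""
    operation: str
    is_signal: bool
    create_trace: bool = False  # set only by the isolated signal/call paths
    parent_trace_context: Optional[str] = None  # set by the client when it has one


@dataclass
class EventRaisedEvent:
    """Hypothetical stand-in; the orchestration dispatcher stamps its context here."""
    parent_trace_context: Optional[str] = None


def resolve_parent_context(request: RequestMessage,
                           event: EventRaisedEvent) -> Optional[TraceContext]:
    """Choose the parent for the entity Activity, following the three cases above."""
    if not request.create_trace:
        # In-process: traces are created in WebJobs, so nothing to do here.
        return None
    if request.parent_trace_context:
        # Context already on the request: a client signal, or a call whose
        # context was copied over earlier.
        return parse_traceparent(request.parent_trace_context)
    if not request.is_signal:
        # Orchestration call: copy the event's context onto the request so it
        # is still available when the call's Activity is created at the end.
        request.parent_trace_context = event.parent_trace_context
    # Orchestration signal or call: the context came from the EventRaisedEvent.
    return parse_traceparent(event.parent_trace_context)


# Example: an orchestration calling an entity.
orch_tp = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
call = RequestMessage("get", is_signal=False, create_trace=True)
parent = resolve_parent_context(call, EventRaisedEvent(orch_tp))
print(parent.span_id, call.parent_trace_context == orch_tp)
```

The call path mutates the request on purpose: it stands in for copying the context onto the RequestMessage so that it survives until the call completes and its Activity is finally created.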

The various other PRs related to this effort are

It is worth noting that the Activities for signaling an entity will have longer durations in the isolated case than in the in-process case. In the in-process case, the Activity for a signal to an entity is created upon the request and disposed almost immediately. In the isolated case, we cannot dispose the Activity immediately upon the request, since this would require creating the Activity in the dotnet repo where the request is generated. Instead, we create the Activity once the signal request reaches DurableTask.Core and is actually processed by the TaskEntityDispatcher, and pass the request time as the Activity's start time. The gap between its start time (the request time) and its end time will therefore be much larger than in the in-process case. This is not an issue for calls to entities, since those Activities are only ended once the operation completes (in the isolated case, once we send the result back to the orchestration instance; in the in-process case, once the entity invocation completes).

The same applies when a client creates an orchestration using the ShimDurableTaskClient: we only create the Activity for the orchestration once the request reaches DurableTask.Core and is processed by the TaskOrchestrationDispatcher. Therefore, the create-orchestration Activity will last much longer than in the other cases, where the Activity is started upon the request and ended almost immediately afterwards.
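To make the timing difference concrete, here is a toy Python model. The `Span` class, the `start_span` helper, and the 1.5 s queueing delay are all hypothetical and are not a .NET or OpenTelemetry API. It assumes, as described above, that the Activity is created when the dispatcher processes the message and that its start time is backdated to the request time carried in the message.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Span:
    """Toy span: a name plus start/end timestamps (not a real tracing API)."""
    name: str
    start: datetime
    end: Optional[datetime] = None

    def stop(self, at: datetime) -> None:
        self.end = at

    @property
    def duration(self) -> timedelta:
        if self.end is None:
            raise ValueError("span has not been stopped")
        return self.end - self.start


def start_span(name: str, now: datetime,
               request_time: Optional[datetime] = None) -> Span:
    """Create a span now, backdating its start to request_time when one is given."""
    return Span(name, start=request_time if request_time is not None else now)


t0 = datetime(2025, 3, 28, 12, 0, 0, tzinfo=timezone.utc)

# In-process style: span created at request time and stopped almost at once.
in_process = start_span("signal_entity", now=t0)
in_process.stop(t0 + timedelta(milliseconds=2))

# Isolated style: the dispatcher only sees the message 1.5 s later (hypothetical
# delay), creates the span then, but uses the carried request time as its start.
processed_at = t0 + timedelta(seconds=1.5)
isolated = start_span("signal_entity", now=processed_at, request_time=t0)
isolated.stop(processed_at + timedelta(milliseconds=2))

print(in_process.duration, isolated.duration)
```

Both spans do the same 2 ms of work, but the isolated span also covers the time the message spent waiting to be processed. In .NET, `ActivitySource.StartActivity` accepts a `startTime` argument, which is the kind of hook that allows this backdating.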

An example trace generated by this simple orchestration

[Image: the example orchestration code]

looks as follows:

[Image: the resulting trace]

Each signal request has type ActivityKind.Producer and each call request has type ActivityKind.Client (an entity starting an orchestration is also of type ActivityKind.Producer). When an entity actually processes the request, for a signal the span has type ActivityKind.Producer and for a call the span has type ActivityKind.Server. Note that the call to add_to_other_entity_step_1 starts a cascade of entities signaling other entities until eventually the last call is simply an add to the third entity.
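The span kinds listed above can be summarized as a lookup table. This Python sketch only restates the mapping as described in this PR; the `EntityOp` enum and `span_kind` helper are hypothetical, and `ActivityKind` merely mirrors the member names of .NET's `System.Diagnostics.ActivityKind`.

```python
from enum import Enum


class ActivityKind(Enum):
    # Mirrors the member names of .NET's System.Diagnostics.ActivityKind.
    INTERNAL = "Internal"
    SERVER = "Server"
    CLIENT = "Client"
    PRODUCER = "Producer"
    CONSUMER = "Consumer"


class EntityOp(Enum):
    # Hypothetical labels for the request types discussed above.
    SIGNAL = "signal"
    CALL = "call"
    START_ORCHESTRATION = "start_orchestration"


# Kind of the span emitted when the request is made.
_REQUEST_KIND = {
    EntityOp.SIGNAL: ActivityKind.PRODUCER,
    EntityOp.CALL: ActivityKind.CLIENT,
    EntityOp.START_ORCHESTRATION: ActivityKind.PRODUCER,
}

# Kind of the span emitted when the entity processes the request,
# as stated in the description above.
_PROCESSING_KIND = {
    EntityOp.SIGNAL: ActivityKind.PRODUCER,
    EntityOp.CALL: ActivityKind.SERVER,
}


def span_kind(op: EntityOp, processing: bool) -> ActivityKind:
    """Look up the span kind for a request-side or processing-side span."""
    table = _PROCESSING_KIND if processing else _REQUEST_KIND
    try:
        return table[op]
    except KeyError:
        raise ValueError(f"no entity span kind described for {op} (processing={processing})")


print(span_kind(EntityOp.CALL, processing=False), span_kind(EntityOp.CALL, processing=True))
```

The sketch raises for the processing side of an entity starting an orchestration, since the description does not assign that case an entity span kind. Pairing Client with Server for calls reflects their request/response nature, while signals are fire-and-forget messages.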

If, instead of starting the orchestration via an HTTP request, we signal an entity to start the orchestration, the trace looks like this:

[Image: the resulting trace]

@sophiatev requested a review from jviau on April 3, 2025 19:09
Sophia Tevosyan added 23 commits April 3, 2025 12:49
…to create in orchestration in the dotnet package
…yan/distributed-tracing-for-entities-isolated
…ntext of the create orchestration action if one was provided
Sophia Tevosyan added 3 commits May 12, 2025 20:45
…n orchestration signaling an entity is too short, then the message gets redelivered and a trace is created for each redelivery. we fixed this and only make the trace once