Emit pipeline and message operation events for diagnostics purposes #5465
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
While attending Dotnetos in Warsaw and listening to Adam Sitnik it became clear to me that we have been totally neglecting the value of event sources. Having event sources in strategic places allows us as well as the customers to emit high performance tracing information that has only very minimal impact on the runtime. The information is available across all the platforms and allows to troubleshoot or compare the performance where required.
For us event sources furthermore allow us to have to equal runs and compare them in a comparison session (even automated) to verify whether the performance of the pipeline and the operations has regressed or not.
EventSources are an old concept but microsoft embraces them even in .NET Core and with the combination of EventPipes it is possible to consume them everywhere.
Naming of event sources
I compared names of event sources across corefx as well as ASPNETCore and there are no consistent naming guidelines. After synchronizing with David Fowler it became clear that newer event sources follow the dotted notation and no longer the dashed notation. Like
NServiceBus.Pipeline
instead ofNServiceBus-Pipeline
Unsafe
The event sources have a bunch of convenience
WriteEvent
methods that allow writing certain types of primitive types without allocating anobject[] args
array. For performance reasons where these methods can't be used thestackalloc
approach with event data has to be used. In order to do so unsafe code has to be enabled. There are no consequences for the consumers of NSB. It's fine, let it go ;)IsEnabled
EventSources are enabled when required. In order to not suffer from performance penalties, it is required to check whether the event source is enabled or not. Generally, when you look at the production event sources from Microsoft you can see multiple patterns.
IsEnabled
on the caller before calling the event methodIsEnabled
within the implementation of the event methodIsEnabled
within a method that is marked with[NonEvent]
attribute that delegates to the event methodIsEnabled
within a method that is marked with[NonEvent]
attribute that delegates to the event method in combination withDebug.Assert(IsEnabled)
.Because we have cases where an existing async statemachine is available like in the main pipeline executor and we have cases where the statemachine was deliberately optimized away I went with
IsEnabled
on the caller before calling the event method for cases where the state machine was deliberately omited because we need to check for fast path vs slow path execution in the caller anywayIsEnabled
within the implementation of the event method where an existing state machine is already availableThe fast path uses the following pattern
Start/Stop
The event sources here leverage the automatic start and stop tracking that became available when you have .NET 4.6 or higher installed (also supported in NETCore). Methods that end with
Start
and a correspondingStop
will automatically created implicit scopes (that even support async) and do the execution / duration time measurement for us. Furthermore this trickery allows us to track conversations. For example (notice theActivityId
colum and the TextFilter)A message session conversation
and incoming pipeline conversation
with that every conversation gets a unique activity id that can be nested. See
https://blogs.msdn.microsoft.com/vancem/2015/09/14/exploring-eventsource-activity-correlation-and-causation-features/
Because pipelines are nested in the platform where it is available the recursive behavior has to be enabled to prevent the event source magic to do the auto-stop. See Error Handling and Recursion in
https://msdnshared.blob.core.windows.net/media/2016/07/EventSourceActivities.docx
To follow the guidance it also has Non-recursive entry points like
MainStart
andMainStop
Parameter names
The parameter names do not follow our coding guidelines. The reason is simple. The parameter names become columns / properties in the event data that can be filtered upon. For example
Events emitted
General-purpose pipeline events are verbose while the other ones are informational
Enable in perfview
Alt-C
Start Collect