Skip to content

Emit pipeline and message operation events for diagnostics purposes #5465

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

danielmarbach
Copy link
Contributor

@danielmarbach danielmarbach commented Oct 17, 2019

While attending Dotnetos in Warsaw and listening to Adam Sitnik it became clear to me that we have been totally neglecting the value of event sources. Having event sources in strategic places allows us as well as the customers to emit high performance tracing information that has only very minimal impact on the runtime. The information is available across all the platforms and allows to troubleshoot or compare the performance where required.

For us event sources furthermore allow us to have to equal runs and compare them in a comparison session (even automated) to verify whether the performance of the pipeline and the operations has regressed or not.

EventSources are an old concept but microsoft embraces them even in .NET Core and with the combination of EventPipes it is possible to consume them everywhere.

Naming of event sources

I compared names of event sources across corefx as well as ASPNETCore and there are no consistent naming guidelines. After synchronizing with David Fowler it became clear that newer event sources follow the dotted notation and no longer the dashed notation. Like

  • NServiceBus.Pipeline instead of NServiceBus-Pipeline

Unsafe

The event sources have a bunch of convenience WriteEvent methods that allow writing certain types of primitive types without allocating an object[] args array. For performance reasons where these methods can't be used the stackalloc approach with event data has to be used. In order to do so unsafe code has to be enabled. There are no consequences for the consumers of NSB. It's fine, let it go ;)

IsEnabled

EventSources are enabled when required. In order to not suffer from performance penalties, it is required to check whether the event source is enabled or not. Generally, when you look at the production event sources from Microsoft you can see multiple patterns.

  • Check IsEnabled on the caller before calling the event method
  • Check IsEnabled within the implementation of the event method
  • Check IsEnabled within a method that is marked with [NonEvent] attribute that delegates to the event method
  • Check IsEnabled within a method that is marked with [NonEvent] attribute that delegates to the event method in combination with Debug.Assert(IsEnabled).

Because we have cases where an existing async statemachine is available like in the main pipeline executor and we have cases where the statemachine was deliberately optimized away I went with

  • Check IsEnabled on the caller before calling the event method for cases where the state machine was deliberately omited because we need to check for fast path vs slow path execution in the caller anyway
  • Check IsEnabled within the implementation of the event method where an existing state machine is already available

The fast path uses the following pattern

return !Log.IsEnabled() ? fastPathInvoke() : SlowPathInvokeWithEventsEmit();

Start/Stop

The event sources here leverage the automatic start and stop tracking that became available when you have .NET 4.6 or higher installed (also supported in NETCore). Methods that end with Start and a corresponding Stop will automatically created implicit scopes (that even support async) and do the execution / duration time measurement for us. Furthermore this trickery allows us to track conversations. For example (notice the ActivityId colum and the TextFilter)

A message session conversation

SessionSendActivity

and incoming pipeline conversation

ReceiveActivity

with that every conversation gets a unique activity id that can be nested. See

https://blogs.msdn.microsoft.com/vancem/2015/09/14/exploring-eventsource-activity-correlation-and-causation-features/

Because pipelines are nested in the platform where it is available the recursive behavior has to be enabled to prevent the event source magic to do the auto-stop. See Error Handling and Recursion in

https://msdnshared.blob.core.windows.net/media/2016/07/EventSourceActivities.docx

To follow the guidance it also has Non-recursive entry points like MainStart and MainStop

Parameter names

The parameter names do not follow our coding guidelines. The reason is simple. The parameter names become columns / properties in the event data that can be filtered upon. For example

image

Events emitted

Events

General-purpose pipeline events are verbose while the other ones are informational

Enable in perfview

Alt-C

Settings

Start Collect

@danielmarbach
Copy link
Contributor Author

Btw. I'm not implying this should go into 7.2. If it is valuable and you think it is ok to pull in fine. If not this can go in at any other time

@SeanFeldman
Copy link
Contributor

It won't be part of 7.2, but the next maintenance release could include this improvement.
It would be useful to consider coming up with some recommendations for the downstream, such as transports and persistence, to yield information that could help customers.

@danielmarbach
Copy link
Contributor Author

@andreasohlund @timbussmann rebased on latest master changes.

@HEskandari
Copy link
Contributor

There is this request from 2016 as well.

@andreasohlund andreasohlund removed their request for review November 10, 2020 05:43
@stale
Copy link

stale bot commented Feb 7, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 7, 2021
@danielmarbach danielmarbach deleted the event-source branch February 7, 2021 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants