Support MySqlConnector #85

bgrainger · 2020-08-03T23:57:16Z

MySqlConnector is an advanced ADO.NET library for MySQL that supports asynchronous operations. (The "standard" MySQL Connector/NET package does not.)

Feature Description

Supporting this library out-of-the-box is a popular feature request for New Relic customers.

There is custom instrumentation XML for MySqlConnector, but as discussed in in NR Support case 373822 (NOTE: private link) and demonstrated in this repo, when it's used, database timings are doubled:

This trace shows a definite problem with the database timings. Specifically, almost all database calls are logged as extending to the end of the transaction, e.g., 164.66ms then 162.23ms, then 160.97ms, then 158.94ms, etc. (They should be something more like 2.4ms, 1.3ms, 2.0ms, etc.)

Summing up all these incorrect timings gives an overall incorrect picture of how long the API is spending in database transactions.

According to that support ticket:

So, after diving more deeply into the code and your repro our dev team has determined the agent code behind the SqlCommandTracerAsync tracer does not work correctly with the MySqlConnector driver and proper support will require additional investigation, development, testing, etc. There isn't a quick fix here :-(

Now that newrelic-dotnet-agent is Open Source, is it possible the community could help develop this fix? What are the specifics of how the "tracer does not work correctly", and what would be the correct approach to fixing it?

Priority

This is a "Must Have" for us. We use MySqlConnector for its async I/O capabilities, but all database timings are greatly inflated in the NR UI, making it useless for actually diagnosing DB issues.

elucus · 2020-08-04T22:26:33Z

Hi @bgrainger, thanks for your interest in the .NET agent. We believe that support for the MySQLConnector would be a valuable addition. However, at this point, we are unable to commit to a timeline to resolution for this feature. From your comments, it sounds like perhaps you’d be interested in contributing. If this is the case, please let us know, and we can provide some direction to get you started. Additionally, please let us know what database you’re using the MySQLConnector with.

bgrainger · 2020-08-04T22:49:49Z

From your comments, it sounds like perhaps you’d be interested in contributing. If this is the case, please let us know, and we can provide some direction to get you started.

@elucus Yes, this is the case. Any information you can provide on troubleshooting/debugging you've already performed and suggestions/ideas for the right path to a resolution would be greatly appreciated!

Additionally, please let us know what database you’re using the MySQLConnector with.

It's a variety of databases, including Percona 5.7 on SmartOS, Google Cloud SQL (5.7), Azure Database for MySQL (5.7 and 8.0).

elucus · 2020-08-04T22:55:47Z

Hi @bgrainger, great to hear. I'll talk with the team and we'll start working on some pointers to get you going.

nrcventura · 2020-08-05T18:18:47Z

Hi @bgrainger, I haven't had enough time to dig into the issue enough to know exactly where the problem is, but I have a few ideas, and can try to get you pointed in an initial direction. I'm assuming you're able to build the agent locally so that you're able to do some debugging. If not, please let me know so that I can follow up with more instructions on how to get that working.

There are two files of interest that control the instrumentation.

Instrumentation.xml which is what instructs the native profiler (the part of the agent that is responsible for manipulating IL so that we can "wrap" method calls) to instrument a given method. All of our SQL instrumentation is configured in this file. MySqlConnector.xml is another example of this instrumentation configuration.
SqlCommandWrapper.cs is the code that gets executed when our SQL instrumentation is invoked.

Based on what you described, it sounds like either the segment that's created in SqlCommandWrapper.cs, is not ending until after the entire transaction ends. This can happen for a couple of reasons.

Some problem occurred so our instrumentation never calls segment.end (sometimes an exception occurs, some cast operation fails, we lost track of the segment or transaction, etc.)
The current synchronization context causes our continuation based timing to execute after the request ends (I forget which frameworks lead to this behavior, but some specify a synchronization context that cause only 1 thread to operate on a given request at a time).

Since the MySqlConnector.xml file specifies that the tracer name is SqlCommandTracerAsync and the method being instrumented returns a Task<MySqlDataReader>, the agent's SqlCommandAsyncWrapper should be the wrapper that gets executed. This wrapper will start a new segment when the ExecuteReaderAsync method is invoked in the MySqlConnector library. When that method returns a Task<MySqlDataReader> we register a continuation on it so that we can end the segment when the operation completes as opposed to when this method returns.

Hopefully you'll be able to debug into this code to provide some additional insight or discover what the problem is.

If you have any questions feel free to reach out to me.

bgrainger · 2020-08-05T18:54:06Z

I can build both the managed and native solutions fine. I can trace through the code in SqlCommandWrapper.cs and Delegates.cs, but it's not immediately clear to me how I should go about debugging a "realistic" instrumentation scenario to see where things might be going awry; would that involve setting the COR_PROFILER environment environment for a sample app?

nrcventura · 2020-08-06T00:34:05Z

That's great that you're able to build the agent, and you are correct that you will need to setup some environment variables for a sample app. Now that it's built there are some directories that were created that contain the agent in 4 different configurations. You'll need to configure some environment variables to refer to one of those directories depending on the test app that you're going to test with. An overview of the directories and environment variables can be found in https://github.com/newrelic/newrelic-dotnet-agent/blob/main/docs/development.md. I'm assuming that you'll be testing using the sample app that you previously created, and if so you'll probably need to setup the environment variables to use the newrelichome_x64 folder. You'll know that things are setup correctly if the logs folder within that folder populates and includes both a profiler log and an agent log. A profiler log looks like NewRelic.Profiler.232.log where 232 is the processId. An agent log looks like newrelic_agent_InstrumentedWebApp.log where InstrumentedWebApp is the configured application name for the instrumented app.

Once that's working you'll be able to debug the agent using Visual Studio by attaching the debugger of the Visual Studio instance containing the New Relic agent solution (not the instance that contains your application's code) to the running process. From there you should be able to set break points in the agent solution and be able to step through the code.

nrcventura · 2020-08-06T18:29:56Z

@bgrainger, let me know if it would be useful to arrange a screen share to help with some initial debugging?

bgrainger · 2020-08-06T20:35:15Z

Thanks for the pointers! I'll try to get up and running using your instructions soon and ping you if I get stuck.

bgrainger · 2020-09-04T22:55:54Z

I have been able to build a simple .NET 5 web api, attach VS, and debug the Agent.

However, my sample app that I previously provided to NR support (https://github.com/bgrainger/DatabaseTimingTest) doesn't seem to reproduce the DB timing problem I have seen in the past, so I'll need to work on getting a consistent repro locally that I can debug.

Update: Building my test app for .NET 4.7.2 and hosting it under IIS does appear to repro the problem, so the next step will be debugging it.

bgrainger · 2020-09-07T18:37:54Z

What I see happening is that OnSuccess<T> for a MySqlConnector Task calls ContinueWith (with EndSegment) but that continuation isn't executed until all MySqlConnector work is done. I think this might have something to do with how AspNetSynchronizationContext queues task continuations onto one long chain, but I haven't figured out the specific details.

However, I did determine that if I just instrument the public API methods (instead of one Task-returning implementation method), everything seems to be measured as I would expect. I'll open a PR for this.

This supports the namespaces for both MySqlConnector v0.x (MySql.Data.MySqlClient) and v1.x (MySqlConnector).

nrcventura · 2020-09-08T20:34:42Z

I'm glad that you were able to notice the continuation-based timing problem which happens because of the AspNetSynchronizationContext. We've run into that before but have not fixed it across all of our instrumentation because we want to understand a few things better.

Is it better for us performance-wise to leverage the available synchronization context when registering our continuations?
Should we always bypass the available synchronization context and attempt to execute our continuation synchronously?

We addressed this problem within our Stack Exchange Redis instrumentation with this call to Delegates.GetAsyncDelegateFor. There are some explanations in the comments that explain our thought process.

bgrainger · 2020-09-09T14:04:12Z

Thanks; that link (and code comment) helps.

I still need to repro and debug the problematic scenario I'm seeing in production with MySqlConnector to understand it a little better.

bgrainger · 2020-09-10T03:21:57Z

One thing about await task; is that if task is already completed, its TaskAwaiter.IsCompleted will return true, and the compiler-generated async state machine will continue immediately without using a continuation (that is, INotifyCompletion.OnCompleted) to schedule the rest of the async method.

Any continuations (for that completed Task) will run after all the synchronous code following the await has executed. Thus, I wonder if you've considered running EndSegment synchronously if the Task has already completed? I.e., this change: bgrainger@e72df21

There's of course a small race condition there (with task completion) but any non-completed Task that completes immediately following the test will be be safely handled by the else blocks that will call ContinueWith.

I'm not positive that this is the root of the issue I'm observing in production, but I believe it would help with more accurate timings. Is it something you've tried or tested in the past?

Even if task hasn't completed, TaskAwaiter.OnCompleted will schedule the await continuation before all other continuations (added with Task.ContinueWith). So the method will resume (and run an unbounded amount of synchronous code) before the call to EndSegment (scheduled by ContinueWith) is executed. This is probably also causing timing issues, but I'm not sure how best to resolve it; it doesn't seem possible to force a user-supplied continuation to run (on the ASP.NET synchronization context) prior to await resuming. (edit: see below)

Is it better for us performance-wise to leverage the available synchronization context when registering our continuations?

It feels like this would at least reduce pressure on, or demand for, the shared thread pool. However, as discussed above, it can delay the continuations for an arbitrary period of time.

Should we always bypass the available synchronization context and attempt to execute our continuation synchronously?

I'm going to say a tentative "yes" here. Based on my local experimentation (in a simple ASP.NET method with a mix of await on async and sync methods, and calling ContinueWith on the tasks being awaited), the action scheduled with ContinueWith(action, TaskContinuationOptions.ExecuteSynchronously) executes immediately on the same thread, before the code awaiting that task resumes.

I think I would recommend deleting all this code:

newrelic-dotnet-agent/src/Agent/NewRelic/Agent/Extensions/NewRelic.Agent.Extensions/Providers/Wrapper/Delegates.cs

Lines 162 to 167 in 368af3c

    
           var context = SynchronizationContext.Current; 
        
           if (context != null) 
        
           { 
        
               task.ContinueWith(EndSegment, TaskScheduler.FromCurrentSynchronizationContext()); 
        
           } 
        
           else

bgrainger · 2020-09-10T19:33:15Z

I made both my suggested changes here (bgrainger@120f525) and deployed a custom build of NewRelic.Agent.Extensions.dll to production (didn't change the other Agent files).

The immediate results look promising:

The MySQL timing that greatly exceeds the response time has gone away. (I copied the files by hand between 12:05 and 12:15.)

Meanwhile, throughput for DB operations seems with normal ranges, so the operations are still being tracked:

nrcventura · 2020-09-10T20:17:01Z

We think that those are good changes to make, and will run some additional tests based on some async timing issues that we've run into in the past. In particular there have been some timing issues related to the combination of WebAPI and HttpClient that we'll be checking.

bgrainger · 2020-09-10T21:36:08Z

Opened a PR for this here: #197

This supports the namespaces for both MySqlConnector v0.x (MySql.Data.MySqlClient) and v1.x (MySqlConnector).

nrcventura · 2020-09-11T01:11:31Z

@bgrainger, @nr-ahemsath has recently done some work to enable people outside of New Relic to run integration tests for our database instrumentation. Are you interested in helping us test out this experience by also helping to build some integration tests for this instrumentation? If so I can provide some more specifics about what's involved in adding the integration tests, and @nr-ahemsath will be able to go over how to get things setup, running, and interpret the results.

bgrainger · 2020-09-11T01:32:51Z

I saw better test support for MySQL get merged recently (#193); I hadn't tried it out yet but yes, I'm willing to build out some integration tests for MySqlConnector.

angelatan2 · 2020-09-15T20:47:50Z

Hello @bgrainger , Thank you for contributing to the New Relic .NET agent! We appreciate the discussions, suggestions, as well as the code contribution. It's interesting to see the exchange. We would like to send our early contributors a little token of appreciation. If you are interested, please do reach out to me at [email protected]. There is no obligation, of course. I will say "Thank you" here in any case, for your contribution.

elucus added enhancement New feature or request help wanted Extra attention is needed labels Aug 4, 2020

bgrainger added a commit to bgrainger/newrelic-dotnet-agent that referenced this issue Sep 7, 2020

Instrument MySqlConnector. Fixes newrelic#85

c77fec0

This supports the namespaces for both MySqlConnector v0.x (MySql.Data.MySqlClient) and v1.x (MySqlConnector).

bgrainger mentioned this issue Sep 7, 2020

Instrument MySqlConnector #180

Merged

bgrainger added a commit to bgrainger/newrelic-dotnet-agent that referenced this issue Sep 7, 2020

Instrument MySqlConnector. Fixes newrelic#85

18b79f8

This supports the namespaces for both MySqlConnector v0.x (MySql.Data.MySqlClient) and v1.x (MySqlConnector).

bgrainger added a commit to bgrainger/newrelic-dotnet-agent that referenced this issue Sep 8, 2020

Instrument MySqlConnector. Fixes newrelic#85

3107e8c

This supports the namespaces for both MySqlConnector v0.x (MySql.Data.MySqlClient) and v1.x (MySqlConnector).

bgrainger mentioned this issue Sep 10, 2020

Change how EndSegment is called for Tasks #197

Merged

nrcventura closed this as completed in #180 Sep 11, 2020

nrcventura pushed a commit that referenced this issue Sep 11, 2020

Instrument MySqlConnector. Fixes #85 (#180)

775d796

This supports the namespaces for both MySqlConnector v0.x (MySql.Data.MySqlClient) and v1.x (MySqlConnector).

nrcventura mentioned this issue Sep 11, 2020

Add integration tests for MySqlConnector instrumentation #200

Closed

bgrainger mentioned this issue Sep 11, 2020

Async Continuation pattern fails in certain scenarios (ex, Asp.Net pipeline) #189

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support MySqlConnector #85

Support MySqlConnector #85

bgrainger commented Aug 3, 2020

elucus commented Aug 4, 2020

bgrainger commented Aug 4, 2020

elucus commented Aug 4, 2020

nrcventura commented Aug 5, 2020

bgrainger commented Aug 5, 2020

nrcventura commented Aug 6, 2020

nrcventura commented Aug 6, 2020

bgrainger commented Aug 6, 2020

bgrainger commented Sep 4, 2020 •

edited

Loading

bgrainger commented Sep 7, 2020

nrcventura commented Sep 8, 2020

bgrainger commented Sep 9, 2020

bgrainger commented Sep 10, 2020

bgrainger commented Sep 10, 2020

nrcventura commented Sep 10, 2020

bgrainger commented Sep 10, 2020

nrcventura commented Sep 11, 2020

bgrainger commented Sep 11, 2020

angelatan2 commented Sep 15, 2020

Support MySqlConnector #85

Support MySqlConnector #85

Comments

bgrainger commented Aug 3, 2020

Feature Description

Priority

elucus commented Aug 4, 2020

bgrainger commented Aug 4, 2020

elucus commented Aug 4, 2020

nrcventura commented Aug 5, 2020

bgrainger commented Aug 5, 2020

nrcventura commented Aug 6, 2020

nrcventura commented Aug 6, 2020

bgrainger commented Aug 6, 2020

bgrainger commented Sep 4, 2020 • edited Loading

bgrainger commented Sep 7, 2020

nrcventura commented Sep 8, 2020

bgrainger commented Sep 9, 2020

bgrainger commented Sep 10, 2020

bgrainger commented Sep 10, 2020

nrcventura commented Sep 10, 2020

bgrainger commented Sep 10, 2020

nrcventura commented Sep 11, 2020

bgrainger commented Sep 11, 2020

angelatan2 commented Sep 15, 2020

bgrainger commented Sep 4, 2020 •

edited

Loading