Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support MySqlConnector #85

Closed
bgrainger opened this issue Aug 3, 2020 · 19 comments · Fixed by #180
Closed

Support MySqlConnector #85

bgrainger opened this issue Aug 3, 2020 · 19 comments · Fixed by #180
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@bgrainger
Copy link
Contributor

MySqlConnector is an advanced ADO.NET library for MySQL that supports asynchronous operations. (The "standard" MySQL Connector/NET package does not.)

Feature Description

Supporting this library out-of-the-box is a popular feature request for New Relic customers.

There is custom instrumentation XML for MySqlConnector, but as discussed in in NR Support case 373822 (NOTE: private link) and demonstrated in this repo, when it's used, database timings are doubled:

This trace shows a definite problem with the database timings. Specifically, almost all database calls are logged as extending to the end of the transaction, e.g., 164.66ms then 162.23ms, then 160.97ms, then 158.94ms, etc. (They should be something more like 2.4ms, 1.3ms, 2.0ms, etc.)

Summing up all these incorrect timings gives an overall incorrect picture of how long the API is spending in database transactions.

According to that support ticket:

So, after diving more deeply into the code and your repro our dev team has determined the agent code behind the SqlCommandTracerAsync tracer does not work correctly with the MySqlConnector driver and proper support will require additional investigation, development, testing, etc. There isn't a quick fix here :-(

Now that newrelic-dotnet-agent is Open Source, is it possible the community could help develop this fix? What are the specifics of how the "tracer does not work correctly", and what would be the correct approach to fixing it?

Priority

This is a "Must Have" for us. We use MySqlConnector for its async I/O capabilities, but all database timings are greatly inflated in the NR UI, making it useless for actually diagnosing DB issues.

@elucus
Copy link
Contributor

elucus commented Aug 4, 2020

Hi @bgrainger, thanks for your interest in the .NET agent. We believe that support for the MySQLConnector would be a valuable addition. However, at this point, we are unable to commit to a timeline to resolution for this feature. From your comments, it sounds like perhaps you’d be interested in contributing. If this is the case, please let us know, and we can provide some direction to get you started. Additionally, please let us know what database you’re using the MySQLConnector with.

@elucus elucus added enhancement New feature or request help wanted Extra attention is needed labels Aug 4, 2020
@bgrainger
Copy link
Contributor Author

From your comments, it sounds like perhaps you’d be interested in contributing. If this is the case, please let us know, and we can provide some direction to get you started.

@elucus Yes, this is the case. Any information you can provide on troubleshooting/debugging you've already performed and suggestions/ideas for the right path to a resolution would be greatly appreciated!

Additionally, please let us know what database you’re using the MySQLConnector with.

It's a variety of databases, including Percona 5.7 on SmartOS, Google Cloud SQL (5.7), Azure Database for MySQL (5.7 and 8.0).

@elucus
Copy link
Contributor

elucus commented Aug 4, 2020

Hi @bgrainger, great to hear. I'll talk with the team and we'll start working on some pointers to get you going.

@nrcventura
Copy link
Member

Hi @bgrainger, I haven't had enough time to dig into the issue enough to know exactly where the problem is, but I have a few ideas, and can try to get you pointed in an initial direction. I'm assuming you're able to build the agent locally so that you're able to do some debugging. If not, please let me know so that I can follow up with more instructions on how to get that working.

There are two files of interest that control the instrumentation.

  1. Instrumentation.xml which is what instructs the native profiler (the part of the agent that is responsible for manipulating IL so that we can "wrap" method calls) to instrument a given method. All of our SQL instrumentation is configured in this file. MySqlConnector.xml is another example of this instrumentation configuration.
  2. SqlCommandWrapper.cs is the code that gets executed when our SQL instrumentation is invoked.

Based on what you described, it sounds like either the segment that's created in SqlCommandWrapper.cs, is not ending until after the entire transaction ends. This can happen for a couple of reasons.

  1. Some problem occurred so our instrumentation never calls segment.end (sometimes an exception occurs, some cast operation fails, we lost track of the segment or transaction, etc.)
  2. The current synchronization context causes our continuation based timing to execute after the request ends (I forget which frameworks lead to this behavior, but some specify a synchronization context that cause only 1 thread to operate on a given request at a time).

Since the MySqlConnector.xml file specifies that the tracer name is SqlCommandTracerAsync and the method being instrumented returns a Task<MySqlDataReader>, the agent's SqlCommandAsyncWrapper should be the wrapper that gets executed. This wrapper will start a new segment when the ExecuteReaderAsync method is invoked in the MySqlConnector library. When that method returns a Task<MySqlDataReader> we register a continuation on it so that we can end the segment when the operation completes as opposed to when this method returns.

Hopefully you'll be able to debug into this code to provide some additional insight or discover what the problem is.

If you have any questions feel free to reach out to me.

@bgrainger
Copy link
Contributor Author

I can build both the managed and native solutions fine. I can trace through the code in SqlCommandWrapper.cs and Delegates.cs, but it's not immediately clear to me how I should go about debugging a "realistic" instrumentation scenario to see where things might be going awry; would that involve setting the COR_PROFILER environment environment for a sample app?

@nrcventura
Copy link
Member

That's great that you're able to build the agent, and you are correct that you will need to setup some environment variables for a sample app. Now that it's built there are some directories that were created that contain the agent in 4 different configurations. You'll need to configure some environment variables to refer to one of those directories depending on the test app that you're going to test with. An overview of the directories and environment variables can be found in https://github.com/newrelic/newrelic-dotnet-agent/blob/main/docs/development.md. I'm assuming that you'll be testing using the sample app that you previously created, and if so you'll probably need to setup the environment variables to use the newrelichome_x64 folder. You'll know that things are setup correctly if the logs folder within that folder populates and includes both a profiler log and an agent log. A profiler log looks like NewRelic.Profiler.232.log where 232 is the processId. An agent log looks like newrelic_agent_InstrumentedWebApp.log where InstrumentedWebApp is the configured application name for the instrumented app.

Once that's working you'll be able to debug the agent using Visual Studio by attaching the debugger of the Visual Studio instance containing the New Relic agent solution (not the instance that contains your application's code) to the running process. From there you should be able to set break points in the agent solution and be able to step through the code.

@nrcventura
Copy link
Member

@bgrainger, let me know if it would be useful to arrange a screen share to help with some initial debugging?

@bgrainger
Copy link
Contributor Author

Thanks for the pointers! I'll try to get up and running using your instructions soon and ping you if I get stuck.

@bgrainger
Copy link
Contributor Author

bgrainger commented Sep 4, 2020

I have been able to build a simple .NET 5 web api, attach VS, and debug the Agent.

However, my sample app that I previously provided to NR support (https://github.com/bgrainger/DatabaseTimingTest) doesn't seem to reproduce the DB timing problem I have seen in the past, so I'll need to work on getting a consistent repro locally that I can debug.

Update: Building my test app for .NET 4.7.2 and hosting it under IIS does appear to repro the problem, so the next step will be debugging it.

@bgrainger
Copy link
Contributor Author

What I see happening is that OnSuccess<T> for a MySqlConnector Task calls ContinueWith (with EndSegment) but that continuation isn't executed until all MySqlConnector work is done. I think this might have something to do with how AspNetSynchronizationContext queues task continuations onto one long chain, but I haven't figured out the specific details.

However, I did determine that if I just instrument the public API methods (instead of one Task-returning implementation method), everything seems to be measured as I would expect. I'll open a PR for this.

bgrainger added a commit to bgrainger/newrelic-dotnet-agent that referenced this issue Sep 7, 2020
This supports the namespaces for both MySqlConnector v0.x (MySql.Data.MySqlClient) and v1.x (MySqlConnector).
bgrainger added a commit to bgrainger/newrelic-dotnet-agent that referenced this issue Sep 7, 2020
This supports the namespaces for both MySqlConnector v0.x (MySql.Data.MySqlClient) and v1.x (MySqlConnector).
bgrainger added a commit to bgrainger/newrelic-dotnet-agent that referenced this issue Sep 8, 2020
This supports the namespaces for both MySqlConnector v0.x (MySql.Data.MySqlClient) and v1.x (MySqlConnector).
@nrcventura
Copy link
Member

I'm glad that you were able to notice the continuation-based timing problem which happens because of the AspNetSynchronizationContext. We've run into that before but have not fixed it across all of our instrumentation because we want to understand a few things better.

  1. Is it better for us performance-wise to leverage the available synchronization context when registering our continuations?
  2. Should we always bypass the available synchronization context and attempt to execute our continuation synchronously?

We addressed this problem within our Stack Exchange Redis instrumentation with this call to Delegates.GetAsyncDelegateFor. There are some explanations in the comments that explain our thought process.

@bgrainger
Copy link
Contributor Author

Thanks; that link (and code comment) helps.

I still need to repro and debug the problematic scenario I'm seeing in production with MySqlConnector to understand it a little better.

@bgrainger
Copy link
Contributor Author

One thing about await task; is that if task is already completed, its TaskAwaiter.IsCompleted will return true, and the compiler-generated async state machine will continue immediately without using a continuation (that is, INotifyCompletion.OnCompleted) to schedule the rest of the async method.

Any continuations (for that completed Task) will run after all the synchronous code following the await has executed. Thus, I wonder if you've considered running EndSegment synchronously if the Task has already completed? I.e., this change: bgrainger@e72df21

There's of course a small race condition there (with task completion) but any non-completed Task that completes immediately following the test will be be safely handled by the else blocks that will call ContinueWith.

I'm not positive that this is the root of the issue I'm observing in production, but I believe it would help with more accurate timings. Is it something you've tried or tested in the past?

Even if task hasn't completed, TaskAwaiter.OnCompleted will schedule the await continuation before all other continuations (added with Task.ContinueWith). So the method will resume (and run an unbounded amount of synchronous code) before the call to EndSegment (scheduled by ContinueWith) is executed. This is probably also causing timing issues, but I'm not sure how best to resolve it; it doesn't seem possible to force a user-supplied continuation to run (on the ASP.NET synchronization context) prior to await resuming. (edit: see below)

  1. Is it better for us performance-wise to leverage the available synchronization context when registering our continuations?

It feels like this would at least reduce pressure on, or demand for, the shared thread pool. However, as discussed above, it can delay the continuations for an arbitrary period of time.

  1. Should we always bypass the available synchronization context and attempt to execute our continuation synchronously?

I'm going to say a tentative "yes" here. Based on my local experimentation (in a simple ASP.NET method with a mix of await on async and sync methods, and calling ContinueWith on the tasks being awaited), the action scheduled with ContinueWith(action, TaskContinuationOptions.ExecuteSynchronously) executes immediately on the same thread, before the code awaiting that task resumes.

I think I would recommend deleting all this code:

var context = SynchronizationContext.Current;
if (context != null)
{
task.ContinueWith(EndSegment, TaskScheduler.FromCurrentSynchronizationContext());
}
else

@bgrainger
Copy link
Contributor Author

I made both my suggested changes here (bgrainger@120f525) and deployed a custom build of NewRelic.Agent.Extensions.dll to production (didn't change the other Agent files).

The immediate results look promising:

image

The MySQL timing that greatly exceeds the response time has gone away. (I copied the files by hand between 12:05 and 12:15.)

Meanwhile, throughput for DB operations seems with normal ranges, so the operations are still being tracked:

image

@nrcventura
Copy link
Member

We think that those are good changes to make, and will run some additional tests based on some async timing issues that we've run into in the past. In particular there have been some timing issues related to the combination of WebAPI and HttpClient that we'll be checking.

@bgrainger
Copy link
Contributor Author

Opened a PR for this here: #197

nrcventura pushed a commit that referenced this issue Sep 11, 2020
This supports the namespaces for both MySqlConnector v0.x (MySql.Data.MySqlClient) and v1.x (MySqlConnector).
@nrcventura
Copy link
Member

@bgrainger, @nr-ahemsath has recently done some work to enable people outside of New Relic to run integration tests for our database instrumentation. Are you interested in helping us test out this experience by also helping to build some integration tests for this instrumentation? If so I can provide some more specifics about what's involved in adding the integration tests, and @nr-ahemsath will be able to go over how to get things setup, running, and interpret the results.

@bgrainger
Copy link
Contributor Author

I saw better test support for MySQL get merged recently (#193); I hadn't tried it out yet but yes, I'm willing to build out some integration tests for MySqlConnector.

@angelatan2
Copy link
Contributor

Hello @bgrainger , Thank you for contributing to the New Relic .NET agent! We appreciate the discussions, suggestions, as well as the code contribution. It's interesting to see the exchange. We would like to send our early contributors a little token of appreciation. If you are interested, please do reach out to me at [email protected]. There is no obligation, of course. I will say "Thank you" here in any case, for your contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants