Skip to content

Transaction can cause timeout, lost server connection, and connection pool saturation on SQL Server #533

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Arkatufus opened this issue May 5, 2025 · 12 comments

Comments

@Arkatufus
Copy link
Contributor

Arkatufus commented May 5, 2025

Version Information
Version of Akka.Persistence.Sql? dev branch

Describe the bug
Transaction can cause timeout, zombie connection, and/or connection pool exhaustion on SQL Server during persistence HandleDeleteMessagesTo execution with high persistence activity. This seems to be caused by table/row lock and the complexity of the operation.

@Arkatufus
Copy link
Contributor Author

We will need to analyze the operation complexity and find a more efficient way of doing bulk deletes

@Arkatufus
Copy link
Contributor Author

Using Rider Dynamic Program Analysis (DPA), these issues were found:

Image

Image

Image

Image

@Arkatufus
Copy link
Contributor Author

Looking at the "Average time" metrics, it is clear that the database queries themselves are quite simple and performant, ranging from 31ms to 400ms to execute.

But, looking at the "Max time among all execution", we're seeing some discrepancy. Some of these operations suddenly needs 5-6 seconds to execute, this suggests a table/row deadlock that stops the query execution entirely.

@Arkatufus Arkatufus changed the title Transaction can cause timeout on SQL Server 2016 Express Transaction can cause timeout, lost server connection, and connection pool saturation on SQL Server May 13, 2025
@Arkatufus
Copy link
Contributor Author

So the error occurred here:

catch (Exception ex1)
{
try
{
await tx.RollbackAsync(token);
}
catch (Exception ex2)
{
throw new AggregateException("Exception thrown when rolling back database transaction", ex2, ex1);
}
throw;

We're missing the second inner exception, but the first one indicates to me that the transaction got disposed when it shouldn't have somehow.

@Arkatufus
Copy link
Contributor Author

--> (Inner Exception #1) System.InvalidOperationException: This SqlTransaction has completed; it is no longer usable.
   at Microsoft.Data.SqlClient.SqlTransaction.ZombieCheck()
   at Microsoft.Data.SqlClient.SqlTransaction.Commit()
   at LinqToDB.Async.AsyncDbTransaction.CommitAsync(CancellationToken cancellationToken)
   at LinqToDB.Data.DataConnection.<>c.<<CommitTransactionAsync>b__3_1>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at LinqToDB.Data.DataConnection.<TraceActionAsync>d__8`2.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at LinqToDB.Data.DataConnection.<CommitTransactionAsync>d__3.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at LinqToDB.Data.DataConnectionTransaction.<CommitAsync>d__7.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

looks like the transaction got hosed even before we tried to commit

@Arkatufus
Copy link
Contributor Author

@Arkatufus
Copy link
Contributor Author

Linq2Db transaction comitting code:

https://github.com/linq2db/linq2db/blob/bc50707294c09851487dd27406e27fb137a06521/Source/LinqToDB/DataContextTransaction.cs#L142-L164

Linq2Db transaction opening code:

https://github.com/linq2db/linq2db/blob/bc50707294c09851487dd27406e27fb137a06521/Source/LinqToDB/Data/DataConnection.Async.cs#L33-L72

This is where a second transaction can open on the same connection, aborting the original one in-progress

@Arkatufus
Copy link
Contributor Author

Probable cause

Previous theory that this is caused by backpressure problem, we might be applying query limiter/throttler to the wrong database query that it throttles recovery. This is proven to be impossible, both ReplayMessagesAsync() and DeleteMessagesAsync() doesn't use ExecuteQueryWithTransactionAsync(), which rules out query throttling misuse possibility.

Static code analysis:

Image

Image

@Arkatufus
Copy link
Contributor Author

Arkatufus commented May 13, 2025

Current Delete() operation flow:

VAR $persistenceId
VAR $maxSeqNo

Start Transaction

  SET j.Deleted = true
    FROM journal j
    WHERE j.PersistenceId = $persistenceId AND j.SequenceNumber < $maxSeqNo

  VAR $maxDeleted = 
    TAKE 1 SELECT j.SequenceNumber
      FROM journal j
      WHERE j.PersistenceId = $persistenceId AND j.SequenceNumber < $maxSeqNo
      ORDER BY DESC j.SequenceNumber

  DELETE FROM journal.j
    WHERE j.PersistenceId = $persistenceId AND j.SequenceNumber < maxDeleted

  DELETE FROM tags t
    WHERE t.PersistenceId = $persistenceId AND t.SequenceNumber <= $maxSeqNo

End Transaction

@Arkatufus
Copy link
Contributor Author

Arkatufus commented May 13, 2025

Fix attempt, optimize code:

VAR $persistenceId
VAR $maxSeqNo

VAR $maxDeleted = 
  TAKE 1 SELECT j.SequenceNumber
    FROM journal j
    WHERE j.PersistenceId = $persistenceId AND j.SequenceNumber < $maxSeqNo
    ORDER BY DESC j.SequenceNumber

Start Transaction

  SET j.Deleted = true
    FROM journal j
    WHERE j.PersistenceId = $persistenceId AND j.SequenceNumber = $maxDeleted

  DELETE FROM journal.j
    WHERE j.PersistenceId = $persistenceId AND j.SequenceNumber < $maxDeleted

  DELETE FROM tags t
    WHERE t.PersistenceId = $persistenceId AND t.SequenceNumber < $maxDeleted

End Transaction
  • Can cause a connection pool exhaustion if it is set to use 2 connection object
  • Using a single connection still causes timeout

@Arkatufus
Copy link
Contributor Author

Arkatufus commented May 13, 2025

Note

It would be great to be able to profile the server itself and actually see the occurence of these table deadlocks. Does anyone know of any tools that can help with this?

@Arkatufus
Copy link
Contributor Author

Note

Currently, Microsoft.Data.SqlClient does not support have any asynchronous transaction support,
it is using the default behavior of System.Data.Common which discards the CancellationToken
passed into the BeginDbTransactionAsync method call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant