unrecoverable error - A concurrent update was performed on this collection and corrupted its state #8299

Closed
golfalot opened this issue May 20, 2025 · 5 comments

Comments

@golfalot

Product

Hot Chocolate

Version

12.22.6.0

Link to minimal reproduction

_operationContext.Scheduler.Register(
    CollectionsMarshal.AsSpan(_taskBuffer));

Steps to reproduce

See the issue I raised with DAB, which appears to point the finger at Hot Chocolate: Azure/data-api-builder#2694

Background
When using the /graphql endpoint in DAB, once this error has occurred even once, queries for the affected Entities (source tables) keep failing until the DAB application is restarted.

Research
Apologies, I'm not sufficiently skilled to repro the issue, but I'm doing my hardest to show that some effort was put into documenting the issue and a possible root cause.

I've walked through the code trying to understand where a concurrency issue might occur, and this area appears to have the potential to cause the exception. I won't embarrass myself by putting in some AI suggestions for a fix, as I don't understand the optimisations at work here, but Claude explains to me that:

When you use CollectionsMarshal.AsSpan(_taskBuffer), you're getting direct, low-level access to the memory of the collection. This is a performance optimization, but it comes with a major caveat: it bypasses the normal thread-safety mechanisms of collection classes.

The exception message "Operations that change non-concurrent collections must have exclusive access" indicates that:

  • _taskBuffer is a non-concurrent collection (like a standard List or array)
  • The collection is being modified from multiple threads simultaneously
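
As a rough, single-threaded illustration of the caveat quoted above (not a reproduction of the reported exception), the sketch below shows that the span returned by CollectionsMarshal.AsSpan is a direct view onto the list's current backing array and does not follow the list if it is changed afterwards. The names AsSpanDemo and Run are hypothetical and exist only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class AsSpanDemo
{
    // Returns the value of list[0] after a write through a live span,
    // and again after a write through a span that has gone stale.
    public static (int AfterLiveWrite, int AfterStaleWrite) Run()
    {
        var list = new List<int> { 1, 2, 3 };
        list.Capacity = list.Count;      // no spare room, so the next Add must reallocate

        Span<int> span = CollectionsMarshal.AsSpan(list);
        span[0] = 42;                    // writes straight into the list's backing array
        int afterLiveWrite = list[0];    // the list sees the write

        list.Add(4);                     // the list moves to a new, larger array
        span[0] = 99;                    // no exception, but this lands in the old array
        int afterStaleWrite = list[0];   // the list never sees the second write

        return (afterLiveWrite, afterStaleWrite);
    }
}
```

Calling AsSpanDemo.Run() yields 42 for both values: the first write reaches the list, the second is silently lost. The span is therefore only safe while nothing else adds to or removes from the list, which is why the owner of the list needs exclusive access for as long as the span is in use.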

Here are the links to the relevant code

private readonly List<ResolverTask> _taskBuffer = [];

_operationContext.Scheduler.Register(
    CollectionsMarshal.AsSpan(_taskBuffer));

This is the line referenced in the stack trace

What is expected?

  • No exceptions thrown during concurrent requests (fix the root cause)
  • If that is not fixable, clean up after such an error so that "The collection's state is no longer correct." does not persist into subsequent GraphQL queries

What is actually happening?

The Entities that errored cannot be queried successfully until the application is restarted.
Note that non-errored Entities are still returned successfully in the `"data":` field of the JSON (not included here).

"errors": [
        {
            "message": "Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.",
            "locations": [
                {
                    "line": 46,
                    "column": 3
                }
            ],
            "path": [
                "copernicusSlope_by_pk"
            ]
        },
        {
            "message": "Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.",
            "locations": [
                {
                    "line": 102,
                    "column": 3
                }
            ],
            "path": [
                "hadUKgroundfrost_by_pk"
            ]
        },
...

Relevant log output

[{
        "severityLevel": "Error",
        "outerId": "0",
        "message": "Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.",
        "type": "System.InvalidOperationException",
        "id": "60417033",
        "parsedStack": [{
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.ThrowHelper.ThrowInvalidOperationException_ConcurrentOperationsNotSupported",
                "level": 0,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Collections.Generic.Dictionary`2.FindValue",
                "level": 1,
                "line": 0
            }, {
                "assembly": "Azure.DataApiBuilder.Core, Version=1.4.27.0, Culture=neutral, PublicKeyToken=null",
                "method": "Azure.DataApiBuilder.Core.Resolvers.CosmosQueryEngine+<GetPartitionKeyPath>d__16.MoveNext",
                "level": 2,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
                "level": 3,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
                "level": 4,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
                "level": 5,
                "line": 0
            }, {
                "assembly": "Azure.DataApiBuilder.Core, Version=1.4.27.0, Culture=neutral, PublicKeyToken=null",
                "method": "Azure.DataApiBuilder.Core.Resolvers.CosmosQueryEngine+<GetIdAndPartitionKey>d__17.MoveNext",
                "level": 6,
                "line": 344,
                "fileName": "/_/src/Core/Resolvers/CosmosQueryEngine.cs"
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
                "level": 7,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
                "level": 8,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
                "level": 9,
                "line": 0
            }, {
                "assembly": "Azure.DataApiBuilder.Core, Version=1.4.27.0, Culture=neutral, PublicKeyToken=null",
                "method": "Azure.DataApiBuilder.Core.Resolvers.CosmosQueryEngine+<ExecuteAsync>d__8.MoveNext",
                "level": 10,
                "line": 81,
                "fileName": "/_/src/Core/Resolvers/CosmosQueryEngine.cs"
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
                "level": 11,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
                "level": 12,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
                "level": 13,
                "line": 0
            }, {
                "assembly": "Azure.DataApiBuilder.Core, Version=1.4.27.0, Culture=neutral, PublicKeyToken=null",
                "method": "Azure.DataApiBuilder.Service.Services.ExecutionHelper+<ExecuteQueryAsync>d__5.MoveNext",
                "level": 14,
                "line": 79,
                "fileName": "/_/src/Core/Services/ExecutionHelper.cs"
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
                "level": 15,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
                "level": 16,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
                "level": 17,
                "line": 0
            }, {
                "assembly": "Azure.DataApiBuilder.Core, Version=1.4.27.0, Culture=neutral, PublicKeyToken=null",
                "method": "ResolverTypeInterceptor+<>c__DisplayClass5_1+<<-ctor>b__5>d.MoveNext",
                "level": 18,
                "line": 23,
                "fileName": "/_/src/Core/Services/ResolverTypeInterceptor.cs"
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
                "level": 19,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
                "level": 20,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
                "level": 21,
                "line": 0
            }, {
                "assembly": "HotChocolate.Execution, Version=12.22.6.0, Culture=neutral, PublicKeyToken=null",
                "method": "HotChocolate.Execution.Processing.Tasks.ResolverTask+<ExecuteResolverPipelineAsync>d__58.MoveNext",
                "level": 22,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
                "level": 23,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
                "level": 24,
                "line": 0
            }, {
                "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
                "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
                "level": 25,
                "line": 0
            }, {
                "assembly": "HotChocolate.Execution, Version=12.22.6.0, Culture=neutral, PublicKeyToken=null",
                "method": "HotChocolate.Execution.Processing.Tasks.ResolverTask+<TryExecuteAsync>d__57.MoveNext",
                "level": 26,
                "line": 0
            }
        ]
    }
]

Additional context

No response

@michaelstaib
Member

The task buffer is owned by a resolver task, which, together with its context, is a pooled resource. It sounds as if the context is being passed on beyond the resolver and used outside of the task. That would not be a safe thing to do in any case.

@michaelstaib
Member

How do you get to these lines from your stack trace? looks more like TryExecuteAsync raises the issue ... which would point to a concrete resolver execution.

@golfalot
Author

> How do you get to these lines from your stack trace? looks more like TryExecuteAsync raises the issue ... which would point to a concrete resolver execution.

The stack trace is oddly formatted because it comes from the Application Insights Exceptions table associated with the Azure Container App. Do let me know if you require any other info. Thanks for looking into this!

@michaelstaib
Member

I think this is a DAB issue, to be honest. The code in question is guaranteed to run on a single thread and is owned by the ResolverTask, which is itself a pooled instance. In the stack trace we are not in the complete phase but in the execute phase, so the code referenced in this issue is not relevant. Also, following the stack trace down to the Cosmos engine, the shared metastore is a Dictionary, which is not thread-safe. I will ping the DAB team.

@michaelstaib
Member

Yep ... looks like the metastore could be it ... "method": "System.Collections.Generic.Dictionary`2.FindValue" is even in your stack trace.
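
To illustrate the kind of change this points to on the DAB side, here is a minimal sketch, assuming the shared metastore behaves like a lookup keyed by entity name that is filled lazily while requests are running. PartitionKeyPathCache, its members, and the string key/value shapes are hypothetical and are not DAB's actual code; the only point is that ConcurrentDictionary tolerates concurrent readers and writers where a plain Dictionary does not.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for a shared metadata cache; not DAB's real type.
public sealed class PartitionKeyPathCache
{
    // ConcurrentDictionary is safe for simultaneous reads and writes,
    // unlike Dictionary<TKey, TValue>, whose internal state can be
    // corrupted when one thread writes while another reads or writes.
    private readonly ConcurrentDictionary<string, string> _paths = new();

    public int Count => _paths.Count;

    // Returns the cached path for the entity, computing it on first use.
    // Under contention resolvePath may run more than once for the same key,
    // but only one result is stored and every caller gets that stored value.
    public string GetOrAdd(string entityName, Func<string, string> resolvePath)
        => _paths.GetOrAdd(entityName, resolvePath);
}
```

A single instance of such a cache could then be shared across concurrent resolver executions, for example by calling cache.GetOrAdd("copernicusSlope", name => "/" + name + "/id") from many threads at once. The alternatives are guarding every access to a plain Dictionary with a lock, or populating it fully at startup and treating it as read-only afterwards.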


2 participants