Skip to content

Fixing the contention condition caused by RegisterResolversManifests #11079

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Dec 16, 2024

Conversation

SimaTian
Copy link
Member

@SimaTian SimaTian commented Dec 4, 2024

Fixes #11043, #7927

Context

There was a contention condition described here:
One thread enters and locks, then initializes a list, starts pushing things onto the list, which is now no longer null.
Second thread then checks, sees the list is not empty and bypasses the lock, acquires enumerator.
First thread pushes additional item into the list.
Second thread throws.

Changes Made

Initializing the list to a placeholder first, only assigning it when it is finished. Due to this, until the list is fully initialized, all threads have to enter the lock, then they will check that the list is now initialized and return.
This will result in some minor locking overhead, but it is not as strict as locking the collection for all reads.

Furthermore, since the collection is supposed to be read only as written here, I've made it into IReadOnlyList instead of the current IList.

Testing

I got a reproduction, although a bit clunky.
First commit checks should fail. Then I will push the fix which should succeed.

@SimaTian SimaTian force-pushed the SdkResolverService_race_fix branch 2 times, most recently from 532900a to 2cdfbf7 Compare December 6, 2024 09:59
@SimaTian SimaTian changed the title Fixing the race condition caused by RegisterResolversManifests Fixing the contention condition caused by RegisterResolversManifests Dec 6, 2024
@SimaTian SimaTian force-pushed the SdkResolverService_race_fix branch from 2cdfbf7 to d7826cd Compare December 6, 2024 10:18
Copy link

@donJoseLuis donJoseLuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few minor comments; man one moving away from using the thread class

Copy link
Member

@JanKrivanek JanKrivanek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First quick pass.

There is lots of test related code in production code - can we isolate it inot tests

Copy link
Member

@JanKrivanek JanKrivanek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making the efforts to clean the production part of the code! I like this!!

@SimaTian SimaTian merged commit 3a63bca into main Dec 16, 2024
10 checks passed
@SimaTian SimaTian deleted the SdkResolverService_race_fix branch December 16, 2024 10:25
@KalleOlaviNiemitalo
Copy link

@baronfel
Copy link
Member

/backport to vs17.12

Copy link
Contributor

Started backporting to vs17.12: https://github.com/dotnet/msbuild/actions/runs/13973433101

SimaTian added a commit that referenced this pull request Mar 27, 2025
…11079)

* Contention condition reproduction via unit test

* updating the lists only after they're complete to avoid the contention.

* Update src/Build/BackEnd/Components/SdkResolution/SdkResolverService.cs

Co-authored-by: Rainer Sigwald <[email protected]>

* addressing review comments

* minor touchup

* #if DEBUG fix

* refactoring to get rid of #if directives

* removing unnecessary include

* variable rename

---------

Co-authored-by: Rainer Sigwald <[email protected]>
SimaTian added a commit that referenced this pull request May 23, 2025
…Manifests (#11612)

Fixes #11079 

Work item (Internal use): 
https://dev.azure.com/devdiv/DevDiv/_workitems/edit/2400156

Summary:
**Backport of #11079 to vs17.12**
There was a contention condition described #7927 (comment):
One thread enters and locks, then initializes a list, starts pushing
things onto the list, which is now no longer null.
Second thread then checks, sees the list is not empty and bypasses the
lock, acquires enumerator.
First thread pushes additional item into the list.
Second thread throws.

We want to backport it to 17.12 (which is 9.0.100) so it is a part of
the source build.
That means that a large part of our Linux userbase which runs on source
builds SDKs is exposed to the problem and they could experience failures
at any time - and they won't have the fix unless we backport.



Customer Impact:
It is an annoying error that occasionally kills a multiprocessed
pipeline due to a contention condition.
Fortunately it only manifests occassionaly, however it happens often
enough to be noticeable. We have at least two well documented cases as
of recently and even more if we extend the time window.

Regression: 
No, it is a long standing bug.

Testing
src/Build.UnitTests/BackEnd/SdkResolverService_Tests.cs
test to validate the error is no longer there. (See the original ticket
for discussion)

Risk
Low, the change is already rolled out and tested in the main branch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Collection was modified in SdkResolverService.ResolveSdkUsingResolversWithPatternsFirst
6 participants