Skip to content

delegatingresolver: Stop calls into delegates once the parent resolver is closed. #8195

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Mar 26, 2025

Conversation

arjan-bal
Copy link
Contributor

@arjan-bal arjan-bal commented Mar 24, 2025

Fixes: #8168

While handling updates from child resolvers, the delegating resolver locks it's mutex. If the channel returns an error on updating the state, the delegating resolver calls ResolveNow on the other child. This can result in a deadlock if the other child attempts to call cc.UpdateState synchronously while handling the ResolveNow call. This change makes delegatingResovler call ResolveNow in a new goroutine to avoid such deadlocks.

This change introduced a second mutex in delegating resolver to serialize calls into child resolvers. This ensures no calls to ResolveNow are made after the child resolvers are closed.

RELEASE NOTES:

  • resolver/delegatingresolver: Avoid deadlocks when an http proxy is configured and a resolver calls UpdateState synchronously while handling a ResolveNow call.
  • resolver/delegatingresolver: Fix race that may cause panics when an http proxy is configured and the resolver is closing.

Copy link

codecov bot commented Mar 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.08%. Comparing base (b0d1203) to head (4cda2d6).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8195      +/-   ##
==========================================
+ Coverage   82.06%   82.08%   +0.02%     
==========================================
  Files         410      410              
  Lines       40233    40233              
==========================================
+ Hits        33018    33027       +9     
+ Misses       5854     5848       -6     
+ Partials     1361     1358       -3     
Files with missing lines Coverage Δ
.../resolver/delegatingresolver/delegatingresolver.go 82.22% <100.00%> (+5.53%) ⬆️

... and 21 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@arjan-bal arjan-bal changed the title delegatingresolver: Block calls into delegates once the parent resolver is closed. delegatingresolver: Drop calls into delegates once the parent resolver is closed. Mar 24, 2025
@arjan-bal arjan-bal force-pushed the fix-delegating-resolver-race branch from f593348 to d62e1ce Compare March 24, 2025 20:43
@arjan-bal arjan-bal removed the request for review from easwars March 25, 2025 02:01
@@ -93,6 +97,11 @@ func New(target resolver.Target, cc resolver.ClientConn, opts resolver.BuildOpti
r := &delegatingResolver{
target: target,
cc: cc,
// Child resolvers may send a state update as soon as they're built.
// Initialize children with no-op resolvers. When both children are nil,
// the delegatingResolver is considered closed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related: do we not need to hold childMu when calling Build?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need to. Updates from one child can trigger ResolveNow calls in the other child. This also avoids the need to initialize children here. Added code to acquire the mutex and release it.


// childMu serializes calls into child resolvers. It also protects access to
// the following fields.
childMu sync.Mutex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not share mu for this? Why not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

childMu is being used to serialize calls into the child resolvers, similar to the serailzer queue in resolver_wrapper.go.

If we re-use the existing mutex, we would need to hold mu while calling childResolver.ResolveNow(). If this calls cc.UpdateState() synchronously, it will try to re-acquire mu and deadlock.

  • mu is used to serialize resolver -> channel calls.
  • childMu is used to serialize channel -> resolver calls.

go func() {
r.childMu.Lock()
defer r.childMu.Unlock()
if !r.isClosedLocked() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a nit, but IMO it is simpler to delete the helper and here just check if r.targetResolver != nil since that's what really matters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted the helper and added a check.

go func() {
r.childMu.Lock()
defer r.childMu.Unlock()
if !r.isClosedLocked() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the helper function.

@dfawley dfawley assigned arjan-bal and unassigned dfawley Mar 25, 2025
@arjan-bal arjan-bal assigned dfawley and unassigned arjan-bal Mar 26, 2025
@arjan-bal arjan-bal assigned arjan-bal and unassigned dfawley Mar 26, 2025
@arjan-bal arjan-bal changed the title delegatingresolver: Drop calls into delegates once the parent resolver is closed. delegatingresolver: Stop calls into delegates once the parent resolver is closed. Mar 26, 2025
@arjan-bal arjan-bal added Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. and removed Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. labels Mar 26, 2025
@arjan-bal arjan-bal merged commit 6819ed7 into grpc:master Mar 26, 2025
15 checks passed
@arjan-bal arjan-bal deleted the fix-delegating-resolver-race branch March 26, 2025 17:00
arjan-bal added a commit to arjan-bal/grpc-go that referenced this pull request Mar 26, 2025
arjan-bal added a commit to arjan-bal/grpc-go that referenced this pull request Mar 26, 2025
arjan-bal added a commit that referenced this pull request Mar 26, 2025
janardhanvissa pushed a commit to janardhanvissa/grpc-go that referenced this pull request Apr 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic seen in google.golang.org/grpc/internal/resolver/delegatingresolver.(*delegatingResolver).updateProxyResolverState
4 participants