-
Notifications
You must be signed in to change notification settings - Fork 4.5k
xdsclient: make sending requests more deterministic #7774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7774 +/- ##
==========================================
+ Coverage 81.32% 81.34% +0.01%
==========================================
Files 368 368
Lines 36779 36782 +3
==========================================
+ Hits 29912 29919 +7
+ Misses 5635 5626 -9
- Partials 1232 1237 +5
|
Glad this fixes ADS Stream flake, and glad we got a temporary fix in for the CDS processing stuff. All these tests/handling of resources will be moved the resolver eventually, but looks like it'll be next year since I don't see it on our OKR's anymore. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
// watch timers are started for the resources in the request. | ||
// | ||
// Caller needs to hold c.mu. | ||
func (s *StreamImpl) sendMessageIfWritePending(stream transport.StreamingCall, typ xdsresource.Type, state *resourceTypeState) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sendMessageIfWritePendingLocked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
if s.logger.V(2) { | ||
s.logger.Infof("Re-requesting resources %v of type %q, as the stream has been recreated", names, typ.TypeURL()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be put on line 381 now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to get rid of this log message since the resources are anyway logged from sendMessageLocked
.
This PR fixes a flaky test https://github.com/grpc/grpc-go/actions/runs/11499180098/job/32006581739?pr=7773, where the following happens:
But the asynchronous behavior of cluster_resolver means that CDS LB policy considers the local processing of the response containing the child clusters to be complete, and therefore invokes the done callback on the xds client to unblock the receive side flow control on the ADS stream. This requires a fix in the cluster_resolver LB policy.
But the above failure also uncovered an issue with the new ADS stream implementation. Consider the following sequence of events:
What this PR does:
pending_write
bit to the resource-type specific state maintained by the ADS stream implementation#a71-xds-fallback
#xdsclient-refactor
Addresses #6902
RELEASE NOTES: none