-
Notifications
You must be signed in to change notification settings - Fork 4.5k
xdsclient: update watcher API as per gRFC A88 #7977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8d198b7
to
a9b45c0
Compare
4009e3e
to
57dbf23
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7977 +/- ##
==========================================
+ Coverage 82.16% 82.21% +0.05%
==========================================
Files 410 412 +2
Lines 40248 40487 +239
==========================================
+ Hits 33068 33287 +219
- Misses 5830 5839 +9
- Partials 1350 1361 +11
🚀 New features to boost your workflow:
|
43c9adb
to
89f475a
Compare
89f475a
to
a48b5bb
Compare
If we are talking about the case where a NACK is seen without the resource being cached, then the question of not using the resource anymore does not come into the picture, because the resource does not exist yet. |
While making another pass through this, I really think we should get rid of these error types, or at least look into getting rid of them. The reason behind my proposal is that when a watcher gets a We don't have to make that change as part of this PR, but I think you should add it to your list of things that you have that need to be done as part of the generic xDS client changes. |
yes, I have added. Thanks. |
yeah if watcher has not seen the valid resource before then its not applicable. The gRFC talks about case when watcher has seen the valid resource but it deleted in xDS client cache https://github.com/grpc/proposal/blob/master/A88-xds-data-error-handling.md#changes-to-xdsclient-watcher-apis. |
e9550d3
to
9b0f57e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM. Very minor comments this time around. But some action items are there. So, please let's make sure we have them tracked. Thanks.
@@ -474,20 +478,28 @@ func (b *cdsBalancer) onClusterUpdate(name string, update xdsresource.ClusterUpd | |||
// If the security config is invalid, for example, if the provider | |||
// instance is not found in the bootstrap config, we need to put the | |||
// channel in transient failure. | |||
b.onClusterError(name, b.annotateErrorWithNodeID(fmt.Errorf("received Cluster resource contains invalid security config: %v", err))) | |||
b.onClusterResourceError(name, b.annotateErrorWithNodeID(fmt.Errorf("received Cluster resource contains invalid security config: %v", err))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it that you unconditionally call onClusterResourceError
here while you conditionally call onClusterResourceError
or onClusterAmbientError
in the two cases below?
I would think of these errors as a data errors too. This is a class of errors which cannot be handled at the xDS client layer (because it doesn't have the required information to do so) but has to be handled at the respective watcher. There are some cases like this for the listener resource as well.
Could you please track this one as well, along with the other items that you are tracking. We should see if we can have a uniform story for this class of errors across languages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the same check here as well. It doesn't harm? Though please note that we have finished updating the new lastUpdate with new update. So, would it make sense to its call ambient error here because we have the update stored?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though please note that we have finished updating the new lastUpdate with new update
Yeah, we could look into not updating that field in the cases where we rejecting the update in the cds LB policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the same code of checking if the child policy exists or not and calling one of the onClusterXxxError methods is repeated three times. We could pull it into a helper method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -482,7 +482,7 @@ func (r *xdsResolver) onError(err error) { | |||
// are removed. | |||
// | |||
// Only executed in the context of a serializer callback. | |||
func (r *xdsResolver) onResourceNotFound() { | |||
func (r *xdsResolver) onResourceError(err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Github doesn't allow me to comment on unchanged lines. But in method applyRouteConfigUpdate
, there is a call to onError
. I feel that this is also a data error and also belongs to the class of error I was talking about in the comment in xds/internal/balancer/cdsbalancer/cdsbalancer.go
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that's kind of weird. I have added the TODO. Its again being called from resource update method which has finished applying the new update. I think we should call resource error here. is it because we have finished applying the new update so we have a cached resource? For now, I have kept it as ambient error to not break existing behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please link a github issue with full details in any TODO that is being added, unless you plan to act on the TODO right away in which case you can add your github handle to the todo. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added alias. It will be fast follow from okr tracker
55edef9
to
1e5c614
Compare
1e5c614
to
536bebe
Compare
This is the first part of implementing gRFC A88 (grpc/proposal#466).
This introduces the new watcher API. This table summarizes the API changes and behavior for each case:
OnResourceDoesNotExist()
OnResourceChanged(NOT_FOUND)
OnResourceDoesNotExist()
OnResourceChanged(NOT_FOUND)
OnError()
OnResourceChanged(status)
if resource NOT already cached;OnAmbientError(status)
otherwiseOnError()
OnResourceChanged(status)
if resource NOT already cached;OnAmbientError(status)
otherwiseOnError()
OnResourceChanged(status)
if resource NOT already cached;OnAmbientError(status)
otherwiseOnUpdate(resource)
OnResourceChanged(resource)
RELEASE NOTES: None
google3 import run analysis did not show any issues