-
Notifications
You must be signed in to change notification settings - Fork 198
OTA-1531: Rework error handling in FeatureGate processing #1206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
OTA-1531: Rework error handling in FeatureGate processing #1206
Conversation
@petr-muller: This pull request references OTA-1531 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@petr-muller: This pull request references OTA-1531 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: petr-muller The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
- The `FeatureGate` GET is now made over an informer client which is backed by a cache, so polling to smooth over non-isNotFound errors no longer makes sense. The GET runs after a cache sync succeeded earlier so no errors are expected (and if an error happens, it is unlikely to be resolved by a retry) - Waiting for cache sync was not bounded; impose a 30s timeout on startup (this is similar to the deadline imposed by the poll before we started to use the informer client)
e6ff14c
to
8357a33
Compare
@petr-muller: This pull request references OTA-1531 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
||
for key, synced := range configInformerFactory.WaitForCacheSync(ctx.Done()) { | ||
if !synced { | ||
return startingFeatureSet, cvoGates, fmt.Errorf("failed to sync %s informer cache: %v", key.String(), ctx.Err()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
return startingFeatureSet, cvoGates, fmt.Errorf("failed to sync %s informer cache: %v", key.String(), ctx.Err()) | |
return startingFeatureSet, cvoGates, fmt.Errorf("failed to sync %s informer cache: %w", key.String(), ctx.Err()) |
featureGates := configInformerFactory.Config().V1().FeatureGates().Lister() | ||
configInformerFactory.Start(ctx.Done()) | ||
configInformerFactory.WaitForCacheSync(ctx.Done()) | ||
|
||
ctx, cancel := context.WithTimeout(ctx, 30*time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to understand your reasoning on the number:
30s
here are WaitForCacheSync
(maybe 5s
?) + poll timeout (25s
)?
return "", cvoGates, err | ||
gate, err := featureGates.Get("cluster") | ||
switch { | ||
case err != nil && apierrors.IsNotFound(err): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: non-nil checking seems redundant:
cluster-version-operator/vendor/k8s.io/apimachinery/pkg/api/errors/errors.go
Lines 526 to 528 in 25f02d1
// IsNotFound returns true if the specified error was created by NewNotFound. | |
// It supports wrapped errors and returns false when the error is nil. | |
func IsNotFound(err error) bool { |
case err != nil && apierrors.IsNotFound(err): | |
case apierrors.IsNotFound(err): |
@petr-muller: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
FeatureGate
GET is now made over an informer client which is backed by a cache, so polling to smooth over non-isNotFound errors no longer makes sense. The GET runs after a cache sync succeeded earlier so no errors are expected (and if an error happens, it is unlikely to be resolved by a retry)