Skip to content

OTA-1531: Rework error handling in FeatureGate processing #1206

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

petr-muller
Copy link
Member

@petr-muller petr-muller commented Jun 20, 2025

  • The FeatureGate GET is now made over an informer client which is backed by a cache, so polling to smooth over non-isNotFound errors no longer makes sense. The GET runs after a cache sync succeeded earlier so no errors are expected (and if an error happens, it is unlikely to be resolved by a retry)
  • Waiting for cache sync was not bounded; impose a 30s timeout on startup (this is similar to the deadline imposed by the poll before we started to use the informer client)

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 20, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jun 20, 2025

@petr-muller: This pull request references OTA-1531 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

  • The FeatureGate GET is now made over an informer client which is
    backed by a cache, so polling to smooth over non-isNotFound errors no
    longer makes sense. The GET runs after a cache sync succeeded earlier
    so no errors are expected (and if an error happens, it is unlikely to
    be resolved by a retry)
  • Waiting for cache sync was not bounded; impose a 30s timeout on startup
    (this is similar to the deadline imposed by the poll before we started
    to use the informer client)
  • Return pointers from processInitialFeatureGate to make return errors
    easier

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jun 20, 2025

@petr-muller: This pull request references OTA-1531 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

  • The FeatureGate GET is now made over an informer client which is backed by a cache, so polling to smooth over non-isNotFound errors no longer makes sense. The GET runs after a cache sync succeeded earlier so no errors are expected (and if an error happens, it is unlikely to be resolved by a retry)
  • Waiting for cache sync was not bounded; impose a 30s timeout on startup (this is similar to the deadline imposed by the poll before we started to use the informer client)
  • Return pointers from processInitialFeatureGate to make return errors easier

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Contributor

openshift-ci bot commented Jun 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 20, 2025
- The `FeatureGate` GET is now made over an informer client which is
  backed by a cache, so polling to smooth over non-isNotFound errors no
  longer makes sense. The GET runs after a cache sync succeeded earlier
  so no errors are expected (and if an error happens, it is unlikely to
  be resolved by a retry)
- Waiting for cache sync was not bounded; impose a 30s timeout on startup
  (this is similar to the deadline imposed by the poll before we started
  to use the informer client)
@petr-muller petr-muller force-pushed the ota-1531-08-simplify-featuregate-get branch from e6ff14c to 8357a33 Compare July 3, 2025 17:52
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 3, 2025

@petr-muller: This pull request references OTA-1531 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

In response to this:

  • The FeatureGate GET is now made over an informer client which is backed by a cache, so polling to smooth over non-isNotFound errors no longer makes sense. The GET runs after a cache sync succeeded earlier so no errors are expected (and if an error happens, it is unlikely to be resolved by a retry)
  • Waiting for cache sync was not bounded; impose a 30s timeout on startup (this is similar to the deadline imposed by the poll before we started to use the informer client)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


for key, synced := range configInformerFactory.WaitForCacheSync(ctx.Done()) {
if !synced {
return startingFeatureSet, cvoGates, fmt.Errorf("failed to sync %s informer cache: %v", key.String(), ctx.Err())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
return startingFeatureSet, cvoGates, fmt.Errorf("failed to sync %s informer cache: %v", key.String(), ctx.Err())
return startingFeatureSet, cvoGates, fmt.Errorf("failed to sync %s informer cache: %w", key.String(), ctx.Err())

featureGates := configInformerFactory.Config().V1().FeatureGates().Lister()
configInformerFactory.Start(ctx.Done())
configInformerFactory.WaitForCacheSync(ctx.Done())

ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to understand your reasoning on the number:
30s here are WaitForCacheSync (maybe 5s?) + poll timeout (25s)?

return "", cvoGates, err
gate, err := featureGates.Get("cluster")
switch {
case err != nil && apierrors.IsNotFound(err):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: non-nil checking seems redundant:

// IsNotFound returns true if the specified error was created by NewNotFound.
// It supports wrapped errors and returns false when the error is nil.
func IsNotFound(err error) bool {

Suggested change
case err != nil && apierrors.IsNotFound(err):
case apierrors.IsNotFound(err):

Copy link
Contributor

openshift-ci bot commented Jul 3, 2025

@petr-muller: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-hypershift-conformance 8357a33 link true /test e2e-hypershift-conformance
ci/prow/okd-scos-e2e-aws-ovn 8357a33 link false /test okd-scos-e2e-aws-ovn
ci/prow/e2e-agnostic-operator-devpreview 8357a33 link false /test e2e-agnostic-operator-devpreview
ci/prow/e2e-aws-ovn-techpreview 8357a33 link true /test e2e-aws-ovn-techpreview
ci/prow/e2e-hypershift 8357a33 link true /test e2e-hypershift

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants