-
Notifications
You must be signed in to change notification settings - Fork 245
OCPBUGS-54238: Update CSR status condition appropriately #2674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-54238: Update CSR status condition appropriately #2674
Conversation
@pperiyasamy: This pull request references Jira Issue OCPBUGS-54238, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/retest |
/assign @martinkennelly @trozet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
bd8d812
to
749d72a
Compare
@@ -88,6 +89,8 @@ func (r *ReconcileCSR) Reconcile(ctx context.Context, request reconcile.Request) | |||
if err != nil { | |||
if apierrors.IsNotFound(err) { | |||
// Request object not found, could have been deleted after reconcile request. | |||
// restore network status when CSR is deleted. | |||
r.status.SetNotDegraded(statusmanager.CertificateSigner) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if there is more than one CSR that is failing and one of those gets removed?
Was it always broken like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes @kyrtapz, the status gets updated again when next signing attempt happens for another CSR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't see this as the way to go, what if a CSR signing fails for one object but another CSR expired in the meantime and got garbage collected? We would remove the error.
To me this changes how CNO behaved until now. There is a wider issue of the status being overwritten for every CSR but this is a different discussion.
Is this change still necessary with the other fix you did?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To summarize, I don't think we should change the CNO degraded state based on any particular CSR object, this should be reserved only for internal controller failures.
@@ -251,7 +254,7 @@ func signerFailure(r *ReconcileCSR, csr *csrv1.CertificateSigningRequest, reason | |||
|
|||
// Update the status conditions on the CSR object | |||
func updateCSRStatusConditions(r *ReconcileCSR, csr *csrv1.CertificateSigningRequest, reason string, message string) { | |||
csr.Status.Conditions = append(csr.Status.Conditions, csrv1.CertificateSigningRequestCondition{ | |||
setCertificateSigningRequestCondition(&csr.Status.Conditions, csrv1.CertificateSigningRequestCondition{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use this?
func SetStatusCondition(conditions *[]metav1.Condition, newCondition metav1.Condition) (changed bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is updating k8s.io/api/certificates/v1.CertificateSigningRequestCondition
object here, so not using generic Condition object.
@pperiyasamy what was the CSR that caused the issue? We should not retry denied CSRs |
existingCondition.Status = newCondition.Status | ||
existingCondition.LastTransitionTime = metav1.NewTime(time.Now()) | ||
} | ||
existingCondition.Reason = newCondition.Reason |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why aren't you updating the LastTransitionTime
if the Status
doesn't change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, updated it.
When CSR signing reattempt happens, signer controller is not updating existing CertificateFailed condition type, instead it tries to add another CertificateFailed condition and leads to Duplicate value: "Failed" error in the network co status, so fixing it just by updating existing CertificateFailed condition. Signed-off-by: Periyasamy Palanisamy <[email protected]>
749d72a
to
a33bca4
Compare
@pperiyasamy: This pull request references Jira Issue OCPBUGS-54238, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kyrtapz, martinkennelly, pperiyasamy The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest-required |
1 similar comment
/retest-required |
/retest-required |
1 similar comment
/retest-required |
/retest-required |
/retest-required |
/override ci/prow/e2e-aws-ovn-windows perma-failing due to https://issues.redhat.com/browse/WINC-1384 |
/override ci/prow/e2e-aws-ovn-ipsec-upgrade no related to this change |
@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/e2e-aws-ovn-windows In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/e2e-aws-ovn-ipsec-upgrade In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/override ci/prow/e2e-aws-ovn-windows |
@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/e2e-aws-ovn-ipsec-upgrade, ci/prow/e2e-aws-ovn-windows In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/override ci/prow/e2e-aws-ovn-windows |
@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/e2e-aws-ovn-windows In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/override ci/prow/e2e-aws-ovn-windows This has passed before and last iteration failed during deprovision due to overall job timeout
|
@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/e2e-aws-ovn-windows In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/override ci/prow/e2e-aws-ovn-upgrade last override was meant to be this one |
@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/e2e-aws-ovn-upgrade In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@pperiyasamy: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/override ci/prow/e2e-metal-ipi-ovn-ipv6-ipsec Has passed before, unrelated, failing on other PRs, tracked in https://issues.redhat.com/browse/OCPBUGS-55280 |
@jcaamano: Overrode contexts on behalf of jcaamano: ci/prow/e2e-metal-ipi-ovn-ipv6-ipsec In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
e6d5a38
into
openshift:master
@pperiyasamy: Jira Issue OCPBUGS-54238: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-54238 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[ART PR BUILD NOTIFIER] Distgit: cluster-network-operator |
When CSR signing reattempt happens, signer controller is not updating existing
CertificateFailed
condition type, instead it tries to add anotherCertificateFailed
condition and leads to Duplicate value: "Failed" error in the network co status,Fixing it just by updating existing
CertificateFailed
condition so that network status don't get updated unnecessarily for this case.