Commit 08d5c42

pkg/cvo/status: Raise Operator leveling grace-period to 20 minutes
Reduce false-positives when operators take a while to level (like the machine-config operator, which has to roll the control plane machines). We may want to raise this further in the future, but baby steps ;). The previous 10-minute value is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, #158), which doesn't make a case for that specific value. So the bump is unlikely to break anything unexpected.
1 parent 5d06bfc commit 08d5c42

2 files changed: +3 -3 lines changed

docs/user/status.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ If this happens it is a CVO coding error, because clearing [`desiredUpdate`][api
 `ClusterOperatorNotAvailable` (or the consolidated `ClusterOperatorsNotAvailable`) is set when the CVO fails to retrieve the ClusterOperator from the cluster or when the retrieved ClusterOperator does not satisfy [the reconciliation conditions](reconciliation.md#clusteroperator).

 Unlike most manifest-reconciliation failures, this error does not immediately result in `Failing=True`.
-Under some conditions during installs and updates, the CVO will treat this condition as a `Progressing=True` condition and give the operator up to ten minutes to level before reporting `Failing=True`.
+Under some conditions during installs and updates, the CVO will treat this condition as a `Progressing=True` condition and give the operator up to twenty minutes to level before reporting `Failing=True`.

 ## RetrievedUpdates

pkg/cvo/status.go

Lines changed: 2 additions & 2 deletions
@@ -332,13 +332,13 @@ func (optr *Operator) syncStatus(ctx context.Context, original, config *configv1

 // convertErrorToProgressing returns true if the provided status indicates a failure condition can be interpreted as
 // still making internal progress. The general error we try to suppress is an operator or operators still being
-// unavailable AND the general payload task making progress towards its goal. An operator is given 10 minutes since
+// unavailable AND the general payload task making progress towards its goal. An operator is given 20 minutes since
 // its last update to go ready, or an hour has elapsed since the update began, before the condition is ignored.
 func convertErrorToProgressing(history []configv1.UpdateHistory, now time.Time, status *SyncWorkerStatus) (reason string, message string, ok bool) {
 	if len(history) == 0 || status.Failure == nil || status.Reconciling || status.LastProgress.IsZero() {
 		return "", "", false
 	}
-	if now.Sub(status.LastProgress) > 10*time.Minute || now.Sub(history[0].StartedTime.Time) > time.Hour {
+	if now.Sub(status.LastProgress) > 20*time.Minute || now.Sub(history[0].StartedTime.Time) > time.Hour {
 		return "", "", false
 	}
 	uErr, ok := status.Failure.(*payload.UpdateError)
