
Commit af33c53

pkg/cvo/status: Raise Operator leveling grace-period to 40 minutes
Similar to openshift#422, tune the grace period further so that at least 90% of clusters (our 90th percentile) no longer trip over momentary upgrade failures when an operator takes longer than 20 minutes to roll out.
1 parent 67cdbb5 commit af33c53

2 files changed, 3 additions and 3 deletions

docs/user/status.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ If this happens it is a CVO coding error, because clearing [`desiredUpdate`][api
 `ClusterOperatorNotAvailable` (or the consolidated `ClusterOperatorsNotAvailable`) is set when the CVO fails to retrieve the ClusterOperator from the cluster or when the retrieved ClusterOperator does not satisfy [the reconciliation conditions](reconciliation.md#clusteroperator).
 
 Unlike most manifest-reconciliation failures, this error does not immediately result in `Failing=True`.
-Under some conditions during installs and updates, the CVO will treat this condition as a `Progressing=True` condition and give the operator up to twenty minutes to level before reporting `Failing=True`.
+Under some conditions during installs and updates, the CVO will treat this condition as a `Progressing=True` condition and give the operator up to fourty minutes to level before reporting `Failing=True`.
 
 ## RetrievedUpdates
 

pkg/cvo/status.go

Lines changed: 2 additions & 2 deletions
@@ -344,13 +344,13 @@ func (optr *Operator) syncStatus(ctx context.Context, original, config *configv1
 
 // convertErrorToProgressing returns true if the provided status indicates a failure condition can be interpreted as
 // still making internal progress. The general error we try to suppress is an operator or operators still being
-// unavailable AND the general payload task making progress towards its goal. An operator is given 20 minutes since
+// unavailable AND the general payload task making progress towards its goal. An operator is given 40 minutes since
 // its last update to go ready, or an hour has elapsed since the update began, before the condition is ignored.
 func convertErrorToProgressing(history []configv1.UpdateHistory, now time.Time, status *SyncWorkerStatus) (reason string, message string, ok bool) {
 	if len(history) == 0 || status.Failure == nil || status.Reconciling || status.LastProgress.IsZero() {
 		return "", "", false
 	}
-	if now.Sub(status.LastProgress) > 20*time.Minute || now.Sub(history[0].StartedTime.Time) > time.Hour {
+	if now.Sub(status.LastProgress) > 40*time.Minute || now.Sub(history[0].StartedTime.Time) > time.Hour {
 		return "", "", false
 	}
 	uErr, ok := status.Failure.(*payload.UpdateError)
