Retry on transient error when cron creating workflow #13970

Closed
@tczhao

Description

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened? What did you expect to happen?

The workflow controller checks for transient errors and retries when creating pods, and in many other operations that interact with the k8s API.

This transient-error retry does not apply to creating a workflow.

We occasionally see intermittent errors such as "connection timed out" and "connection reset by peer" when the k8s API is under pressure.
When workflow creation fails this way, the cron workflow skips that scheduled run.

We should add similar retry-on-transient-error logic to workflow creation.

Version(s)

217b598

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

-

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

Logs from your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflows.argoproj.io/phase!=Succeeded
