fix: correct manual retry logic. Fixes #14124 #14328

Open
wants to merge 1 commit into
base: main


jswxstw
Member

@jswxstw jswxstw commented Mar 25, 2025

Fixes #14124

Motivation

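In certain scenarios, manual retries do not work properly. One such scenario is described in the linked issue: HTTP and Plugin nodes cannot be manually retried.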

Modifications

  • Retry all failed execution nodes
  • Reset all group nodes and non-boundary parent nodes if needed
  • Do not retry nodes whose descendant nodes have Succeeded
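
One possible reading of the three rules above, as a minimal Python sketch over a toy node graph. This is not the code in this PR (the controller is written in Go): the `Node` class, the `plan_manual_retry` helper, the set of execution node types, and the example graph are all hypothetical, and the example only loosely mirrors the shape of Case 4 without retry nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: node types that run actual work. The real set may differ.
EXECUTION_TYPES = {"Pod", "HTTP", "Plugin"}
FAILED_PHASES = {"Failed", "Error"}


@dataclass
class Node:
    name: str
    type: str
    phase: str
    children: list[str] = field(default_factory=list)


def descendants(nodes: dict[str, Node], name: str) -> set[str]:
    """Collect every node reachable from `name` through child links."""
    seen: set[str] = set()
    stack = list(nodes[name].children)
    while stack:
        child = stack.pop()
        if child not in seen:
            seen.add(child)
            stack.extend(nodes[child].children)
    return seen


def plan_manual_retry(nodes: dict[str, Node]) -> tuple[set[str], set[str]]:
    """Return (execution nodes to retry, parent or group nodes to reset)."""
    to_retry: set[str] = set()
    for name, node in nodes.items():
        if node.type in EXECUTION_TYPES and node.phase in FAILED_PHASES:
            # Rule 3: skip a failed node if any descendant already Succeeded.
            if any(nodes[d].phase == "Succeeded" for d in descendants(nodes, name)):
                continue
            # Rule 1: retry every remaining failed execution node.
            to_retry.add(name)

    # Rule 2: reset failed non-execution nodes that contain a retried node.
    to_reset: set[str] = set()
    for name, node in nodes.items():
        if node.type not in EXECUTION_TYPES and node.phase in FAILED_PHASES:
            if descendants(nodes, name) & to_retry:
                to_reset.add(name)
    return to_retry, to_reset


# Hypothetical graph: B failed but its descendant D succeeded; E failed and
# its descendant G never ran.
example = {
    "workflow": Node("workflow", "DAG", "Failed", ["A"]),
    "A": Node("A", "Pod", "Succeeded", ["B", "C", "E", "F"]),
    "B": Node("B", "Pod", "Failed", ["D"]),
    "C": Node("C", "Pod", "Succeeded", ["D"]),
    "D": Node("D", "Pod", "Succeeded"),
    "E": Node("E", "Pod", "Failed", ["G"]),
    "F": Node("F", "Pod", "Succeeded", ["G"]),
    "G": Node("G", "Pod", "Omitted"),
}

to_retry, to_reset = plan_manual_retry(example)
print(sorted(to_retry), sorted(to_reset))
```

In this toy graph the sketch selects only `E` for retry and resets the enclosing `workflow` node; `B` is left alone because its descendant `D` has already Succeeded. Whether the controller makes exactly the same choice for Case 4 depends on details the sketch leaves out, such as retry nodes and boundary handling.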

Verification

Case 1:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: http-template-
spec:
  entrypoint: main
  arguments:
    parameters:
      # good: https://raw.githubusercontent.com/argoproj/argo-workflows/4e450e250168e6b4d51a126b784e90b11a0162bc/pkg/apis/workflow/v1alpha1/generated.swagger.json
      # bad: https://raw.githubusercontent.com/argoproj/argo-workflows/thisisnotahash/pkg/apis/workflow/v1alpha1/generated.swagger.json
      - name: url
        value: "https://raw.githubusercontent.com/argoproj/argo-workflows/thisisnotahash/pkg/apis/workflow/v1alpha1/generated.swagger.json"
  templates:
    - name: main
      steps:
        - - name: fail1
            template: http
          - name: fail2
            template: http
    - name: http
      http:
        url: "{{workflow.parameters.url}}"

Case 2:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: workflow-exit-handler-fail
spec:
  entrypoint: echo
  onExit: exit-handler
  templates:
  - name: echo
    http:
      url: "https://raw.githubusercontent.com/argoproj/argo-workflows/4e450e250168e6b4d51a126b784e90b11a0162bc/pkg/apis/workflow/v1alpha1/generated.swagger.json"
  - name: fail
    container:
      image: alpine:3.18
      command: [sh, -c]
      args: ["exit 1"]
  - name: exit-handler
    steps:
      - - name: exit-handler-task
          template: fail

Case 3:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: workflow-steps-with-retry-fail
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: retry-step-group-case
        template: fail-step-group
  - name: fail-with-rate
    container:
      image: python:alpine3.6
      command: ["python", -c]
      args: ["import random; import sys; exit_code = random.choice([0, 1]); sys.exit(exit_code);"]
  - name: fail-step-group
    steps:
    - - name: step1
        template: fail-with-rate
    - - name: step2
        template: fail-with-rate
    - - name: step3
        template: fail-with-rate
    retryStrategy:
      limit: "1"

Case 4:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: dag-contiue-on-fail
spec:
  retryStrategy:
    limit: 1
  entrypoint: workflow
  templates:
  - name: workflow
    dag:
      tasks:
      - name: A
        template: hello-world
      - name: B
        depends: "A"
        template: intentional-fail
      - name: C
        depends: "A"
        template: hello-world
      - name: D
        depends: "B.Failed && C"
        template: hello-world
      - name: E
        depends: "A"
        template: intentional-fail
      - name: F
        depends: "A"
        template: hello-world
      - name: G
        depends: "E && F"
        template: hello-world

  - name: hello-world
    container:
      image: busybox
      command: [echo]
      args: ["hello world"]

  - name: intentional-fail
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["echo intentional failure; exit 1"]

Documentation

@jswxstw jswxstw marked this pull request as draft March 25, 2025 04:01
@jswxstw jswxstw changed the title from "fix: manual retry" to "fix: correct manual retry logic. Fixes #14124" Apr 2, 2025
@jswxstw
Member Author

jswxstw commented Apr 3, 2025

/retest

@jswxstw jswxstw marked this pull request as ready for review April 3, 2025 07:22
@jswxstw jswxstw requested a review from isubasinghe April 3, 2025 07:23
Development

Successfully merging this pull request may close these issues.

HTTP and Plugin nodes cannot be manually retried