Skip to content

fix: prevent dfs sorter infinite recursion on cycle. Fixes #13395 #14391

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 17, 2025

Conversation

Adrien-D
Copy link
Member

Fixes #13395

Motivation

We're encountering an infinite loop because DFS sorter doesn't account for cycles in the graph. If a node is revisited while it’s still being processed, we can enter a cycle

Modifications

To fix this, I track nodes that are currently being visited (i.e., "visiting") in addition to those that are already visited ("discovered") and skip the nodes we are currently visiting.

Verification

Tested on a workflow that reproduces the error, and multiple others to make sure nothing is broken.

Here is the workflow I used
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: stack-overflow-test-
spec:
  podGC:
    strategy: OnWorkflowCompletion
  entrypoint: main
  templates:
    # Main function
    - name: main
      steps:
      - - name: step-a
          template: step-a
          arguments:
            artifacts:
              - name: test
                archive:
                  none: {}
                s3:
                  bucket: my-bucket
                  key: foo/message.txt
                  endpoint: minio.argo.svc.cluster.local:9000
                  insecure: true
                  accessKeySecret:
                    name: my-minio-cred
                    key: accesskey
                  secretKeySecret:
                    name: my-minio-cred
                    key: secretkey
      
      - - name: step-b
          template: step-b
          arguments:
            parameters:
            - name: test
              value: "{{=sprig.fromJson(steps['step-a'].outputs.parameters['test'])['this']}}"
          

      - - name: step-c
          when: "{{=steps['step-b'].outputs.parameters['test'] != ''}}"
          template: step-c
          arguments:
            parameters:
            - name: test
              value: "{{=sprig.fromJson(steps['step-b'].outputs.parameters['test'])['that']}}"
          

      - - name: step-d
          when: "{{=steps['step-b'].outputs.parameters['test'] != ''}}"
          arguments:
            artifacts:
            - name: test
              from: '{{steps.step-b.outputs.artifacts.test}}' 
          inline: 
            inputs:
              artifacts:
              - name: test
                path: /tmp/test
                archive:
                  none: {}
            outputs:
              artifacts:
              - name: test
                path: /tmp/test
                archive:
                  none: {}
                s3:
                  bucket: my-bucket
                  key: foo/message.txt
                  endpoint: minio.argo.svc.cluster.local:9000
                  insecure: true
                  accessKeySecret:
                    name: my-minio-cred
                    key: accesskey
                  secretKeySecret:
                    name: my-minio-cred
                    key: secretkey
            container:
              image: docker.io/argoproj/argosay:v2
              command: [/argosay, echo, 'This is a test']

    - name: step-a
      inputs:
        artifacts:
        - name: test
          path: /tmp/test
          archive:
            none: {}
      outputs:
        artifacts:
        - name: test
          path: /tmp/test
          archive:
            none: {}
        parameters:
        - name: test
          valueFrom:
            path: /tmp/test
      container:
        image: docker.io/argoproj/argosay:v2
        command: [/argosay, echo, 'This is a test']
    - name: step-b
      inputs:
        parameters:
        - name: test
      outputs:
        artifacts:
        - name: test
          path: /tmp/test
          archive:
            none: {}
        parameters:
        - name: test
          valueFrom:
            path: /tmp/test
      script:
        image: docker.io/library/alpine:3
        command: [/bin/sh, -xec]
        env:
        - name: TEST
          value: '{{inputs.parameters.test}}'
        args:
        - |
          echo '{"that": "'"${TEST}"'"}' > /tmp/test
    - name: step-c
      inputs:
        parameters:
        - name: test
      
      container:
        image: docker.io/argoproj/argosay:v2
        command: [/argosay, echo, 'This is a test']
Normal render Fast render
Screenshot 2025-04-15 at 17 31 07 Screenshot 2025-04-15 at 17 31 15

Documentation

@Adrien-D Adrien-D force-pushed the fix-dfs-recursion branch from 60eb576 to 97eaa3a Compare April 15, 2025 15:59
@@ -19,7 +20,14 @@ export class DfsSorter {
if (this.discovered.has(n)) {
return;
}
if (this.visiting.has(n)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Adrien-D - your Javascript/Typescript knowledge is more current than mine. Do you know how a Set's has function does object comparison? I saw some stuff that said it was a reference comparison and others saying it's a deep equals comparison. In both cases, would a node ID be better?

For the reference comparison, if an equivalent node gets compared but has a different reference due to react deep copying it on the next render, wouldn't has return a false negative?

Similarly for the deep equality, if the node gets a field added or modified, wouldn't has also produce a false negative as well?

I'll default to your expertise, but flagging in case these scenarios are valid

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JPZ13 Great catch, you're absolutely right. Set.has() is reference-based, example here. So that could definitely lead to a false comparison, but not in this case here because Node is actually the node ID (naming is confusing I agree). So it's a string that shouldn't lead to this issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's in a name? 😁

Thanks, @Adrien-D - approving now

@@ -19,7 +20,14 @@ export class DfsSorter {
if (this.discovered.has(n)) {
return;
}
if (this.visiting.has(n)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's in a name? 😁

Thanks, @Adrien-D - approving now

Copy link
Member

@terrytangyuan terrytangyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

@terrytangyuan terrytangyuan merged commit 364bde2 into argoproj:main Apr 17, 2025
17 checks passed
@Joibel
Copy link
Member

Joibel commented Apr 23, 2025

/cherry-pick release-3.6

gcp-cherry-pick-bot bot pushed a commit that referenced this pull request Apr 23, 2025
Joibel pushed a commit that referenced this pull request Apr 23, 2025
…erry-pick #14391) (#14420)

Signed-off-by: Adrien Delannoy <[email protected]>
Co-authored-by: Adrien Delannoy <[email protected]>
kim-codefresh added a commit to codefresh-io/argo-workflows that referenced this pull request May 20, 2025
…abilities fixes (Cr 28355) (#358)

* fix: bump deps for k8schain to fix ecr-login (argoproj#14008) (release-3.6 cherry-pick) (argoproj#14174)

* fix(ci): python sdk release process (release-3.6) (argoproj#14183)

Signed-off-by: Alan Clucas <[email protected]>

* docs: clarify qps/burst on controller (cherry-pick argoproj#14190) (argoproj#14192)

Signed-off-by: Tim Collins <[email protected]>
Co-authored-by: Tim Collins <[email protected]>

* fix(api/jsonschema): use unchanging JSON Schema version (cherry-pick argoproj#14092) (argoproj#14256)

Signed-off-by: Roger Peppe <[email protected]>
Co-authored-by: Roger Peppe <[email protected]>

* fix(api/jsonschema): use working `$id` (cherry-pick argoproj#14257) (argoproj#14258)

Signed-off-by: Roger Peppe <[email protected]>
Co-authored-by: Roger Peppe <[email protected]>

* docs: autogenerate tested k8s versions and centralize config (argoproj#14176) (release-3.6) (argoproj#14262)

Signed-off-by: Mason Malone <[email protected]>
Signed-off-by: Alan Clucas <[email protected]>
Co-authored-by: Mason Malone <[email protected]>

* chore(deps): bump minio-go to newer version (argoproj#14185) (release-3.6) (argoproj#14261)

Co-authored-by: Vaibhav Kaushik <[email protected]>

* fix: split pod controller from workflow controller (argoproj#14129) (release-3.6) (argoproj#14263)

* chore(deps): fix snyk (argoproj#14264) (release-3.6) (argoproj#14268)

* chore: revert to correct k8s versions

Accidental bump from argoproj#14176 cherry-pick

Signed-off-by: Alan Clucas <[email protected]>

* chore(deps): bump github.com/go-jose/go-jose/v3 from 3.0.3 to 3.0.4 in the go_modules group (cherry-pick argoproj#14231) (argoproj#14269)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix: wait for workflow informer to sync before pod informer (cherry-pick argoproj#14248) (argoproj#14266)

Signed-off-by: Rohan K <[email protected]>
Co-authored-by: Rohan K <[email protected]>

* fix(cli): remove red from log colour selection. Fixes argoproj#6740 (cherry-pick argoproj#14215) (argoproj#14278)

Signed-off-by: Prabakaran Kumaresshan <[email protected]>
Co-authored-by: Prabakaran Kumaresshan <[email protected]>

* fix: correct semaphore configmap keys for multiple semaphores (argoproj#14184) (release-3.6) (argoproj#14281)

* fix: don't print help for non-validation errors. Fixes argoproj#14234 (cherry-pick argoproj#14249) (argoproj#14283)

Signed-off-by: Koichi Shimada <[email protected]>
Signed-off-by: Mason Malone <[email protected]>
Co-authored-by: koichi <[email protected]>
Co-authored-by: Mason Malone <[email protected]>

* docs: fix kubernetes versions (release-3.6) (argoproj#14273)

Signed-off-by: Alan Clucas <[email protected]>

* fix(workflow/sync): use RWMutex to prevent concurrent map access (cherry-pick argoproj#14321) (argoproj#14322)

Signed-off-by: Ryan Currah <[email protected]>
Co-authored-by: Ryan Currah <[email protected]>

* chore(lint): update golangci-lint to 2.1.1 (argoproj#14390) (cherry-pick release-3.6) (argoproj#14417)

* chore: bump golang 1.23->1.24 (argoproj#14385) (cherry-pick release-3.6) (argoproj#14418)

* fix: gracefully handle invalid CronWorkflows and simplify logic.  (cherry-pick argoproj#14197) (argoproj#14419)

Signed-off-by: Mason Malone <[email protected]>

* fix: prevent dfs sorter infinite recursion on cycle. Fixes argoproj#13395 (cherry-pick argoproj#14391) (argoproj#14420)

Signed-off-by: Adrien Delannoy <[email protected]>
Co-authored-by: Adrien Delannoy <[email protected]>

* chore(deps): bump github.com/expr-lang/expr from 1.16.9 to 1.17.0 (argoproj#14307) (release-3.6) (argoproj#14421)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps)!: update k8s and argo-events (release-3.6) (argoproj#14424)

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: william.vanhevelingen <[email protected]>
Signed-off-by: Mason Malone <[email protected]>
Signed-off-by: William Van Hevelingen <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: William Van Hevelingen <[email protected]>
Co-authored-by: Mason Malone <[email protected]>

* fix: correct retry logic (argoproj#13734) (release-3.6) (argoproj#14428)

Signed-off-by: isubasinghe <[email protected]>
Signed-off-by: Alan Clucas <[email protected]>
Co-authored-by: Isitha Subasinghe <[email protected]>

* fix: manual retries exit handler cleanup. Fixes argoproj#14180 (argoproj#14181) (release-3.6) (argoproj#14429)

Signed-off-by: isubasinghe <[email protected]>
Signed-off-by: Alan Clucas <[email protected]>
Co-authored-by: Isitha Subasinghe <[email protected]>

* fix: correct manual retry logic. Fixes argoproj#14124 (argoproj#14328) (release-3.6) (argoproj#14430)

Signed-off-by: oninowang <[email protected]>
Signed-off-by: Alan Clucas <[email protected]>
Co-authored-by: jswxstw <[email protected]>

* fix: disable ALPN in argo-server as a workaround (argoproj#14433)

Signed-off-by: Alan Clucas <[email protected]>

* result of codegen

Signed-off-by: Kim <[email protected]>

* fix:lint

Signed-off-by: Kim <[email protected]>

---------

Signed-off-by: Alan Clucas <[email protected]>
Signed-off-by: Tim Collins <[email protected]>
Signed-off-by: Roger Peppe <[email protected]>
Signed-off-by: Mason Malone <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Rohan K <[email protected]>
Signed-off-by: Prabakaran Kumaresshan <[email protected]>
Signed-off-by: Koichi Shimada <[email protected]>
Signed-off-by: Ryan Currah <[email protected]>
Signed-off-by: Adrien Delannoy <[email protected]>
Signed-off-by: william.vanhevelingen <[email protected]>
Signed-off-by: William Van Hevelingen <[email protected]>
Signed-off-by: isubasinghe <[email protected]>
Signed-off-by: oninowang <[email protected]>
Signed-off-by: Kim <[email protected]>
Co-authored-by: Alan Clucas <[email protected]>
Co-authored-by: gcp-cherry-pick-bot[bot] <98988430+gcp-cherry-pick-bot[bot]@users.noreply.github.com>
Co-authored-by: Tim Collins <[email protected]>
Co-authored-by: Roger Peppe <[email protected]>
Co-authored-by: Mason Malone <[email protected]>
Co-authored-by: Vaibhav Kaushik <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Rohan K <[email protected]>
Co-authored-by: Prabakaran Kumaresshan <[email protected]>
Co-authored-by: koichi <[email protected]>
Co-authored-by: Ryan Currah <[email protected]>
Co-authored-by: Adrien Delannoy <[email protected]>
Co-authored-by: William Van Hevelingen <[email protected]>
Co-authored-by: Isitha Subasinghe <[email protected]>
Co-authored-by: jswxstw <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Artifact "Cycle" Crashes UI
4 participants