semaphore configmap missing retry on transient error #14335

Open
4 tasks done
tczhao opened this issue Mar 27, 2025 · 0 comments · May be fixed by #14336


Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened? What did you expect to happen?

We are seeing errors like:

task 'abc-cron-1742968800(0).run(1).extract(0:0)(0).chart' errored: Get "https://172.20.2.171:443/api/v1/namespaces/default/configmaps/atlas": dial tcp 172.20.2.171:443: connect: connection refused

Get "https://172.20.2.171:443/api/v1/namespaces/default/configmaps/atlas": dial tcp 172.20.2.171:443: connect: connection refused

The atlas ConfigMap is the one we configured as the semaphore ConfigMap:

- name: api-request
  synchronization:
    semaphore:
      configMapKeyRef:
        name: atlas
        key: api

Our cluster's apiserver connection is not the most reliable. However, the semaphore ConfigMap lookup is the only place where we see apiserver connection errors.
The issue is that the semaphore ConfigMap lookup doesn't retry on transient errors:
https://github.com/argoproj/argo-workflows/blob/main/workflow/controller/controller.go#L389

Version(s)

latest

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

N/A

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

Logs from in your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded