
Cannot collect metrics from appset, commit-server, or repo-server controllers #659


Description

@darnone

Hello,
Trying to get monitoring set up for Argo CD in AWS EKS. First, we are using the Prometheus operator. Second, we are deploying Argo CD with argocd-autopilot. What I am doing for now is using helm template on the chart to grab the ServiceMonitor definitions and temporarily adding them manually; eventually they will be added to the argocd-autopilot kustomize configuration. Incidentally, helm template doesn't work out of the box because of (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1") in the enable conditional. You have to download the chart, perform a helm dependency build, then remove that check from the enable conditional.
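For reference, the workaround roughly looks like this (the repo alias, chart version, and values file name are just examples; helm template also has an --api-versions flag that can satisfy the Capabilities check without editing the chart):

# fetch and unpack the chart locally
helm pull argo/argo-cd --version 7.8.13 --untar
cd argo-cd
helm dependency build
# either remove the .Capabilities.APIVersions.Has check from the serviceMonitor templates,
# or advertise the API group to the template engine:
helm template argocd . \
  --namespace argocd \
  --values my-values.yaml \
  --api-versions monitoring.coreos.com/v1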
Configuration:

server:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: 30s
      scrapeTimeout: "60s"
      selector: 
        release: kube-prometheus-stack
      namespace: "monitoring"

will produce a ServiceMonitor for argocd-server. It is discovered by Prometheus, but no targets are registering.
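For anyone reproducing this, a quick way to check whether that ServiceMonitor's matchLabels actually select a Service in the argocd namespace, and what its ports are named (purely a diagnostic sketch):

kubectl get svc -n argocd \
  -l app.kubernetes.io/name=argocd-server-metrics,app.kubernetes.io/instance=argocd,app.kubernetes.io/component=server \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.ports[*].name}{"\n"}{end}'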

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-server
  namespace: "monitoring"
  labels:
    helm.sh/chart: argo-cd-7.8.13
    app.kubernetes.io/name: argocd-server
    app.kubernetes.io/instance: argocd
    app.kubernetes.io/component: server
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: argocd
    app.kubernetes.io/version: "v2.14.7"
    release: kube-prometheus-stack
spec:
  endpoints:
    - port: http-metrics
      interval: 30s
      scrapeTimeout: 60s
      path: /metrics
      honorLabels: false
  namespaceSelector:
    matchNames:
      - argocd
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-server-metrics
      app.kubernetes.io/instance: argocd
      app.kubernetes.io/component: server

I also tried the alternative configuration:

global:
  addPrometheusAnnotations: false
server:
  metrics:
    enabled: true

I have set this up manually on minikube using the following Helm chart config and it works. argocd-autopilot fails to produce the service monitors for the three services not provided by the remote repo:

global:
  addPrometheusAnnotations: false

server:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: 30s
      scrapeTimeout: "10s"
      selector: 
        release: kube-prometheus-stack
      namespace: "monitoring"

controller:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: 30s
      scrapeTimeout: "10s"
      selector: 
        release: kube-prometheus-stack
      namespace: "monitoring"

applicationSet:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: 30s
      scrapeTimeout: "10s"
      selector: 
        release: kube-prometheus-stack
      namespace: "monitoring"

repoServer:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: 30s
      scrapeTimeout: "10s"
      selector: 
        release: kube-prometheus-stack
      namespace: "monitoring"

notifications:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: 30s
      scrapeTimeout: "10s"
      selector: 
        release: kube-prometheus-stack
      namespace: "monitoring"

Kostis
Some clarifications:
What version of the prometheus operator are you using?
What do the logs of the operator show?
How did you create the EKS cluster?

If it works on Minikube and not on EKS, it might have something to do with the EKS configuration (unrelated to Argo CD). I would start by understanding what is different between the two clusters.

David
The Helm chart works on minikube. argocd-autopilot fails on both minikube and AWS EKS.

After a significant amount of time working on this, I can attest that this is an issue with autopilot. The Helm chart works with no additional effort on minikube (although you have to remove the service monitor check if using helm template). With the chart, when service monitors are enabled, it also generates a Service for each monitor to connect to. Those Services define their port as http-metrics, and the naming conventions of these additional resources are inconsistent.
This differs from what I see in autopilot. First, autopilot includes three service monitor services.

  • argocd-metrics --> what is called argocd-application-controller-metrics in the chart
  • argocd-server-metrics
  • argocd-notifications-controller-metrics

These three define their port as metrics. Creating monitors with the correct port and simplifying the matchLabels works. However, I have been unable to replicate that configuration for the remaining monitors - repo-server-metrics, appset-controller-metrics, and commit-server-metrics. I just have not been able to get them to show targets.
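A minimal sketch of that working shape, taking argocd-server-metrics as the example (the port name and matchLabels are taken from the autopilot-provided Services described above; the rest is assumed from my earlier config):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-server
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
  namespaceSelector:
    matchNames:
      - argocd
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-server-metrics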

The second “unexpected” issue is that autopilot only deploys resources to the argocd namespace (issue #660). Our plan was to add service monitors as part of bootstrapping but that won’t work because they go in the monitoring namespace.

Lastly, a third issue we ran into is that with an Application as a resource in the bootstrap, Argo CD fails to start because the CRD controller is not ready to accept requests (issue #661). A subsequent bootstrap will work, so the Argo CD CRDs need to be in place before bootstrapping. I am mentioning this here because this is where the conversation is.
To answer your questions above:

  • kube-prometheus-stack chart version is 69.4.1, app version is v0.80.0 (we will be upgrading soon)
  • prometheus-operator-crds chart version is 18.0.1, app version is v0.81.0
  • We build our cluster with Terraform but argocd is not yet connected.
  • I have tested the chart on minikube and all service monitors work, although there appear to be some gauges at the top of the Grafana dashboard that do not load data.
  • I have tested autopilot on minikube and managed to get three service monitors to connect - the ones associated with the autopilot-provided services. The others don't show targets.
  • I am now moving this to autopilot on AWS, but for the most part the autopilot behavior is the same as on minikube.
  • In all cases, the logs of the operator and prometheus show no errors.

OK, I now have this deployed to AWS with autopilot. The service monitors for the three default services - argocd-metrics, argocd-notifications-controller-metrics, and argocd-server-metrics - work. I have now added argocd-applicationset-controller-metrics and its servicemonitor.yaml:

apiVersion: v1
kind: Service
metadata:
  name: argocd-applicationset-controller-metrics
  namespace: argocd
  labels:
    helm.sh/chart: argo-cd-7.8.14
    app.kubernetes.io/name: argocd-applicationset-controller-metrics
    app.kubernetes.io/instance: argocd
    app.kubernetes.io/component: applicationset-controller
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: argocd
    app.kubernetes.io/version: "v2.14.8"
spec:
  type: ClusterIP  
  ports:
  - name: http-metrics
    protocol: TCP
    port: 8080
    targetPort: metrics
  selector:
    app.kubernetes.io/name: argocd-applicationset-controller-metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-applicationset-controller
  namespace: "monitoring"
  labels:
    helm.sh/chart: argo-cd-7.8.13
    app.kubernetes.io/name: argocd-applicationset-controller
    app.kubernetes.io/instance: argocd
    app.kubernetes.io/component: applicationset-controller
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: argocd
    app.kubernetes.io/version: "v2.14.7"
    release: kube-prometheus-stack
spec:
  endpoints:
    - port: http-metrics
      interval: 30s
      scrapeTimeout: 10s
      path: /metrics
      honorLabels: false
  namespaceSelector:
    matchNames:
      - argocd
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-applicationset-controller-metrics

When I execute:

kubectl get svc -n argocd -l app.kubernetes.io/name=argocd-applicationset-controller-metrics

I get back argocd-applicationset-controller-metrics service:

NAME                                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
argocd-applicationset-controller-metrics   ClusterIP   172.20.9.159   <none>        8080/TCP   27m

so the connection from the ServiceMonitor to the Service is correct. Now using https://github.com/nicolaka/netshoot and https://httpie.io/:

kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot
http argocd-applicationset-controller-metrics.argocd.svc.cluster.local:8080/metrics

http: error: ConnectionError: HTTPConnectionPool(host='argocd-applicationset-controller-metrics.argocd.svc.cluster.local', port=8080): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7faafbb4a030>: Failed to establish a new connection: [Errno 111] Connection refused')) while doing a GET request to URL: http://argocd-applicationset-controller-metrics.argocd.svc.cluster.local:8080/metrics
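A connection refused from a ClusterIP Service usually means it has no ready endpoints, which would point at the Service selector not matching the pod's labels rather than at the port itself. A quick way to check (diagnostic only):

kubectl get endpoints -n argocd argocd-applicationset-controller-metrics
# compare the Service selector with the labels actually on the applicationset pod
kubectl get svc -n argocd argocd-applicationset-controller-metrics -o jsonpath='{.spec.selector}'
kubectl get pods -n argocd --show-labels | grep applicationset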

The applicationset deployment defines its container ports as:

ports:
  - containerPort: 7000
    name: webhook
  - containerPort: 8080
    name: metrics

I tried explicitly adding the metrics port to the args of the container, as is done in the chart, and got the same error. The same app version is being used in kustomize and helm. From what I can gather, I believe this to be a code issue, but I cannot explain why it works in helm but not autopilot.
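For completeness, before patching the container args it is worth comparing what the autopilot-deployed controller actually runs with against the chart-rendered deployment (deployment and container indices follow the upstream manifests; adjust if yours differ):

kubectl get deploy -n argocd argocd-applicationset-controller \
  -o jsonpath='{.spec.template.spec.containers[0].command}{"\n"}{.spec.template.spec.containers[0].args}{"\n"}{.spec.template.spec.containers[0].ports}{"\n"}'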
