Description
Hello,
Trying to get monitoring set up for Argo CD in AWS EKS. First, we are using the Prometheus Operator. Second, we are deploying Argo CD with argocd-autopilot. What I am doing is using `helm template` to grab the definitions of the ServiceMonitors and then temporarily adding them manually; eventually they will be added to the argocd-autopilot kustomize. Incidentally, `helm template` doesn't work out of the box because of `(.Capabilities.APIVersions.Has "monitoring.coreos.com/v1")` in the enable conditional. You have to download the chart, perform a `helm dependency build`, then remove that from the enable conditional.
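(Side note: `helm template` also has an `--api-versions` flag that might make that capability check pass without editing the chart. A rough sketch, where the chart reference and values file are placeholders for whatever you are actually using:)

```shell
# Untested idea: declare the ServiceMonitor API group/version to "helm template"
# so the .Capabilities.APIVersions.Has "monitoring.coreos.com/v1" check passes.
# "argo/argo-cd" and "values.yaml" are placeholders for your chart and values.
helm template argocd argo/argo-cd \
  --namespace argocd \
  --api-versions monitoring.coreos.com/v1 \
  -f values.yaml
```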
Configuration:

```yaml
server:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: 30s
      scrapeTimeout: "60s"
      selector:
        release: kube-prometheus-stack
      namespace: "monitoring"
```
This produces a ServiceMonitor for argocd-server. It is discovered by Prometheus, but no targets are registering:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-server
  namespace: "monitoring"
  labels:
    helm.sh/chart: argo-cd-7.8.13
    app.kubernetes.io/name: argocd-server
    app.kubernetes.io/instance: argocd
    app.kubernetes.io/component: server
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: argocd
    app.kubernetes.io/version: "v2.14.7"
    release: kube-prometheus-stack
spec:
  endpoints:
    - port: http-metrics
      interval: 30s
      scrapeTimeout: 60s
      path: /metrics
      honorLabels: false
  namespaceSelector:
    matchNames:
      - argocd
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-server-metrics
      app.kubernetes.io/instance: argocd
      app.kubernetes.io/component: server
```
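As a sanity check for the "discovered but no targets" symptom, something like this should confirm whether the ServiceMonitor selector actually matches a Service (labels copied from the `matchLabels` above):

```shell
# If this returns nothing, the ServiceMonitor's selector.matchLabels
# don't match any Service in the argocd namespace.
kubectl get svc -n argocd \
  -l app.kubernetes.io/name=argocd-server-metrics,app.kubernetes.io/instance=argocd,app.kubernetes.io/component=server \
  --show-labels
```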
I also tried the alternative configuration:

```yaml
global:
  addPrometheusAnnotations: false
server:
  metrics:
    enabled: true
```
I have set this up manually on minikube using the following Helm chart config and it works. argocd-autopilot fails to produce the ServiceMonitors for the three services not provided from the remote repo:
```yaml
global:
  addPrometheusAnnotations: false
server:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: 30s
      scrapeTimeout: "10s"
      selector:
        release: kube-prometheus-stack
      namespace: "monitoring"
controller:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: 30s
      scrapeTimeout: "10s"
      selector:
        release: kube-prometheus-stack
      namespace: "monitoring"
applicationSet:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: 30s
      scrapeTimeout: "10s"
      selector:
        release: kube-prometheus-stack
      namespace: "monitoring"
repoServer:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: 30s
      scrapeTimeout: "10s"
      selector:
        release: kube-prometheus-stack
      namespace: "monitoring"
notifications:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: 30s
      scrapeTimeout: "10s"
      selector:
        release: kube-prometheus-stack
      namespace: "monitoring"
```
Kostis
Some clarifications:
- What version of the prometheus operator are you using?
- What do the logs of the operator show?
- How did you create the EKS cluster?
If it works on Minikube and not on EKS, it might have something to do with the EKS configuration (unrelated to Argo CD). I would start by understanding what is different between the two clusters.
David
The Helm chart works on minikube; argocd-autopilot fails on both minikube and AWS EKS.
After a significant amount of time working on this, I can attest that this is an issue with autopilot. The Helm chart works with no additional effort on minikube (although you have to remove the ServiceMonitor capability check if using `helm template`). With the chart, and ServiceMonitors enabled, it also generates a Service for each monitor to connect to. Those Services define the port as `http-metrics`, and the naming conventions of these additional resources are inconsistent.
This differs from what I see in autopilot. First, autopilot includes three metrics Services:
- argocd-metrics --> what is called argocd-application-controller-metrics in the chart
- argocd-server-metrics
- argocd-notifications-controller-metrics
These three define their port as `metrics`. Creating monitors with the correct port and simplifying the matchLabels works (see the sketch below). However, I have been unable to replicate that configuration for the remaining monitors - repo-server-metrics, appset-controller-metrics, and commit-server-metrics. I just have not been able to get them to show targets.
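For reference, a minimal sketch of the kind of monitor that did work against the autopilot-provided Services (port name and labels assumed from the upstream manifests and the Services listed above):

```yaml
# Sketch only: ServiceMonitor pointed at the autopilot-provided
# argocd-server-metrics Service, using its "metrics" port name.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-server
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # so the Prometheus operator picks it up
spec:
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
  namespaceSelector:
    matchNames:
      - argocd
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-server-metrics
```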
The second “unexpected” issue is that autopilot only deploys resources to the argocd namespace (issue #660). Our plan was to add service monitors as part of bootstrapping but that won’t work because they go in the monitoring namespace.
Lastly, a third issue we ran into is that with an application as a resource in the bootstrap, Argo CD fails to start because the CRD controller is not ready to accept requests (issue #661). A subsequent bootstrap will work, so the Argo CD CRDs need to be in place before bootstrapping. I am mentioning this here because this is where the conversation is.
To answer your questions above:
- kube-prometheus-stack chart version is 69.4.1, app version is v0.80.0 (we will be upgrading soon)
- prometheus-operator-crds chart version is 18.0.1, app version is v0.81.0
- We build our cluster with Terraform but argocd is not yet connected.
- I have tested the chart on minikube and all ServiceMonitors work, although there appear to be some gauges at the top of the Grafana dashboard that do not load data.
- I have tested autopilot on minikube and managed to get three ServiceMonitors to connect - the ones associated with the autopilot-provided services. The others don't show targets.
- I am now moving this to AWS with autopilot, but for the most part the autopilot behavior is the same as on minikube.
- In all cases, the logs of the operator and prometheus show no errors.
Ok, I now have this deployed to AWS with autopilot. The ServiceMonitors for the three default services - argocd-metrics, argocd-notifications-controller-metrics, and argocd-server-metrics - work. I have now added argocd-applicationset-controller-metrics and its servicemonitor.yaml:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: argocd-applicationset-controller-metrics
  namespace: argocd
  labels:
    helm.sh/chart: argo-cd-7.8.14
    app.kubernetes.io/name: argocd-applicationset-controller-metrics
    app.kubernetes.io/instance: argocd
    app.kubernetes.io/component: applicationset-controller
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: argocd
    app.kubernetes.io/version: "v2.14.8"
spec:
  type: ClusterIP
  ports:
    - name: http-metrics
      protocol: TCP
      port: 8080
      targetPort: metrics
  selector:
    app.kubernetes.io/name: argocd-applicationset-controller-metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-applicationset-controller
  namespace: "monitoring"
  labels:
    helm.sh/chart: argo-cd-7.8.13
    app.kubernetes.io/name: argocd-applicationset-controller
    app.kubernetes.io/instance: argocd
    app.kubernetes.io/component: applicationset-controller
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: argocd
    app.kubernetes.io/version: "v2.14.7"
    release: kube-prometheus-stack
spec:
  endpoints:
    - port: http-metrics
      interval: 30s
      scrapeTimeout: 10s
      path: /metrics
      honorLabels: false
  namespaceSelector:
    matchNames:
      - argocd
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-applicationset-controller-metrics
```
When I execute:

```shell
kubectl get svc -n argocd -l app.kubernetes.io/name=argocd-applicationset-controller-metrics
```

I get back the argocd-applicationset-controller-metrics Service:

```
NAME                                        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
argocd-applicationset-controller-metrics   ClusterIP   172.20.9.159   <none>        8080/TCP   27m
```
so the connection from the ServiceMonitor to the Service is correct. Now using https://github.com/nicolaka/netshoot and https://httpie.io/:

```shell
kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot
http argocd-applicationset-controller-metrics.argocd.svc.cluster.local:8080/metrics
```

```
http: error: ConnectionError: HTTPConnectionPool(host='argocd-applicationset-controller-metrics.argocd.svc.cluster.local', port=8080): Max retries exceeded with url: /metrics (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7faafbb4a030>: Failed to establish a new connection: [Errno 111] Connection refused')) while doing a GET request to URL: http://argocd-applicationset-controller-metrics.argocd.svc.cluster.local:8080/metrics
```
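One more check that might help narrow this down (Service name assumed from the manifest above): whether the Service has any endpoints at all, to distinguish "selector matches no pods" from "pod not listening on 8080".

```shell
# If ENDPOINTS shows <none>, the Service selector matches no pods;
# if it lists pod IPs, the pod itself is refusing connections on 8080.
kubectl get endpoints argocd-applicationset-controller-metrics -n argocd
```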
The applicationset controller deployment defines its ports as:

```yaml
ports:
  - containerPort: 7000
    name: webhook
  - containerPort: 8080
    name: metrics
```
I tried explicitly adding the metrics port to the args of the container, as is done in the chart, and get the same error. The same app version is being used in both kustomize and helm. From what I can gather, I believe this to be a code issue, but I cannot explain why it works with helm but not autopilot.
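A follow-up check I plan to run is to compare the pod labels against the Service selector above, since the metrics Service selects on `app.kubernetes.io/name=argocd-applicationset-controller-metrics`. The pod label below is an assumption based on the upstream manifests and may differ under autopilot:

```shell
# Assumed pod label from the upstream manifests; adjust if autopilot labels differ.
kubectl get pods -n argocd \
  -l app.kubernetes.io/name=argocd-applicationset-controller \
  --show-labels
```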