Description
What happened?:
Issue 1: namespaced: false config doesn't work
We tried to:
- expose external metrics to the k8s API as non-namespaced queries, since the metric itself comes from Kafka running on another cluster and cannot be directly related to any k8s resource;
- use them in an HPA.
We applied the setting via the prometheus-adapter Helm chart with the command:
helm upgrade -i prometheus-adapter prometheus-community/prometheus-adapter --version 4.11.0 -n monitoring -f namespaces/monitoring/prometheus-adapter-values.yaml
With config:
rules:
  default: true
  external:
  - metricsQuery: aws_kafka_sum_offset_lag_sum
    name:
      as: aws_kafka_sum_offset_lag_sum
      matches: aws_kafka_sum_offset_lag_sum
    resources:
      namespaced: false
    seriesQuery: '{__name__=~"aws_kafka_sum_offset_lag_sum"}'
We can confirm that our config was applied by running:
kubectl get cm -n monitoring prometheus-adapter -o yaml
externalRules:
- metricsQuery: aws_kafka_sum_offset_lag_sum
  name:
    as: aws_kafka_sum_offset_lag_sum
    matches: aws_kafka_sum_offset_lag_sum
  resources:
    namespaced: false
  seriesQuery: '{__name__=~"aws_kafka_sum_offset_lag_sum"}'
However, when we try to verify that the metric appears in the k8s API by running:
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1 | jq .
we get:
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "external.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "aws_kafka_sum_offset_lag_sum",
      "singularName": "",
      "namespaced": true,
      "kind": "ExternalMetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
You can see that "namespaced" is true here, which contradicts our config.
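For reference, the flag for just this metric can be pulled out directly (a minimal check, assuming the same jq setup as above):
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1 | jq '.resources[] | select(.name == "aws_kafka_sum_offset_lag_sum") | .namespaced'
# prints true for us, even though the rule sets namespaced: false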
Issue 2: label filtering doesn't work
We changed the adapter config to this:
rules:
  default: true
  external:
  - metricsQuery: sum(aws_kafka_sum_offset_lag_sum) by (cluster_name, consumer_group, topic)
    name:
      as: aws_kafka_sum_offset_lag_sum
      matches: aws_kafka_sum_offset_lag_sum
    resources:
      overrides:
        cluster_name: { resource: "namespace" }
    seriesQuery: '{__name__=~"aws_kafka_sum_offset_lag_sum"}'
And the HPA setting:
- external:
    metric:
      name: aws_kafka_sum_offset_lag_sum
      selector:
        matchLabels:
          cluster_name: FOO
          consumer_group: bar
          topic: foobar1
    target:
      type: Value
      value: "500"
  type: External
However, label filtering doesn't work as expected: the HPA picks up all metric series, including those whose labels don't match.
We can verify this by running:
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/dna-alive-matchmaking-dev/aws_kafka_sum_offset_lag_sum?labelSelector=cluster_name%3DFOO%2Cconsumer_group%3Dbar%2Ctopic%3Dfoobar1" | jq .
and we get all metric series back, including those with unmatched labels.
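A minimal way to count what that query returns (same jq setup assumed), in case it helps narrow things down:
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/dna-alive-matchmaking-dev/aws_kafka_sum_offset_lag_sum?labelSelector=cluster_name%3DFOO%2Cconsumer_group%3Dbar%2Ctopic%3Dfoobar1" | jq '.items | length'
# counts the returned series; for us this includes series whose labels do not match the selector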
Issue 3: with the same config, kubectl get --raw /apis/external.metrics.k8s.io/v1beta1 sometimes returns empty results
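A simple poll like the one below (just a sketch, nothing adapter-specific assumed) makes the intermittent empty responses easy to spot:
# query the discovery endpoint repeatedly and print how many resources are listed each time
for i in $(seq 1 20); do
  kubectl get --raw /apis/external.metrics.k8s.io/v1beta1 | jq '.resources | length'
  sleep 10
done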
We are not sure whether these three issues are related, so we are reporting all of them here. Kindly help!
Please provide the prometheus-adapter logs with -v=6 around the time the issue happened:
prometheus-adapter logs
We do notice timeouts in the logs all the time; we are unsure whether they are related to the issues above.
E0225 01:36:07.768071 1 writers.go:135] apiserver was unable to write a fallback JSON response: http: Handler timeout
E0225 01:36:07.767982 1 writers.go:135] apiserver was unable to write a fallback JSON response: http2: stream closed
E0225 01:36:07.767987 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E0225 01:36:07.767987 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E0225 01:36:07.767993 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E0225 01:36:07.769167 1 writers.go:135] apiserver was unable to write a fallback JSON response: http2: stream closed
E0225 01:36:07.769194 1 timeout.go:142] post-timeout activity - time-elapsed: 2.777574ms, GET "/apis/custom.metrics.k8s.io/v1beta1" result: <nil>
E0225 01:36:07.769210 1 timeout.go:142] post-timeout activity - time-elapsed: 3.077884ms, GET "/apis/custom.metrics.k8s.io/v1beta1" result: <nil>
E0225 01:36:07.769226 1 writers.go:135] apiserver was unable to write a fallback JSON response: http2: stream closed
E0225 01:36:07.769227 1 writers.go:135] apiserver was unable to write a fallback JSON response: http2: stream closed
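If it helps, the same errors can be pulled out of the adapter logs with something like this (deployment name assumed from the Helm release name):
kubectl logs -n monitoring deploy/prometheus-adapter --since=1h | grep -E 'Handler timeout|stream closed|post-timeout'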
Anything else we need to know?:
Environment:
- prometheus-adapter image: registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.12.0
- prometheus-adapter chart: 4.11.0
- prometheus image version: quay.io/prometheus/prometheus:v2.48.1
- prometheus chart version: prometheus-community/kube-prometheus-stack --version 55.7.1
- Kubernetes version (use kubectl version): v1.30.9-eks-8cce635
- Cloud provider or hardware configuration: AWS EKS v1.30.9-eks-8cce635
- Other info: we assign sufficient CPU/memory resources and can verify they are not the bottleneck.
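For reference, resource usage can be checked with something like this (pod label assumed from the chart's default labels):
kubectl top pod -n monitoring -l app.kubernetes.io/name=prometheus-adapter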