Unable to scrape metrics for kubevirt-hyperconverged-operator #3338

Open
nakkoh opened this issue Mar 12, 2025 · 7 comments

@nakkoh

nakkoh commented Mar 12, 2025

What happened:

Prometheus is unable to scrape metrics for kubevirt-hyperconverged-operator.

(screenshot attached)

The cause of this problem appears to be the lack of authorization.
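For reference, the exact scrape error can be read from the Prometheus targets API inside the pod. A minimal sketch, assuming the pod is prometheus-k8s-0 in openshift-monitoring and Prometheus listens on localhost:9090:

# Print the lastError field of each scrape target (assumed pod name and port)
$ oc -n openshift-monitoring exec prometheus-k8s-0 -- \
    curl -s http://localhost:9090/api/v1/targets | grep -o '"lastError":"[^"]*"'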

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Deploy the KubeVirt HyperConverged Cluster Operator on OKD.

Additional context:

Environment:

  • KubeVirt version (use virtctl version): v1.4.0
  • Kubernetes version (use kubectl version): v1.31.6
  • VM or VMI specifications: N/A
  • Cloud provider or hardware configuration:
    • control plane nodes: OpenStack
    • compute nodes: Baremetal
  • OS (e.g. from /etc/os-release): CentOS Stream CoreOS 418.9.202503040632-0
  • Kernel (e.g. uname -a): 5.14.0-570.el9.x86_64
  • Install tools: N/A
  • Others:
    • OKD 4.18.0-okd-scos.3
@orenc1
Collaborator

orenc1 commented Mar 12, 2025

Hi @machadovilaca , could you please check?
I suspect it's related to #3303

@machadovilaca
Member

Hello @nakkoh,

Can you share the hco-operator pod logs and the config of the kubevirt-hyperconverged-operator-metrics ServiceMonitor?
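For example, something along these lines (the exact commands are only a suggestion):

$ oc -n kubevirt-hyperconverged logs deploy/hco-operator > hco_operator.log
$ oc -n kubevirt-hyperconverged get servicemonitor kubevirt-hyperconverged-operator-metrics -o yaml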

@nakkoh
Author

nakkoh commented Mar 13, 2025

Thank you @machadovilaca

Please refer to the attached file for the hco-operator logs.
hco_operator.log

The kubevirt-hyperconverged-operator-metrics ServiceMonitor is defined as follows:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  creationTimestamp: "2025-02-28T06:31:10Z"
  generation: 16
  labels:
    app: kubevirt-hyperconverged
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/managed-by: hco-operator
    app.kubernetes.io/part-of: hyperconverged-cluster
    app.kubernetes.io/version: 1.14.0
  name: kubevirt-hyperconverged-operator-metrics
  namespace: kubevirt-hyperconverged
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: false
    controller: false
    kind: Deployment
    name: hco-operator
    uid: 19e2af5c-076d-4bbe-9450-f0d978e830be
  resourceVersion: "19149989"
  uid: b6df0cc7-c4e4-4c30-8039-8b22e0a89c12
spec:
  endpoints:
  - authorization:
      credentials:
        key: token
        name: hco-bearer-auth
    port: http-metrics
  namespaceSelector: {}
  selector:
    matchLabels:
      app: kubevirt-hyperconverged
      app.kubernetes.io/component: monitoring
      app.kubernetes.io/managed-by: hco-operator
      app.kubernetes.io/part-of: hyperconverged-cluster
      app.kubernetes.io/version: 1.14.0
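For context, the authorization block above tells Prometheus to read the bearer token from the token key of a Secret in the same namespace. A sketch of what the referenced Secret is expected to look like (the value is a placeholder, not the real token):

apiVersion: v1
kind: Secret
metadata:
  name: hco-bearer-auth
  namespace: kubevirt-hyperconverged
type: Opaque
data:
  token: <base64-encoded bearer token>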

@nakkoh
Author

nakkoh commented Mar 13, 2025

I noticed that the metrics can now be scraped.
(screenshot attached)

I don't know if this is relevant, but I made the following configuration change to Prometheus:
https://docs.okd.io/4.18/observability/monitoring/configuring-core-platform-monitoring/storing-and-recording-data.html#configuring-a-persistent-volume-claim_storing-and-recording-data
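Concretely, the change described on that page amounts to adding a volumeClaimTemplate under prometheusK8s in the cluster-monitoring-config ConfigMap, roughly like this (the storage size is only an example value):

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 40Gi

As far as I understand, applying this redeploys the prometheus-k8s pods, so it may be related.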

@machadovilaca
Member

The ServiceMonitor looks correct, and the logs are not the original ones, so we might not see the problem there.
I was looking for something that might indicate that the ServiceMonitor was not correctly reconciled and that we failed to create the tokens.

In these logs we don't see anything working incorrectly and we are also able to see:
{"level":"info","ts":"2025-03-12T15:32:57Z","msg":"Starting EventSource","controller":"hyperconverged-controller","source":"kind source: *v1.ServiceMonitor"}

@machadovilaca
Member

Maybe this is somehow related to the timing between the resource update and Prometheus syncing its config with it. Probably unlikely, but worth checking, I think.
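One cheap way to check that timing is Prometheus's own config-reload metrics, read from inside the pod (a sketch; the pod name and local port are assumptions):

# Shows when the running config was last reloaded successfully
$ oc -n openshift-monitoring exec prometheus-k8s-0 -- \
    curl -s http://localhost:9090/metrics | grep prometheus_config_last_reload

If prometheus_config_last_reload_success_timestamp_seconds predates the last update of the ServiceMonitor or the secret, the running config is simply stale.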

@nakkoh
Author

nakkoh commented Mar 14, 2025

I have not made any changes, but I noticed that the metrics can no longer be collected.

I tried to fetch the metrics from inside the prometheus pod, and it worked.

$ oc -n kubevirt-hyperconverged get secret hco-bearer-auth -o jsonpath='{.data.token}' | base64 -d
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.mQMUDvdzAZcBLtY-aAQ0Am5_Qxe3GNISjGxnoqe7aI4
$ oc -n openshift-monitoring exec -it prometheus-k8s-0 -- bash
bash-5.1$ export TOKEN=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.mQMUDvdzAZcBLtY-aAQ0Am5_Qxe3GNISjGxnoqe7aI4
bash-5.1$ curl -H "Authorization: Bearer ${TOKEN}" http://10.130.2.28:8383/metrics
# HELP certwatcher_read_certificate_errors_total Total number of certificate read errors
# TYPE certwatcher_read_certificate_errors_total counter
certwatcher_read_certificate_errors_total 0

... snip ...

Next, I checked the scrape config in the prometheus pod and confirmed that the token value differs from the one defined in the hco-bearer-auth secret.

$ oc -n openshift-monitoring exec prometheus-k8s-0 -- cat /etc/prometheus/config_out/prometheus.env.yaml
global:
  evaluation_interval: 30s
  scrape_interval: 30s
  external_labels:
    prometheus: openshift-monitoring/k8s
    prometheus_replica: prometheus-k8s-0
rule_files:
- /etc/prometheus/rules/prometheus-k8s-rulefiles-0/*.yaml
scrape_configs:
- job_name: serviceMonitor/kubevirt-hyperconverged/kubevirt-hyperconverged-operator-metrics/0
  honor_labels: false
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - kubevirt-hyperconverged
  authorization:
    type: Bearer
    credentials: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.twlCgfRdTBHjHHVVLZSffx6ZMYGc-rHxQGLkIrGhUg4
  relabel_configs:

... snip ...

So it seems the ServiceMonitor is not reflected in the Prometheus configuration.
Is this a prometheus-operator issue?
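Two checks that might narrow it down (a sketch; the prometheus-k8s secret name, the prometheus.yaml.gz key, and the container name follow common prometheus-operator conventions and are assumptions for this cluster): whether the operator logs show errors around this ServiceMonitor, and whether the config it generates already contains the stale token.

$ oc -n openshift-monitoring logs deploy/prometheus-operator -c prometheus-operator | grep -i error
$ oc -n openshift-monitoring get secret prometheus-k8s \
    -o jsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gunzip -c | \
    grep -A 3 kubevirt-hyperconverged-operator-metrics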
