Description
Component(s)
processor/metricstransform
What happened?
Description
I want to use the metricstransform processor to aggregate the counts of a certain application's metrics across multiple pods, so that the Prometheus instance scraping the otel-collector only sees a reduced label set. I'm using the default metrics from the Spring Boot Actuator; these would of course be different in a real scenario, but they serve as a minimal example 😀
Steps to Reproduce
otel-collector values.yaml: see below
Excerpt of the output of one application pod's /metrics endpoint:

```
# HELP http_server_requests_seconds
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="CLIENT_ERROR",status="404",uri="/**"} 1
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="CLIENT_ERROR",status="404",uri="/**"} 0.005953076
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/"} 6
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/"} 0.00819007
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/health"} 167
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/health"} 1.289030503
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus"} 887
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus"} 0.310482366
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/metrics"} 74
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/metrics"} 0.218680223
```
I checked the result for http_server_requests_seconds_count in Prometheus, which scrapes the collector using the following scrape_config:
```yaml
- job_name: otel-collector
  scrape_interval: 30s
  static_configs:
    - targets:
        - otel-collector-opentelemetry-collector:8090
  relabel_configs:
    - source_labels: [ exported_job ]
      target_label: job
```
Expected Result
My expected result would be that Prometheus presents the data like this, i.e. summed across all labels other than the ones specified in the aggregate_labels operation:

```
http_server_requests_seconds_count{instance="otel-collector-opentelemetry-collector:8090", job="metrics-demo", method="GET", status="200", uri="/actuator/health"} 1769
```
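For comparison, the aggregation I am after corresponds roughly to this PromQL query over the raw per-pod series (a sketch; the label names are the ones from my config):

```promql
sum by (job, method, status, uri) (http_server_requests_seconds_count)
```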
Actual Result
Prometheus instead shows the following data and labels for the two different pods:

```
http_server_requests_seconds_count{error="none", exception="none", exported_instance="10.244.0.181:8080", instance="otel-collector-opentelemetry-collector:8090", job="metrics-demo", method="GET", outcome="SUCCESS", pod_name="metrics-demo-springboot-metrics-7694578bb8-kr5rg", status="200", uri="/actuator/health"} 887
http_server_requests_seconds_count{error="none", exception="none", exported_instance="10.244.0.187:8080", instance="otel-collector-opentelemetry-collector:8090", job="metrics-demo", method="GET", outcome="SUCCESS", pod_name="metrics-demo-springboot-metrics-7694578bb8-j8hp6", status="200", uri="/actuator/health"} 882
```
Collector version
v0.118.0
Environment information
Environment
OS: Fedora running minikube locally
Compiler (if manually compiled): n/a
OpenTelemetry Collector configuration
```yaml
config:
  receivers:
    prometheus:
      config:
        scrape_configs:
          - job_name: metrics-demo
            scrape_interval: 30s
            metrics_path: /actuator/prometheus
            kubernetes_sd_configs:
              - role: pod
                namespaces:
                  names:
                    - metrics-demo
            relabel_configs:
              - source_labels: [ __meta_kubernetes_pod_label_app_kubernetes_io_name ]
                action: keep
                regex: springboot-metrics
              - source_labels: [ __meta_kubernetes_pod_name ]
                target_label: pod_name
  processors:
    metricstransform:
      transforms:
        - include: "http_server_requests_seconds_count"
          action: "update"
          operations:
            - action: "aggregate_labels"
              label_set: [ exported_job, method, status, uri ]
              aggregation_type: "sum"
  exporters:
    prometheus:
      endpoint: ${env:MY_POD_IP}:8090
      enable_open_metrics: true
    debug:
      verbosity: detailed
  service:
    pipelines:
      metrics:
        exporters:
          - prometheus
        processors:
          - metricstransform
        receivers:
          - prometheus
ports:
  prometheus:
    enabled: true
    containerPort: 8090
    servicePort: 8090
    protocol: TCP
```
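One way to narrow this down (a sketch, using the debug exporter already defined in my config) is to wire debug into the metrics pipeline alongside prometheus, so the collector logs the datapoints exactly as the processor emits them, before the prometheus exporter adds the exported_* labels:

```yaml
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [metricstransform]
      exporters: [prometheus, debug]  # debug logs the post-processor datapoints
```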
Log output
n/a - no errors thrown
Additional context
This might well be a layer-8 problem, but with my limited Go knowledge and the documentation I'm not getting any further on this, unfortunately.
I tested other processors, such as filter, to check whether they behave as I expect based on their documentation, and they do.
If it turns out to be a layer-8 problem and you help me figure it out, I would be happy to add the explanation to the processor's documentation and open a PR for that addition later on 😄