metricstransform processor not working as expected - sum of 2 summary prometheus metrics is not created with only the configured label_set #37792

Open
@CarstenSon

Component(s)

processor/metricstransform

What happened?

Description

I want to use the metricstransform processor to aggregate the counts of an application metric across multiple pods, so that Prometheus, which scrapes the otel-collector, only shows a specific label set. I'm using the default Spring Boot Actuator metrics here; a real scenario would of course use different metrics, but this serves as a minimal example 😀

Steps to Reproduce

otel-collector values.yaml: see below

excerpt of output of one of the application pods /metrics uri:

# HELP http_server_requests_seconds
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="CLIENT_ERROR",status="404",uri="/**"} 1
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="CLIENT_ERROR",status="404",uri="/**"} 0.005953076 
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/"} 6 
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/"} 0.00819007 
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/health"} 167
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/health"} 1.289030503 
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus"} 887 
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/actuator/prometheus"} 0.310482366 
http_server_requests_seconds_count{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/metrics"} 74 
http_server_requests_seconds_sum{error="none",exception="none",method="GET",outcome="SUCCESS",status="200",uri="/metrics"} 0.218680223 

I checked the result for the http_server_requests_seconds_count in prometheus where I scraped metrics using the following scrape_config:

      - job_name: otel-collector
        scrape_interval: 30s
        static_configs:
          - targets:
              - otel-collector-opentelemetry-collector:8090
        relabel_configs:
          - source_labels: [ exported_job ]
            target_label: job

Expected Result

I expected Prometheus to show one series per combination of the labels listed in the label_set of the aggregate_labels operation, with the value summed across all other labels, like this:

http_server_requests_seconds_count{instance="otel-collector-opentelemetry-collector:8090", job="metrics-demo", method="GET", status="200", uri="/actuator/health"}	1769

Actual Result

Prometheus instead shows one series per pod, with all of the original labels still present:

http_server_requests_seconds_count{error="none", exception="none", exported_instance="10.244.0.181:8080", instance="otel-collector-opentelemetry-collector:8090", job="metrics-demo", method="GET", outcome="SUCCESS", pod_name="metrics-demo-springboot-metrics-7694578bb8-kr5rg", status="200", uri="/actuator/health"}	887
http_server_requests_seconds_count{error="none", exception="none", exported_instance="10.244.0.187:8080", instance="otel-collector-opentelemetry-collector:8090", job="metrics-demo", method="GET", outcome="SUCCESS", pod_name="metrics-demo-springboot-metrics-7694578bb8-j8hp6", status="200", uri="/actuator/health"}	882
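
To make the expected behavior concrete, here is a minimal sketch of the aggregation I have in mind, applied to the two series above. It is plain Python and not the collector's implementation; the helper name aggregate_sum is made up for illustration, and I'm assuming the label names as Prometheus displays them (job instead of exported_job, because of the relabel_configs in my scrape config).

```python
from collections import defaultdict

# Labels to keep. This mirrors the label_set from the config, with exported_job
# written as job, which is how Prometheus shows it after the relabel_configs.
LABEL_SET = ("job", "method", "status", "uri")


def aggregate_sum(series, label_set=LABEL_SET):
    """Sum the values of all series that agree on every label in label_set."""
    totals = defaultdict(int)
    for labels, value in series:
        # Build the grouping key from the kept labels only; all others are dropped.
        key = tuple((name, labels.get(name, "")) for name in label_set)
        totals[key] += value
    return dict(totals)


# The two per-pod samples from the actual result. Scrape-side labels such as
# instance are left out, since Prometheus adds them after the collector.
series = [
    ({"error": "none", "exception": "none", "job": "metrics-demo",
      "method": "GET", "outcome": "SUCCESS",
      "pod_name": "metrics-demo-springboot-metrics-7694578bb8-kr5rg",
      "status": "200", "uri": "/actuator/health"}, 887),
    ({"error": "none", "exception": "none", "job": "metrics-demo",
      "method": "GET", "outcome": "SUCCESS",
      "pod_name": "metrics-demo-springboot-metrics-7694578bb8-j8hp6",
      "status": "200", "uri": "/actuator/health"}, 882),
]

result = aggregate_sum(series)
for key, total in result.items():
    print(dict(key), total)
```

Both pods collapse into a single series whose value is 887 + 882 = 1769, which is the number in the expected result.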

Collector version

v0.118.0

Environment information

Environment

OS: Fedora running minikube locally
Compiler(if manually compiled): n/a

OpenTelemetry Collector configuration

config:
  receivers:
    prometheus:
      config:
        scrape_configs:
          - job_name: metrics-demo
            scrape_interval: 30s
            metrics_path: /actuator/prometheus
            kubernetes_sd_configs:
              - role: pod
                namespaces:
                  names:
                    - metrics-demo
            relabel_configs:
              - source_labels: [ __meta_kubernetes_pod_label_app_kubernetes_io_name ]
                action: keep
                regex: springboot-metrics
              - source_labels: [ __meta_kubernetes_pod_name ]
                target_label: pod_name


  processors:
    metricstransform:
      transforms:
        - include: "http_server_requests_seconds_count"
          action: "update"
          operations:
            - action: "aggregate_labels"
              label_set: [ exported_job, method, status, uri ]
              aggregation_type: "sum"

  exporters:
    prometheus:
      endpoint: ${env:MY_POD_IP}:8090
      enable_open_metrics: true
    debug:
      verbosity: detailed

  service:
    pipelines:
      metrics:
        exporters:
          - prometheus
        processors:
          - metricstransform
        receivers:
          - prometheus

ports:
  prometheus:
    enabled: true
    containerPort: 8090
    servicePort: 8090
    protocol: TCP

Log output

n/a - no errors thrown

Additional context

This may well be a layer 8 problem, but with my limited Go knowledge and the current documentation I'm unfortunately not getting any further.

I tested other processors, such as the filter processor, to see whether they behave the way I expect based on the documentation, and they do.

If it is a layer 8 problem and you help me figure it out, I would like to add the explanation to the processor's documentation in a follow-up PR 😄
