
statsd receiver - error - Error aggregating metric%!(EXTRA zapcore.Field={error 26 0 invalid message format: X}) #31169


Closed
ceastman-r7 opened this issue Feb 9, 2024 · 7 comments
Labels: bug (Something isn't working) · help wanted (Extra attention is needed) · receiver/statsd (statsd related issues)

Comments


ceastman-r7 commented Feb 9, 2024

Component(s)

receiver/statsd

What happened?

Description

Using image: opentelemetry-collector-contrib:0.92.0 deployed to AWS EKS version 1.29 cluster.

When I send a test message to the statsd receiver, I see the following in the logs of the otel agent collector deployment:

Error aggregating metric%!(EXTRA zapcore.Field={error 26 0 invalid message format: X})

Steps to Reproduce

Deploy the otel collector agent with the configuration shown below, and send the sample message from the documentation: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/statsdreceiver/README.md#send-statsd-message-into-the-receiver

Expected Result

The metric, value, and tag show up in Prometheus.

Actual Result

The error shown above is logged instead.

Collector version

0.92.0 and 0.93.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

apiVersion: v1
data:
  relay: |
    exporters:
      debug: {}
      logging: {}
      prometheusremotewrite:
        auth:
          authenticator: basicauth/metrics
        endpoint: https://redact/api/prom/push
        resource_to_telemetry_conversion:
          enabled: true
        target_info:
          enabled: false
    extensions:
      basicauth/metrics:
        client_auth:
          password: redact
          username: "redact"
      basicauth/traces:
        client_auth:
          password: redact
          username: "redact"
      health_check: {}
    processors:
      batch: {}
      memory_limiter:
        check_interval: 5s
        limit_percentage: 80
        spike_limit_percentage: 25
      resource/dropattribute:
        attributes:
        - action: delete
          key: container.id
        - action: delete
          key: k8s.cronjob.uid
        - action: delete
          key: k8s.daemonset.uid
        - action: delete
          key: k8s.deployment.uid
        - action: delete
          key: k8s.hpa.uid
        - action: delete
          key: k8s.job.uid
        - action: delete
          key: k8s.namespace.uid
        - action: delete
          key: k8s.node.uid
        - action: delete
          key: k8s.pod.uid
        - action: delete
          key: k8s.replicaset.uid
        - action: delete
          key: k8s.statefulset.uid
    receivers:
      jaeger:
        protocols:
          grpc:
            endpoint: ${env:MY_POD_IP}:14250
          thrift_compact:
            endpoint: ${env:MY_POD_IP}:6831
          thrift_http:
            endpoint: ${env:MY_POD_IP}:14268
      otlp:
        protocols:
          grpc:
            endpoint: ${env:MY_POD_IP}:4317
          http:
            endpoint: ${env:MY_POD_IP}:4318
      prometheus:
        config:
          scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 10s
            static_configs:
            - targets:
              - ${env:MY_POD_IP}:8888
      statsd:
        aggregation_interval: 5s
        enable_metric_type: true
        enable_simple_tags: true
        endpoint: ${env:MY_POD_IP}:8125
        is_monotonic_counter: true
        timer_histogram_mapping:
        - observer_type: summary
          statsd_type: histogram
        - observer_type: summary
          statsd_type: timing
      statsd/2:
        aggregation_interval: 70s
        enable_metric_type: true
        endpoint: ${env:MY_POD_IP}:8127
        is_monotonic_counter: false
        timer_histogram_mapping:
        - observer_type: gauge
          statsd_type: histogram
        - histogram:
            max_size: 100
          observer_type: histogram
          statsd_type: timing
        - histogram:
            max_size: 50
          observer_type: histogram
          statsd_type: distribution
      zipkin:
        endpoint: ${env:MY_POD_IP}:9411
    service:
      extensions:
      - basicauth/metrics
      - basicauth/traces
      - health_check
      pipelines:
        logs:
          exporters:
          - debug
          processors:
          - memory_limiter
          - batch
          receivers:
          - otlp
        metrics:
          exporters:
          - logging
          processors:
          - memory_limiter
          - batch
          receivers:
          - statsd
        traces:
          exporters:
          - debug
          processors:
          - memory_limiter
          - batch
          receivers:
          - otlp
          - jaeger
          - zipkin
      telemetry:
        logs:
          level: debug
        metrics:
          address: ${env:MY_POD_IP}:8888

Log output

otel-deploy-agent-687cc74658-jgpv2 opentelemetry-collector 2024-02-09T17:33:05.344Z	debug	statsdreceiver@v0.92.0/reporter.go:42	Error aggregating metric%!(EXTRA zapcore.Field={error 26 0  invalid message format: X})	{"kind": "receiver", "name": "statsd", "data_type": "metrics"}

Additional context

No response

ceastman-r7 added the bug and needs triage labels on Feb 9, 2024
github-actions bot added the receiver/statsd label on Feb 9, 2024
github-actions bot (Contributor) commented Feb 9, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

dmitryax (Member) commented Feb 15, 2024

@ceastman-r7, thanks for reporting.

If you can help with investigating and fixing the issue, that would be great. For now, I'm adding the "help wanted" label; I can't look into it at the moment.

dmitryax added the help wanted label and removed the needs triage label on Feb 15, 2024
jmacd (Contributor) commented Feb 27, 2024

I see there is a small problem in the error reporting here: the %!(EXTRA ...) string happens because of a misuse of the OnDebugf method:

if err := r.parser.Aggregate(metric.Raw, metric.Addr); err != nil {
	r.reporter.OnDebugf("Error aggregating metric", zap.Error(err))
}

This bug aside, the receiver is trying to tell you that the data is invalid. The line appears to be simply X according to the error message printed, which is not valid statsd syntax.

I think we want to change that call to read

r.reporter.OnDebugf("Error aggregating metric: %w", zap.Error(err))

which will change the message, but we should look into the cause of the error as a separate matter. Do you know what is sending this statsd data?
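
For context on the logging quirk: Go's fmt package appends a %!(EXTRA ...) suffix when a format string is given arguments it has no verbs for, which is what happens when a structured zap field is handed to a printf-style hook. Below is a minimal, self-contained Go sketch of that behavior; the message and error text mirror the log line above, but the code is illustrative and not taken from the collector.

package main

import (
	"errors"
	"fmt"
)

func main() {
	err := errors.New("invalid message format: X")

	// No verb in the format string, so fmt reports the leftover argument,
	// producing the "%!(EXTRA ...)" suffix seen in the receiver's log line.
	fmt.Println(fmt.Sprintf("Error aggregating metric", err))
	// Output: Error aggregating metric%!(EXTRA *errors.errorString=invalid message format: X)

	// With a verb, the argument is consumed and the suffix disappears.
	fmt.Println(fmt.Sprintf("Error aggregating metric: %v", err))
	// Output: Error aggregating metric: invalid message format: X
}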

ceastman-r7 (Author) commented:

@jmacd I am using this echo statement:
echo "test.metric:42|c|#cluster:kubeplay-staging-5" | nc -w 1 -u -4 -v otel-deploy-agent.platform-delivery 8125

ceastman-r7 (Author) commented:

@jmacd Is there any update, or additional information you need? I found that I can't use the older version of the collector due to a bug in the prometheus receiver.

ceastman-r7 (Author) commented:

I think this is fixed now with image tag 0.98.0.

crobert-1 (Member) commented:

Thanks for following up @ceastman-r7, glad to hear it's working again!
