
statsd receiver - error - Error aggregating metric%!(EXTRA zapcore.Field={error 26 0 invalid message format: X}) #31169


Closed
ceastman-r7 opened this issue Feb 9, 2024 · 7 comments
Labels: bug (Something isn't working) · help wanted (Extra attention is needed) · receiver/statsd (statsd related issues)

Comments


ceastman-r7 commented Feb 9, 2024

Component(s)

receiver/statsd

What happened?

Description

Using image: opentelemetry-collector-contrib:0.92.0 deployed to AWS EKS version 1.29 cluster.

When I send a test message to the statsd receiver, I see the following in the logs of the otel agent collector deployment:

Error aggregating metric%!(EXTRA zapcore.Field={error 26 0 invalid message format: X})

Steps to Reproduce

Deploy the otel collector agent with the configuration shown below, and send the sample message from the documentation: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/statsdreceiver/README.md#send-statsd-message-into-the-receiver

Expected Result

The metric, value, and tag show up in Prometheus.

Actual Result

The error shown above is logged instead.

Collector version

0.92.0 and 0.93.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

apiVersion: v1
data:
  relay: |
    exporters:
      debug: {}
      logging: {}
      prometheusremotewrite:
        auth:
          authenticator: basicauth/metrics
        endpoint: https://redact/api/prom/push
        resource_to_telemetry_conversion:
          enabled: true
        target_info:
          enabled: false
    extensions:
      basicauth/metrics:
        client_auth:
          password: redact
          username: "redact"
      basicauth/traces:
        client_auth:
          password: redact
          username: "redact"
      health_check: {}
    processors:
      batch: {}
      memory_limiter:
        check_interval: 5s
        limit_percentage: 80
        spike_limit_percentage: 25
      resource/dropattribute:
        attributes:
        - action: delete
          key: container.id
        - action: delete
          key: k8s.cronjob.uid
        - action: delete
          key: k8s.daemonset.uid
        - action: delete
          key: k8s.deployment.uid
        - action: delete
          key: k8s.hpa.uid
        - action: delete
          key: k8s.job.uid
        - action: delete
          key: k8s.namespace.uid
        - action: delete
          key: k8s.node.uid
        - action: delete
          key: k8s.pod.uid
        - action: delete
          key: k8s.replicaset.uid
        - action: delete
          key: k8s.statefulset.uid
    receivers:
      jaeger:
        protocols:
          grpc:
            endpoint: ${env:MY_POD_IP}:14250
          thrift_compact:
            endpoint: ${env:MY_POD_IP}:6831
          thrift_http:
            endpoint: ${env:MY_POD_IP}:14268
      otlp:
        protocols:
          grpc:
            endpoint: ${env:MY_POD_IP}:4317
          http:
            endpoint: ${env:MY_POD_IP}:4318
      prometheus:
        config:
          scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 10s
            static_configs:
            - targets:
              - ${env:MY_POD_IP}:8888
      statsd:
        aggregation_interval: 5s
        enable_metric_type: true
        enable_simple_tags: true
        endpoint: ${env:MY_POD_IP}:8125
        is_monotonic_counter: true
        timer_histogram_mapping:
        - observer_type: summary
          statsd_type: histogram
        - observer_type: summary
          statsd_type: timing
      statsd/2:
        aggregation_interval: 70s
        enable_metric_type: true
        endpoint: ${env:MY_POD_IP}:8127
        is_monotonic_counter: false
        timer_histogram_mapping:
        - observer_type: gauge
          statsd_type: histogram
        - histogram:
            max_size: 100
          observer_type: histogram
          statsd_type: timing
        - histogram:
            max_size: 50
          observer_type: histogram
          statsd_type: distribution
      zipkin:
        endpoint: ${env:MY_POD_IP}:9411
    service:
      extensions:
      - basicauth/metrics
      - basicauth/traces
      - health_check
      pipelines:
        logs:
          exporters:
          - debug
          processors:
          - memory_limiter
          - batch
          receivers:
          - otlp
        metrics:
          exporters:
          - logging
          processors:
          - memory_limiter
          - batch
          receivers:
          - statsd
        traces:
          exporters:
          - debug
          processors:
          - memory_limiter
          - batch
          receivers:
          - otlp
          - jaeger
          - zipkin
      telemetry:
        logs:
          level: debug
        metrics:
          address: ${env:MY_POD_IP}:8888

Log output

otel-deploy-agent-687cc74658-jgpv2 opentelemetry-collector 2024-02-09T17:33:05.344Z	debug	statsdreceiver@v0.92.0/reporter.go:42	Error aggregating metric%!(EXTRA zapcore.Field={error 26 0  invalid message format: X})	{"kind": "receiver", "name": "statsd", "data_type": "metrics"}

Additional context

No response

ceastman-r7 added the bug and needs triage labels on Feb 9, 2024
github-actions bot added the receiver/statsd label on Feb 9, 2024
github-actions bot (Contributor) commented Feb 9, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

dmitryax (Member) commented Feb 15, 2024

@ceastman-r7, thanks for reporting.

If you can help with investigating and fixing the issue, that would be great. For now, I'm adding the "help wanted" label; I can't look into it at the moment.

dmitryax added the help wanted label and removed the needs triage label on Feb 15, 2024
jmacd (Contributor) commented Feb 27, 2024

I see there is a small problem in the error reporting here: the %!(EXTRA ...) string happens because of a misuse of the OnDebugf method:

if err := r.parser.Aggregate(metric.Raw, metric.Addr); err != nil {
	r.reporter.OnDebugf("Error aggregating metric", zap.Error(err))
}

This bug aside, the receiver is trying to tell you that the data is invalid. The line appears to be simply X according to the error message printed, which is not valid statsd syntax.

I think we want to change that call to read

r.reporter.OnDebugf("Error aggregating metric: %w", zap.Error(err))

which will change the message, but we should look into the cause of the error as a separate matter. Do you know what is sending this statsd data?
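
For context on the logging quirk: Go's fmt package appends a %!(EXTRA ...) suffix when a format string is given arguments it has no verbs for, which is what happens when a structured zap field is handed to a printf-style hook. Below is a minimal, self-contained Go sketch of that behavior; the message and error text mirror the log line above, but the code is illustrative and not taken from the collector.

package main

import (
	"errors"
	"fmt"
)

func main() {
	err := errors.New("invalid message format: X")

	// No verb in the format string, so fmt reports the leftover argument,
	// producing the "%!(EXTRA ...)" suffix seen in the receiver's log line.
	fmt.Println(fmt.Sprintf("Error aggregating metric", err))
	// Output: Error aggregating metric%!(EXTRA *errors.errorString=invalid message format: X)

	// With a verb, the argument is consumed and the suffix disappears.
	fmt.Println(fmt.Sprintf("Error aggregating metric: %v", err))
	// Output: Error aggregating metric: invalid message format: X
}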

ceastman-r7 (Author) commented:

@jmacd I am using this echo statement:
echo "test.metric:42|c|#cluster:kubeplay-staging-5" | nc -w 1 -u -4 -v otel-deploy-agent.platform-delivery 8125

ceastman-r7 (Author) commented:

@jmacd Is there any update, or additional information you need? I found that I can't use the older version of the collector due to a bug in the prometheus receiver.

ceastman-r7 (Author) commented:

I think this is fixed now with image tag 0.98.0.

crobert-1 (Member) commented:

Thanks for following up @ceastman-r7, glad to hear it's working again!
