
Transient error StatusCode.UNAVAILABLE encountered while exporting span batch #6363


Closed
umgbhalla opened this issue Oct 20, 2022 · 25 comments
Labels: bug (Something isn't working)

Comments


umgbhalla commented Oct 20, 2022

Describe the bug
I have noticed an issue with the OpenTelemetry Collector's HTTP port: it returns StatusCode.UNAVAILABLE when sending traces.

Steps to reproduce
Set up the OpenTelemetry Collector with Docker Compose or Kubernetes (I have confirmed this on both) and use this repo to produce traces (edit ./src/helpers/tracing/index.ts to change the endpoint if necessary).

What did you expect to see?
No status-code error, and traces being collected, since OTLP over gRPC works.

What did you see instead?
StatusCode.UNAVAILABLE, but only on OTLP HTTP.

What version did you use?
Version: 0.60.0

What config did you use?
docker-compose.yaml

version: "2.4"

services:
  otel-collector:
    container_name: otel-collector
    image: otel/opentelemetry-collector:0.60.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    # user: root # required for reading docker container logs
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    environment:
      - OTEL_RESOURCE_ATTRIBUTES=host.name=otel-host,os.type=linux
    ports:
      # - "1777:1777"     # pprof extension
      - "4317:4317"     # OTLP gRPC receiver
      - "4318:4318"     # OTLP HTTP receiver
      # - "8888:8888"     # OtelCollector internal metrics
      # - "8889:8889"     # signoz spanmetrics exposed by the agent
      # - "9411:9411"     # Zipkin port
      # - "13133:13133"   # health check extension
      # - "14250:14250"   # Jaeger gRPC
      # - "14268:14268"   # Jaeger thrift HTTP
      # - "55678:55678"   # OpenCensus receiver
      # - "55679:55679"   # zPages extension
    restart: on-failure
    networks:
      - api-dockernet

networks:
  api-dockernet:
    driver: bridge

otel-collector-config.yaml

receivers:
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
      thrift_compact:
        endpoint: 0.0.0.0:6831
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins:
            - http://*
            - https://*
  zipkin:
    endpoint: 0.0.0.0:9411


processors:
  batch:
    send_batch_size: 4000
    send_batch_max_size: 4000
    timeout: 10s
  # If set to null, will be overridden with values based on k8s resource limits
  memory_limiter: null

exporters:
  otlp:
    endpoint: '<redacted>:80'
    tls:
      insecure: true
    sending_queue:
      queue_size: 1000000
  prometheusremotewrite:
    endpoint: 'http://<redacted>/write'
    tls:
      insecure: true


service:
  pipelines:
    traces:
      receivers: [jaeger, otlp]
      exporters: [otlp]
      processors: [batch]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]

Environment
OS: any

Additional context
This issue only happens on OTLP HTTP and not on OTLP gRPC.
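
A minimal Python sketch (not from the original report; the endpoint and packages are assumptions) can help isolate whether the collector's HTTP receiver on 4318 is reachable at all, independently of the TypeScript app:

# Assumes opentelemetry-sdk and opentelemetry-exporter-otlp-proto-http are installed
# and the collector is reachable at localhost:4318.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
# The OTLP/HTTP exporter expects the full signal path, e.g. /v1/traces.
provider.add_span_processor(
    SimpleSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
)
trace.set_tracer_provider(provider)

with trace.get_tracer(__name__).start_as_current_span("connectivity-check"):
    pass

# A failure logged here points at the HTTP receiver or the network in between.
provider.shutdown()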

umgbhalla added the bug label on Oct 20, 2022
@adityaraibytelearn

Is this resolved? I can see the same issue while using gRPC.

@benjamingorman

I'm also seeing this over both grpc and http.

2023-01-12 17:08:10,079 WARNING opentelemetry.exporter.otlp.proto.grpc.exporter /usr/local/lib/python3.8/dist-packages/opentelemetry/exporter/otlp/proto/grpc/exporter.py:356   Transient error StatusCode.UNAVAILABLE encountered while exporting traces, retrying in 16s.

I'm running the jaeger all in one image like this:

docker run --name jaeger   -e COLLECTOR_OTLP_ENABLED=true -e DJAEGER_AGENT_HOST=0.0.0.0  -p 16686:16686   -p 4317:4317   -p 4318:4318  jaegertracing/all-in-one:1.35

@h4ckroot

I had a similar issue, and I found that this error is emitted if your application cannot reach the collector. This can happen if the application and the collector are running on two different networks (or in two different docker-compose files that do not share the same network).

I hope this helps!
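
As a quick check along those lines (a sketch; the host name and port are assumptions), you can test plain TCP reachability of the collector from the application container before involving the exporter:

import socket

def collector_reachable(host: str = "otel-collector", port: int = 4317, timeout: float = 3.0) -> bool:
    # Returns True only if a TCP connection to the collector can be opened.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("collector reachable:", collector_reachable())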

umgbhalla closed this as not planned on Feb 21, 2023
@charliebarber

I am also getting this issue in a Docker container, between an instrumented Python app and the collector. They are on the same network with the bridge driver. I can't seem to fix it.


LronDC commented Apr 4, 2023

May I ask why this issue has been closed?

@gilbertobr

I am also having the same problem.

Script template used:

import logging

from opentelemetry import trace
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import (
    OTLPLogExporter,
)
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
)

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)

logging.basicConfig(level=logging.DEBUG)

logger_provider = LoggerProvider(
    resource=Resource.create(
        {
            "service.name": "shoppingcart",
            "service.instance.id": "instance-12",
        }
    ),
)
set_logger_provider(logger_provider)

exporter = OTLPLogExporter(endpoint="grpc.otel-collector.my.domain.io:80", insecure=True, timeout=20)
logger_provider.add_log_record_processor(BatchLogRecordProcessor(exporter))
handler = LoggingHandler(level=logging.NOTSET, logger_provider=logger_provider)

# Attach OTLP handler to root logger
logging.getLogger().addHandler(handler)

# Log directly
logging.info("Jackdaws love my big sphinx of quartz.")

# Create different namespaced loggers
logger1 = logging.getLogger("myapp.area1")
logger2 = logging.getLogger("myapp.area2")

logger1.debug("Quick zephyrs blow, vexing daft Jim.")
logger1.info("How quickly daft jumping zebras vex.")
logger2.warning("Jail zesty vixen who grabbed pay from quack.")
logger2.error("The five boxing wizards jump quickly.")


# Trace context correlation
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("foo"):
    # Do something
    logger2.error("Hyderabad, we have a major problem.")

logger_provider.shutdown()

@gilbertobr

I noticed that nginx (the proxy) returns 400:

 "PRI * HTTP/2.0" 400 150 "-" "-" 0 5.001 [] [] - - - - 

@sherlockliu

Any updates on this one? It sounds like it hasn't been resolved but has been closed.

@rodrigoazv

In my case I was using the wrong hostname. With docker-compose you should use the container (service) name; in my case:

http://jaeger instead of http://localhost
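
A short sketch of the same point (the service name "jaeger" and the gRPC exporter are assumptions), for an app running in another container on the same compose network:

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)  # fails from another container
exporter = OTLPSpanExporter(endpoint="jaeger:4317", insecure=True)  # resolves via the compose service name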

@tquach-evertz

Any updates on this one? It sounds like it hasn't been resolved but has been closed.

The same issue has just happened with our application... It looks like the issue hasn't been resolved yet.


john-pl commented Jun 15, 2023

We're having the same problem. I don't feel this should be closed.


wizrds commented Jun 21, 2023

I'm encountering the same issue as well. I'm running otel-collector in a Docker container with the gRPC port exposed and connecting to it from a native Python application. The line "Transient error StatusCode.UNAVAILABLE encountered while exporting metrics, retrying in 1s." will sometimes spam the logs, and other times I don't see it once. Is there any way to hide the output, at least?
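
One possible way to quiet that line (a sketch, assuming the logger is named after the module path shown in the warning above) is to raise the level of that specific logger:

import logging

# Silence only the OTLP gRPC exporter's retry warnings; other loggers are untouched.
logging.getLogger("opentelemetry.exporter.otlp.proto.grpc.exporter").setLevel(logging.ERROR)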

@menyisskov

We're having the same issue.
We run the app on k8s (docker desktop), and the all-in-one on the same laptop with the docker run command.

Any ideas what could be causing it?

@chansonzhang

I ran a jaeger-all-in-one.exe binary on Windows and exported spans from an instrumented Sanic app; it failed with the error "Failed to export batch. Status code: StatusCode.UNAVAILABLE".


kevarr commented Aug 7, 2024

The solution (using Python OpenTelemetry) for me was to fix my OTLPSpanExporter import. I was attempting to export gRPC spans, but was importing with:

from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

Instead I needed to import:


from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

If you're exporting using http/protobuf, import from opentelemetry.exporter.otlp.proto.http.trace_exporter instead.

It's a very subtle difference. I suppose I should've paid closer attention when my IDE made an import suggestion for me.
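
A side-by-side sketch of the two exporters (the endpoints are assumptions matching the collector defaults): the gRPC variant targets port 4317, while the http/protobuf variant targets port 4318 with the /v1/traces path.

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
    OTLPSpanExporter as GRPCSpanExporter,
)
from opentelemetry.exporter.otlp.proto.http.trace_exporter import (
    OTLPSpanExporter as HTTPSpanExporter,
)

grpc_exporter = GRPCSpanExporter(endpoint="localhost:4317", insecure=True)
http_exporter = HTTPSpanExporter(endpoint="http://localhost:4318/v1/traces")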

@jonassteinberg1

I'm getting this and have imported correctly.

@alibabadoufu

I am getting the same errors. Same network, I can successfully ping Tempo from the other containers (launched in a separate docker-compose.yaml), and I import correctly as pointed out above. But the error persists.


ysavary commented Feb 23, 2025

I am also getting this issue in a Docker container, between an instrumented Python app and the collector. They are on the same network with the bridge driver. I can't seem to fix it.

With Docker for Mac, I solved this issue by allowing the opentelemetry-collector container to listen on all interfaces with:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

@mdrideout

With Docker for Mac, I solved this issue by allowing the opentelemetry-collector container to listen on all interfaces with:

This fixed it for me. Wow, posted just 11 hours ago, thank you for getting there right before me, haha!

Classic Docker issue: services must bind to 0.0.0.0 instead of localhost to be reachable from outside the Docker Compose network.

More of my details in case it helps anyone:

python script exporter

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
otlpProcessor = BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))

docker-compose.yml

services:
  jaeger:
    image: jaegertracing/jaeger:2.3.0
    container_name: jaeger
    ports:
      # - "16686:16686" # Jaeger UI - uses Caddy reverse proxy
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
    volumes:
      - jaeger_badger_store:/jaeger/jaeger_badger_store # Mount the volume for BadgerDB data
      - jaeger_badger_store_archive:/jaeger/jaeger_badger_store_archive # Mount the volume for BadgerDB archive data
      - ./jaeger:/jaeger # Mount the jaeger directory to make the config files available
    command: --config /jaeger/config.yml
    networks:
      - caddy-proxy-network

config.yml

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318


abdullah-retorio commented Mar 16, 2025

I am getting the same error using Grafana Alloy's otelcol.receiver.otlp OpenTelemetry Collector receiver on 0.0.0.0:4317. I am using Envoy as a reverse proxy in front of the docker-compose'd Alloy instance on an Azure VM.

This container is called from opentelemetry.exporter.otlp.proto.grpc.exporter on Python 3.12.

Do you have any ideas or solutions regarding the cause of this?


Symbolk commented Mar 17, 2025

I had this issue too. Since I could access the UI via localhost but not the default 0.0.0.0, I checked Clash and disabled the global proxy, and then it worked!
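
If a system-wide proxy is in the way, one hedged option (an assumption about the environment, not a confirmed fix) is to exclude the collector host via the proxy environment variables that gRPC channels generally honor, before the exporter is created:

import os

# Keep collector traffic off the global proxy (e.g. Clash); adjust host names to your setup.
os.environ["no_proxy"] = "localhost,127.0.0.1,otel-collector"
os.environ["no_grpc_proxy"] = "localhost,127.0.0.1,otel-collector"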

@naisanzaa

Issue +1

@seanreed1111

TL;DR: I had this issue because the auto-instrumentation was doing too much in the background.

First, I changed my default receiver ports to http/protobuf 4319 and grpc 4320. No go, I still saw errors at grpc 4317!
At this point I was sure the zero-code (Python) configuration was causing my problem.
So I then removed the opentelemetry-instrument call before invoking my (Python) app.
Presto! The issue went away.
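
A sketch of the manual alternative (the 4319 http/protobuf endpoint is an assumption based on the ports mentioned above): configure the SDK explicitly instead of running the app under opentelemetry-instrument, so only one deliberately chosen exporter and endpoint is in play.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
# Export over http/protobuf to the receiver port chosen above (4319 here).
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4319/v1/traces"))
)
trace.set_tracer_provider(provider)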


vazir commented Apr 19, 2025

My case is slightly different: the client side of OTel does not reconnect over gRPC, while HTTP works fine. If the receiving telemetry endpoint is available, it starts normally; but if I restart tempo/jaeger/etc. so that Python OTel loses the connection, it never recovers. It keeps reporting the transient error with an increasing backoff. When switching to HTTP, the error is gone; it does fail the trace, but it reconnects fine once the endpoint becomes available again. With Go OTel there is no issue; the Go implementation restores the gRPC connection correctly.


temple commented Apr 22, 2025

Same issue
