
Transient error StatusCode.UNAVAILABLE encountered while exporting span batch #6363


Closed
umgbhalla opened this issue Oct 20, 2022 · 25 comments
Labels: bug (Something isn't working)

Comments


umgbhalla commented Oct 20, 2022

Describe the bug
I have noticed an issue with the OpenTelemetry Collector's HTTP port: it returns StatusCode.UNAVAILABLE when sending traces.

Steps to reproduce
Set up the OpenTelemetry Collector with Docker Compose or Kubernetes (I have confirmed this on both) and use this repo to produce traces (edit ./src/helpers/tracing/index.ts to change the endpoint if necessary).

What did you expect to see?
No status-code error, and traces being collected, since OTLP over gRPC works.

What did you see instead?
StatusCode.UNAVAILABLE, but only on OTLP HTTP.

What version did you use?
Version: 0.60.0

What config did you use?
docker-compose.yaml

version: "2.4"

services:
  otel-collector:
    container_name: otel-collector
    image: otel/opentelemetry-collector:0.60.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    # user: root # required for reading docker container logs
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    environment:
      - OTEL_RESOURCE_ATTRIBUTES=host.name=otel-host,os.type=linux
    ports:
      # - "1777:1777"     # pprof extension
      - "4317:4317"     # OTLP gRPC receiver
      - "4318:4318"     # OTLP HTTP receiver
      # - "8888:8888"     # OtelCollector internal metrics
      # - "8889:8889"     # signoz spanmetrics exposed by the agent
      # - "9411:9411"     # Zipkin port
      # - "13133:13133"   # health check extension
      # - "14250:14250"   # Jaeger gRPC
      # - "14268:14268"   # Jaeger thrift HTTP
      # - "55678:55678"   # OpenCensus receiver
      # - "55679:55679"   # zPages extension
    restart: on-failure
    networks:
      - api-dockernet

networks:
  api-dockernet:
    driver: bridge

otel-collector-config.yaml

receivers:
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
      thrift_compact:
        endpoint: 0.0.0.0:6831
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins:
            - http://*
            - https://*
  zipkin:
    endpoint: 0.0.0.0:9411


processors:
  batch:
    send_batch_size: 4000
    send_batch_max_size: 4000
    timeout: 10s
  # If set to null, will be overridden with values based on k8s resource limits
  memory_limiter: null

exporters:
  otlp:
    endpoint: '<redacted>:80'
    tls:
      insecure: true
    sending_queue:
      queue_size: 1000000
  prometheusremotewrite:
    endpoint: 'http://<redacted>/write'
    tls:
      insecure: true


service:
  pipelines:
    traces:
      receivers: [jaeger, otlp]
      exporters: [otlp]
      processors: [batch]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]

Environment
OS: any

Additional context
This issue only happens on OTLP HTTP and not on OTLP gRPC.
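
A minimal Python sketch (not from the original report; the endpoint and packages are assumptions) can help isolate whether the collector's HTTP receiver on 4318 is reachable at all, independently of the TypeScript app:

# Assumes opentelemetry-sdk and opentelemetry-exporter-otlp-proto-http are installed
# and the collector is reachable at localhost:4318.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
# The OTLP/HTTP exporter expects the full signal path, e.g. /v1/traces.
provider.add_span_processor(
    SimpleSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
)
trace.set_tracer_provider(provider)

with trace.get_tracer(__name__).start_as_current_span("connectivity-check"):
    pass

# A failure logged here points at the HTTP receiver or the network in between.
provider.shutdown()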

umgbhalla added the bug label on Oct 20, 2022
@adityaraibytelearn

Is this resolved? I can see the same issue while using gRPC.

@benjamingorman

I'm also seeing this over both grpc and http.

2023-01-12 17:08:10,079 WARNING opentelemetry.exporter.otlp.proto.grpc.exporter /usr/local/lib/python3.8/dist-packages/opentelemetry/exporter/otlp/proto/grpc/exporter.py:356   Transient error StatusCode.UNAVAILABLE encountered while exporting traces, retrying in 16s.

I'm running the jaeger all in one image like this:

docker run --name jaeger   -e COLLECTOR_OTLP_ENABLED=true -e DJAEGER_AGENT_HOST=0.0.0.0  -p 16686:16686   -p 4317:4317   -p 4318:4318  jaegertracing/all-in-one:1.35

@h4ckroot

I had a similar issue, and I found that this error is emitted if your application cannot reach the collector. This can happen if the application and the collector are running on two different networks (or in two different docker-compose files that do not share the same network).

I hope this helps!
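
As a quick check along those lines (a sketch; the host name and port are assumptions), you can test plain TCP reachability of the collector from the application container before involving the exporter:

import socket

def collector_reachable(host: str = "otel-collector", port: int = 4317, timeout: float = 3.0) -> bool:
    # Returns True only if a TCP connection to the collector can be opened.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("collector reachable:", collector_reachable())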

umgbhalla closed this as not planned on Feb 21, 2023
@charliebarber

I am also getting this issue in a Docker container, between an instrumented Python app and the collector. They are on the same network with the bridge driver. I can't seem to fix it.


LronDC commented Apr 4, 2023

May I ask why this issue has been closed?

@gilbertobr

I am also having the same problem.

Script template used:

import logging

from opentelemetry import trace
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import (
    OTLPLogExporter,
)
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
)

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)

logging.basicConfig(level=logging.DEBUG)

logger_provider = LoggerProvider(
    resource=Resource.create(
        {
            "service.name": "shoppingcart",
            "service.instance.id": "instance-12",
        }
    ),
)
set_logger_provider(logger_provider)

exporter = OTLPLogExporter(endpoint="grpc.otel-collector.my.domain.io:80", insecure=True, timeout=20)
logger_provider.add_log_record_processor(BatchLogRecordProcessor(exporter))
handler = LoggingHandler(level=logging.NOTSET, logger_provider=logger_provider)

# Attach OTLP handler to root logger
logging.getLogger().addHandler(handler)

# Log directly
logging.info("Jackdaws love my big sphinx of quartz.")

# Create different namespaced loggers
logger1 = logging.getLogger("myapp.area1")
logger2 = logging.getLogger("myapp.area2")

logger1.debug("Quick zephyrs blow, vexing daft Jim.")
logger1.info("How quickly daft jumping zebras vex.")
logger2.warning("Jail zesty vixen who grabbed pay from quack.")
logger2.error("The five boxing wizards jump quickly.")


# Trace context correlation
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("foo"):
    # Do something
    logger2.error("Hyderabad, we have a major problem.")

logger_provider.shutdown()

@gilbertobr

I noticed that nginx (the proxy) returns 400:

 "PRI * HTTP/2.0" 400 150 "-" "-" 0 5.001 [] [] - - - - 

@sherlockliu

Any updates on this one? It sounds like it hasn't been resolved but has been closed.

@rodrigoazv

In my case I was using the wrong hostname. With docker-compose you should use the container (service) name; in my case:

http://jaeger instead of http://localhost
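
A short sketch of the same point (the service name "jaeger" and the gRPC exporter are assumptions), for an app running in another container on the same compose network:

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)  # fails from another container
exporter = OTLPSpanExporter(endpoint="jaeger:4317", insecure=True)  # resolves via the compose service name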

@tquach-evertz

Any updates on this one? It sounds like it hasn't been resolved but has been closed.

The same issue has just happened with our application... It looks like the issue hasn't been resolved yet.


john-pl commented Jun 15, 2023

We're having the same problem. I don't feel this should be closed.


wizrds commented Jun 21, 2023

I'm encountering the same issue as well. I'm running otel-collector in a Docker container with the gRPC port exposed and connecting to it from a native Python application. The line "Transient error StatusCode.UNAVAILABLE encountered while exporting metrics, retrying in 1s." will sometimes spam the logs, and other times I don't see it once. Is there any way to hide the output, at least?
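
One possible way to quiet that line (a sketch, assuming the logger is named after the module path shown in the warning above) is to raise the level of that specific logger:

import logging

# Silence only the OTLP gRPC exporter's retry warnings; other loggers are untouched.
logging.getLogger("opentelemetry.exporter.otlp.proto.grpc.exporter").setLevel(logging.ERROR)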

@menyisskov

We're having the same issue.
We run the app on k8s (docker desktop), and the all-in-one on the same laptop with the docker run command.

Any ideas what could be causing it?

@chansonzhang

I ran a jaeger-all-in-one.exe binary on Windows and exported spans from an instrumented Sanic app; it failed with the error "Failed to export batch. Status code: StatusCode.UNAVAILABLE".


kevarr commented Aug 7, 2024

The solution (using Python OpenTelemetry) for me was to fix my OTLPSpanExporter import. I was attempting to export gRPC spans, but was importing with:

from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

Instead I needed to import:


from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

If you're exporting using http/protobuf, import from opentelemetry.exporter.otlp.proto.http.trace_exporter instead.

It's a very subtle difference. I suppose I should've paid closer attention when my IDE made an import suggestion for me.
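
A side-by-side sketch of the two exporters (the endpoints are assumptions matching the collector defaults): the gRPC variant targets port 4317, while the http/protobuf variant targets port 4318 with the /v1/traces path.

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
    OTLPSpanExporter as GRPCSpanExporter,
)
from opentelemetry.exporter.otlp.proto.http.trace_exporter import (
    OTLPSpanExporter as HTTPSpanExporter,
)

grpc_exporter = GRPCSpanExporter(endpoint="localhost:4317", insecure=True)
http_exporter = HTTPSpanExporter(endpoint="http://localhost:4318/v1/traces")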

@jonassteinberg1

I'm getting this and have imported correctly.

@alibabadoufu

I am getting the same errors. Same network, I can successfully ping Tempo from the other containers (launched in a separate docker-compose.yaml), and I import correctly as pointed out above. But the error persists.


ysavary commented Feb 23, 2025

I am also getting this issue in a Docker container, between an instrumented Python app and the collector. They are on the same network with the bridge driver. I can't seem to fix it.

With Docker for Mac, I solved this issue by allowing the opentelemetry-collector container to listen on all interfaces with:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

@mdrideout

With Docker for Mac, I solved this issue by allowing the opentelemetry-collector container to listen on all interfaces with:

This fixed it for me. Wow, posted just 11 hours ago, thank you for getting there right before me, haha!

Classic Docker issue: services must bind to 0.0.0.0 instead of localhost to be reachable from outside the Docker Compose network.

More of my details in case it helps anyone:

python script exporter

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
otlpProcessor = BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))

docker-compose.yml

services:
  jaeger:
    image: jaegertracing/jaeger:2.3.0
    container_name: jaeger
    ports:
      # - "16686:16686" # Jaeger UI - uses Caddy reverse proxy
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
    volumes:
      - jaeger_badger_store:/jaeger/jaeger_badger_store # Mount the volume for BadgerDB data
      - jaeger_badger_store_archive:/jaeger/jaeger_badger_store_archive # Mount the volume for BadgerDB archive data
      - ./jaeger:/jaeger # Mount the jaeger directory to make the config files available
    command: --config /jaeger/config.yml
    networks:
      - caddy-proxy-network

config.yml

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318


abdullah-retorio commented Mar 16, 2025

I am getting the same error using Grafana Alloy's otelcol.receiver.otlp OpenTelemetry Collector receiver on 0.0.0.0:4317. I am using Envoy as a reverse proxy in front of the docker-compose'd Alloy instance on an Azure VM.

This container is called from opentelemetry.exporter.otlp.proto.grpc.exporter on Python 3.12.

Do you have any ideas or solutions regarding the cause of this?


Symbolk commented Mar 17, 2025

I had this issue too. Since I could access the UI via localhost but not the default 0.0.0.0, I checked Clash and disabled the global proxy, and then it worked!
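
If a system-wide proxy is in the way, one hedged option (an assumption about the environment, not a confirmed fix) is to exclude the collector host via the proxy environment variables that gRPC channels generally honor, before the exporter is created:

import os

# Keep collector traffic off the global proxy (e.g. Clash); adjust host names to your setup.
os.environ["no_proxy"] = "localhost,127.0.0.1,otel-collector"
os.environ["no_grpc_proxy"] = "localhost,127.0.0.1,otel-collector"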

@naisanzaa

Issue +1

@seanreed1111

TL;DR: I had this issue because the auto-instrumentation was doing too much in the background.

First, I changed my default receiver ports to http/protobuf 4319 and grpc 4320. No go, I still saw errors at grpc 4317!
At this point I was sure the zero-code (Python) configuration was causing my problem.
So I then removed the opentelemetry-instrument call before invoking my (Python) app.
Presto! The issue went away.
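
A sketch of the manual alternative (the 4319 http/protobuf endpoint is an assumption based on the ports mentioned above): configure the SDK explicitly instead of running the app under opentelemetry-instrument, so only one deliberately chosen exporter and endpoint is in play.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
# Export over http/protobuf to the receiver port chosen above (4319 here).
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4319/v1/traces"))
)
trace.set_tracer_provider(provider)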


vazir commented Apr 19, 2025

My case is slightly different: the client side of OTel does not reconnect over gRPC, while HTTP works fine. If the receiving telemetry endpoint is available, it starts normally; but if I restart tempo/jaeger/etc. so that Python OTel loses the connection, it never recovers. It keeps reporting the transient error with an increasing backoff. When switching to HTTP, the error is gone; it does fail the trace, but it reconnects fine once the endpoint becomes available again. With Go OTel there is no issue; the Go implementation restores the gRPC connection correctly.


temple commented Apr 22, 2025

Same issue
