Component(s)
What happened?
Description
The httpcheck receiver creates its HTTP clients in the start function and reuses them for every scrape. Client settings such as disable_keep_alives default to their zero value (false), so keep-alive stays enabled and an established connection is reused across scrapes. This can hide a class of failure modes around DNS resolution and intermittent network issues that would otherwise prevent new connections from being created.
We noticed this when a bad DNS change went out, but all of our httpchecks continued to return 2xx metrics.
Steps to Reproduce
Run the httpcheck receiver with its default configuration and let it start collecting metrics. Then change the DNS resolution for the target host (for example, by editing /etc/hosts) while the collector is running.
Expected Result
The collector fails to scrape the endpoint and stops returning successful response codes.
Actual Result
The collector keeps reporting the same status codes as before the change, for as long as the collector stays up and the original target IP/port remains open and responds consistently.
Collector version
v0.125.0
Environment information
Environment
OS: Ubuntu 22.04
Compiler (if manually compiled): go 1.24
OpenTelemetry Collector configuration
exporters:
  debug:
    verbosity: normal
    sampling_initial: 2
    sampling_thereafter: 10000
receivers:
  httpcheck:
    collection_interval: 10s
    targets:
      - endpoint: https://opentelemetry.io
        method: GET
service:
  pipelines:
    metrics/http:
      receivers:
        - httpcheck
      exporters:
        - debug
Log output
2025-05-16T08:47:34.787-0700 info Metrics {"otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "metrics", "resource metrics": 1, "metrics": 2, "data points": 6}
2025-05-16T08:47:34.787-0700 info httpcheck.duration{http.url=https://opentelemetry.io} 209
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=4xx} 0
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=5xx} 0
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=1xx} 0
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=2xx} 1
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=3xx} 0
# I edited /etc/hosts to point opentelemetry.io to a bad IP address in between the first and second batch of metrics reported.
{"otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "metrics"}
2025-05-16T08:47:44.791-0700 info Metrics {"otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "metrics", "resource metrics": 1, "metrics": 2, "data points": 6}
2025-05-16T08:47:44.791-0700 info httpcheck.duration{http.url=https://opentelemetry.io} 40
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=3xx} 0
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=4xx} 0
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=5xx} 0
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=1xx} 0
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=2xx} 1
{"otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "metrics"}
Additional context
No response