[httpcheck] checks do not fail if DNS record changes to bad value #40120

Open
@diurnalist

Component(s)

receiver/httpcheckreceiver

What happened?

Description

The httpcheck receiver creates its HTTP clients in the start function and reuses them for every scrape. By default, settings such as disable_keep_alives are left at their zero value (false), so connections are kept alive between scrapes and the target hostname is resolved only when a new connection is dialed. This can hide a class of failures, such as bad DNS records and intermittent network issues, that only surface when a new connection has to be created.

We noticed this when a bad DNS change went out, but all of our httpchecks continued to report 2xx status metrics.

Steps to Reproduce

Run an httpcheck receiver with the default configuration and let it start collecting metrics. Then change the DNS resolution for the target host while the collector is running.

Expected Result

The collector fails to scrape the endpoint and stops returning successful response codes.

Actual Result

The collector continues to report the same status codes as before the change, for as long as the collector stays up and the previously resolved target IP/port remains open and responding.

Collector version

v0.125.0

Environment information

Environment

OS: Ubuntu 22.04
Compiler (if manually compiled): go 1.24

OpenTelemetry Collector configuration

exporters:
  debug:
    verbosity: normal
    sampling_initial: 2
    sampling_thereafter: 10000

receivers:
  httpcheck:
    collection_interval: 10s
    targets:
      - endpoint: https://opentelemetry.io
        method: GET

service:
  pipelines:
    metrics/http:
      receivers:
        - httpcheck
      exporters:
        - debug

Log output

2025-05-16T08:47:34.787-0700    info    Metrics {"otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "metrics", "resource metrics": 1, "metrics": 2, "data points": 6}
2025-05-16T08:47:34.787-0700    info    httpcheck.duration{http.url=https://opentelemetry.io} 209
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=4xx} 0
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=5xx} 0
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=1xx} 0
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=2xx} 1
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=3xx} 0

# I edited /etc/hosts to point opentelemetry.io to a bad IP address in between the first and second batch of metrics reported.

        {"otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "metrics"}
2025-05-16T08:47:44.791-0700    info    Metrics {"otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "metrics", "resource metrics": 1, "metrics": 2, "data points": 6}
2025-05-16T08:47:44.791-0700    info    httpcheck.duration{http.url=https://opentelemetry.io} 40
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=3xx} 0
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=4xx} 0
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=5xx} 0
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=1xx} 0
httpcheck.status{http.url=https://opentelemetry.io,http.status_code=200,http.method=GET,http.status_class=2xx} 1
        {"otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "metrics"}

Additional context

No response
