Fleet Server preventing Elasticsearch node shutdown due to persistent HTTP connections #4905

Open
@ceeeekay

Version:
8.18.0

Operating System:
Ubuntu 22.04.5

Discuss Forum URL:
https://discuss.elastic.co/t/376934


Summary of the Issue:
Fleet Server maintains persistent HTTP connections to its configured Elasticsearch output nodes, which prevents those nodes from shutting down cleanly. When an Elasticsearch node is stopped, it hangs for 2–3 minutes, waiting for Fleet Server to release its connections. Restarting the Fleet Server (i.e., its elastic-agent process) allows the node to shut down immediately, confirming Fleet Server as the blocking party.

This behaviour can also be confirmed by monitoring active connections to port 9200 on the coordinating node during shutdown. Using netstat or ss shows that Fleet Server maintains open connections even as the Elasticsearch node attempts to stop.
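For anyone who wants to script that check, here is a minimal Python sketch that filters `ss -tn` output for established connections to local port 9200. It assumes it is run on the Elasticsearch node itself and that the output has the usual `State Recv-Q Send-Q Local Peer` column order. The function names and the optional `peer_ip` filter (for example, the Fleet Server host address) are illustrative only and not part of any Elastic tooling.

```python
import subprocess


def parse_ss_line(line):
    """Return (local_addr, local_port, peer_addr, peer_port) for an ESTAB row, else None."""
    fields = line.split()
    # Header rows and non-established sockets are skipped.
    if len(fields) < 5 or fields[0] != "ESTAB":
        return None
    # rpartition on the last ':' also handles bracketed IPv6 such as [::1]:9200.
    local_addr, _, local_port = fields[3].rpartition(":")
    peer_addr, _, peer_port = fields[4].rpartition(":")
    return local_addr, local_port, peer_addr, peer_port


def established_to_port(ss_output, port=9200, peer_ip=None):
    """List (peer_addr, peer_port) pairs connected to the given local port."""
    matches = []
    for line in ss_output.splitlines():
        parsed = parse_ss_line(line)
        if parsed is None:
            continue
        _, local_port, peer_addr, peer_port = parsed
        if local_port != str(port):
            continue
        if peer_ip is not None and peer_addr.strip("[]") != peer_ip:
            continue
        matches.append((peer_addr, peer_port))
    return matches


def snapshot():
    """Capture current TCP sockets in numeric form (requires `ss` on the host)."""
    return subprocess.run(
        ["ss", "-tn"], capture_output=True, text=True, check=True
    ).stdout
```

Calling `established_to_port(snapshot(), peer_ip="<fleet-server-ip>")` repeatedly while the node is stopping should show whether the Fleet Server connections stay open.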

This breaks high-availability expectations: since even a single running Fleet Server will prevent node shutdown, all Fleet Servers must be stopped before any of their Elasticsearch output nodes can shut down cleanly.

Note that other stack components (e.g., Logstash) cleanly disconnect on shutdown, suggesting Fleet Server does not correctly respond to connection termination signals from Elasticsearch.


Steps to Reproduce:

  1. Deploy Fleet Server (a single instance is sufficient).
  2. Ensure it is connected to Elasticsearch via HTTPS.
  3. Attempt to shut down any of its Elasticsearch output nodes.
  4. Observe the node hang in the stopping state for 2–3 minutes.
  5. During shutdown, run ss or netstat and observe persistent connections to port 9200 from the Fleet Server.
  6. Restart the Fleet Server while the ES node is still stopping.
  7. Observe the node shut down immediately after Fleet Server exits.
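To put a number on steps 3 to 7, the stop command can be timed with a small Python helper. This is only a sketch: `time_command` is a hypothetical helper name, and the `systemctl stop elasticsearch` invocation in the comment assumes a systemd-managed package install with the default unit name, which may not match every deployment.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd to completion and return (elapsed_seconds, return_code)."""
    start = time.monotonic()
    result = subprocess.run(cmd)
    return time.monotonic() - start, result.returncode


# Example (assumed unit name, run on the Elasticsearch node as root):
#   elapsed, rc = time_command(["systemctl", "stop", "elasticsearch"])
#   print(f"stop took {elapsed:.1f}s (exit code {rc})")
```

Comparing the elapsed time with Fleet Server running against the time with Fleet Server already stopped should make the 2–3 minute difference easy to see.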

Expected Behavior:

  • Fleet Server should detect node shutdown and release its connections immediately, allowing Elasticsearch to stop cleanly.

Actual Behavior:

  • The Elasticsearch node remains stuck in the stopping state until either Fleet Server is stopped or 2–3 minutes have elapsed.
  • This undermines redundancy in HA setups by requiring all Fleet Servers to be stopped for Elasticsearch node maintenance.

Additional Information:

  • Reproducible in both production and development environments.
  • The issue did not occur in some earlier versions, but the exact version in which it started occurring is unknown.
  • No relevant info appears in Elasticsearch logs during shutdown attempts.
