Description
Version:
8.18.0
Operating System:
Ubuntu 22.04.5
Discuss Forum URL:
https://discuss.elastic.co/t/376934
Summary of the Issue:
Fleet Server maintains persistent HTTP connections to its configured Elasticsearch output nodes, which prevents those nodes from shutting down cleanly. When an Elasticsearch node is stopped, it hangs for 2–3 minutes, waiting for Fleet Server to release its connections. Restarting the Fleet Server (i.e., its elastic-agent process) allows the node to shut down immediately, confirming Fleet Server as the blocking party.
This behaviour can also be confirmed by monitoring active connections to port 9200 on the coordinating node during shutdown. Using netstat or ss shows that Fleet Server maintains open connections even as the Elasticsearch node attempts to stop.
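The connection check described above can be scripted. The following is a minimal Python sketch, not part of the original report, that groups TCP connections to local port 9200 by peer address so that a Fleet Server host stands out during shutdown. The helper names parse_ss and read_ss are hypothetical. The sketch assumes the default column layout printed by "ss -tn" (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port) and, by default, counts only connections in the ESTAB state.

```python
import subprocess
from collections import Counter


def parse_ss(output: str, port: int = 9200, states=("ESTAB",)) -> Counter:
    """Count TCP connections to a local port, grouped by peer IP.

    Assumes the default column layout printed by `ss -tn`:
    State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port.
    """
    peers = Counter()
    for line in output.splitlines():
        fields = line.split()
        # Skip the header row, blank lines and connections in other states.
        if len(fields) < 5 or fields[0] not in states:
            continue
        local, peer = fields[3], fields[4]
        # rsplit keeps IPv6 addresses such as [::ffff:10.0.0.5]:9200 intact.
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        peer_ip = peer.rsplit(":", 1)[0].strip("[]")
        peers[peer_ip] += 1
    return peers


def read_ss() -> str:
    """Run `ss -tn` on this host (Linux) and return its raw output."""
    result = subprocess.run(["ss", "-tn"], capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run parse_ss(read_ss()) repeatedly on the Elasticsearch node while it is stopping. If the behaviour matches this report, the Fleet Server's address should keep a non-zero count until Fleet Server is restarted.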
This breaks high-availability expectations: since even a single running Fleet Server will prevent node shutdown, all Fleet Servers must be stopped before any of their Elasticsearch output nodes can shut down cleanly.
Note that other stack components (e.g., Logstash) cleanly disconnect on shutdown, suggesting Fleet Server does not correctly respond to connection termination signals from Elasticsearch.
Steps to Reproduce:
- Deploy Fleet Server (a single instance is sufficient).
- Ensure it is connected to Elasticsearch via HTTPS.
- Attempt to shut down any of its Elasticsearch output nodes.
- Observe the node hang in the stopping state for 2–3 minutes.
- During shutdown, run ss or netstat and observe persistent connections to port 9200 from the Fleet Server.
- Restart the Fleet Server while the ES node is still stopping.
- Observe the node shut down immediately after Fleet Server exits.
Expected Behavior:
- Fleet Server should detect node shutdown and release its connections immediately, allowing Elasticsearch to stop cleanly.
Actual Behavior:
- The Elasticsearch node remains stuck in the stopping state until either Fleet Server is stopped or 2–3 minutes have elapsed.
- This undermines redundancy in HA setups by requiring all Fleet Servers to be stopped for Elasticsearch node maintenance.
Additional Information:
- Reproducible in both production and development environments.
- The issue did not occur in some earlier versions, but the exact version in which it started is unknown.
- No relevant info appears in Elasticsearch logs during shutdown attempts.