etcdctl: cluster-health reports has delay #2340

kelseyhightower · 2015-02-19T06:42:31Z

etcd version: v2.0.3

How to reproduce:

Start a 3-node etcd cluster.
Kill a node (hard power off the machine, no clean shutdown).
Run etcdctl cluster-health.

Expected results:

etcd reports the killed node as unhealthy

Actual Results:

$ etcdctl cluster-health
cluster is healthy
member 5ae3067007f7fb85 is healthy
member 7931e79c0d8b47c5 is healthy 
member 987146e8925f10e5 is healthy <- This node is powered off


kelseyhightower · 2015-02-19T06:44:26Z

It seems that after 5 minutes or so, the cluster-health command reports the node as unhealthy:

$ etcdctl cluster-health
cluster is healthy
member 5ae3067007f7fb85 is healthy
member 7931e79c0d8b47c5 is healthy
member 987146e8925f10e5 is unhealthy

xiang90 · 2015-02-19T07:08:18Z

@kelseyhightower
The current detection is not very reliable because it uses the old 0.4.x endpoint. The member health check is based on the underlying TCP connection. If you just power off the machine, the connection will not be dropped immediately, so the member keeps being reported as healthy for a while.

This command achieves our current goal of detecting cluster health after migration.
We will improve it.
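
To illustrate the difference between relying on an existing connection and probing actively, here is a minimal sketch in Python. It is not how etcdctl implements cluster-health; the function name `probe_member` and the default timeout are assumptions made for illustration. It opens a fresh TCP connection with a deadline, so a powered-off host shows up as a timeout instead of appearing healthy until the old connection is finally dropped.

```python
import socket


def probe_member(host: str, port: int, timeout: float = 1.0) -> bool:
    """Hypothetical helper: report whether a fresh TCP connection succeeds.

    A refused connection, an unreachable host, and a connect timeout all
    raise OSError (socket.timeout is a subclass), so each counts as unhealthy.
    """
    try:
        # Open a new connection for every probe instead of reusing one.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful connect only shows that the port is reachable, not that the member is serving requests correctly, so a real check would likely also issue a request with its own deadline.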

philips · 2015-03-30T17:42:21Z

What will the approach be on this? Are we going to add a threshold for RPC latency and then mark something as unhealthy at that point?
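
One way to read the latency-threshold idea in the question above can be sketched as follows. This is purely illustrative: the thread does not say which approach was adopted, and `classify_member`, its parameters, and the injected `clock` are hypothetical names.

```python
import time
from typing import Callable


def classify_member(
    rpc: Callable[[], None],
    threshold_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Hypothetical helper: label a member by how long one RPC takes.

    `rpc` stands in for any request sent to the member. This function does
    not interrupt a hung call, so `rpc` should carry its own timeout.
    """
    start = clock()
    try:
        rpc()
    except Exception:
        # Any failure to complete the call counts as unhealthy.
        return "unhealthy"
    elapsed = clock() - start
    # Slow but successful calls are also treated as unhealthy.
    return "healthy" if elapsed <= threshold_s else "unhealthy"
```

Injecting the clock keeps the threshold logic testable without real delays. Choosing the threshold itself is the hard part and is left open here.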

xiang90 · 2015-04-09T16:59:17Z

I am moving this to 2.2.

This is more in line with our goal for 2.2: etcdctl uses etcd/client and new endpoints if possible.

xiang90 · 2015-07-27T02:32:29Z

This is more or less a duplicate of #2711, and @yichengq is working on it, so I am closing this one in favor of #2711.

xiang90 changed the title ~~etcdctl cluster-health reports unhealthy node as healthy~~ etcdctl: cluster-health reports has delay Feb 19, 2015

xiang90 added the enhancement label Feb 19, 2015

xiang90 added this to the v2.1.0 milestone Feb 19, 2015

xiang90 modified the milestones: v2.2.0, v2.1.0 Apr 9, 2015

xiang90 removed the enhancement label Apr 10, 2015

yichengq mentioned this issue May 13, 2015

etcdctl cluster-health and member list commands do not work correctly #2711

Closed

xiang90 self-assigned this Jul 1, 2015

xiang90 closed this as completed Jul 27, 2015
