etcdctl: cluster-health reports has delay #2340

Closed
kelseyhightower opened this issue Feb 19, 2015 · 5 comments

@kelseyhightower
Contributor

etcd version: v2.0.3

How to reproduce:

  • Start a 3-node etcd cluster.
  • Kill a node (hard power-off of the machine, no clean shutdown).
  • Run etcdctl cluster-health.

Expected results:

etcd reports the killed node as unhealthy

Actual Results:

$ etcdctl cluster-health
cluster is healthy
member 5ae3067007f7fb85 is healthy
member 7931e79c0d8b47c5 is healthy 
member 987146e8925f10e5 is healthy <- This node is powered off
@kelseyhightower
Contributor Author

It seems that after 5 minutes or so, the cluster-health command reports the node as unhealthy:

$ etcdctl cluster-health
cluster is healthy
member 5ae3067007f7fb85 is healthy
member 7931e79c0d8b47c5 is healthy
member 987146e8925f10e5 is unhealthy

@xiang90
Contributor

xiang90 commented Feb 19, 2015

@kelseyhightower
The current detection is not very reliable because it uses the old 0.4.x endpoint. The member health check is based on the underlying TCP connection. If you just power off the machine, the connection will not be dropped immediately.

This command meets our current goal of detecting cluster health after migration.
We will improve it.

@xiang90 xiang90 changed the title from "etcdctl cluster-health reports unhealthy node as healthy" to "etcdctl: cluster-health reports has delay" Feb 19, 2015
@xiang90 xiang90 added this to the v2.1.0 milestone Feb 19, 2015
@philips
Contributor

philips commented Mar 30, 2015

What will the approach be on this? Are we going to add a threshold for RPC latency and then mark something as unhealthy at that point?

@xiang90
Contributor

xiang90 commented Apr 9, 2015

I am moving this to 2.2.

This is more in line with our goal for 2.2: etcdctl uses etcd/client and new endpoints if possible.

@xiang90
Contributor

xiang90 commented Jul 27, 2015

This is more or less a duplicate of #2711, which @yichengq is working on, so I am closing this one in favor of #2711.

@xiang90 xiang90 closed this as completed Jul 27, 2015