Ingesters failing to leave the ring in GKE #4467
Comments
Those messages come from the etcd library:
Can you post the logs leading up to that point, so we can figure out how it gets into that state?
Describe the bug
I'm seeing an issue where ingesters sometimes fail to leave the ring. This happens no matter which KV store is used. It looks as though there is a race condition between closing the lifecycler loop and leaving the ring. Below are example logs from using etcd as the KV store.
To Reproduce
Steps to reproduce the behavior:
I've been able to reproduce this by starting from a completely blank deployment, spinning up some ingesters, and connecting them to the ring. The first rolling restart looks good: every ingester leaves the ring and rejoins properly. After that rolling restart is done, a second rolling restart causes some ingesters to fail to leave the ring. It doesn't matter whether I use memberlist, etcd, or consul.
Expected behavior
Ingesters should leave the ring no matter how many times they are restarted when unregister on shutdown is true.
Environment:
Storage Engine
Additional Context
I found this bug when testing a lower replication factor. I'm wondering if this is missed by most deployments because a replication factor of 3 with extending writes hides the issue. With a lower replication factor, if an ingester fails to leave the ring, all writes fail.
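To illustrate why a lower replication factor might expose this, here is a minimal sketch of the quorum arithmetic, assuming a simple majority quorum (successes needed = RF/2 + 1). The function names `quorum` and `writeSucceeds` are hypothetical and not taken from the Cortex codebase, and the sketch ignores extending writes entirely.

```go
package main

import "fmt"

// quorum returns how many replicas must acknowledge a write and how many
// may fail, ASSUMING a simple majority quorum. This is an illustration,
// not the actual ring implementation.
func quorum(replicationFactor int) (minSuccess, maxFailures int) {
	minSuccess = replicationFactor/2 + 1
	maxFailures = replicationFactor - minSuccess
	return minSuccess, maxFailures
}

// writeSucceeds reports whether a write still reaches quorum when the given
// number of its replicas are unreachable (for example, an ingester that shut
// down but is still listed in the ring).
func writeSucceeds(replicationFactor, unreachable int) bool {
	_, maxFailures := quorum(replicationFactor)
	return unreachable <= maxFailures
}

func main() {
	for _, rf := range []int{1, 2, 3} {
		minSuccess, maxFailures := quorum(rf)
		fmt.Printf("RF=%d: need %d successes, tolerate %d failures, survives one stale ingester: %v\n",
			rf, minSuccess, maxFailures, writeSucceeds(rf, 1))
	}
}
```

Under this assumption, a replication factor of 3 tolerates one stale ring entry per write, while replication factors of 1 and 2 tolerate none, which would be consistent with writes failing outright in my test setup.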