-
Notifications
You must be signed in to change notification settings - Fork 815
Allow ingesters to stay in the ring during restart #3305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
0fc2e68
to
fbabbe7
Compare
fbabbe7
to
dfb5be5
Compare
005d38e
to
726f518
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I've left some comments/suggestions. Can we also modify api/_index.md to document that /shutdown will unregister from the ring, even when unregistering is disabled?
e2b1a6a
to
030d189
Compare
Thanks for the review! I implemented all of the suggestions - the rename did mean changing some tests around that were depending on SkipUnregister being default false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks. Let's wait for @pracucci's review as well, before merging, please.
By keeping an ingester in the ring and not writing to additional ingesters while it is being restarted, keeps small series from being created. This will allow users running with block storage or a WAL to avoid excess load and memory during a rolling restart. Signed-off-by: Chris Marchbanks <[email protected]>
Signed-off-by: Chris Marchbanks <[email protected]>
030d189
to
31edf47
Compare
Friendly reminder for @pracucci to review as requested :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, modulo some minor nits/suggestions
pkg/ring/lifecycler.go
Outdated
@@ -778,6 +781,16 @@ func (i *Lifecycler) SetFlushOnShutdown(flushOnShutdown bool) { | |||
i.flushOnShutdown.Store(flushOnShutdown) | |||
} | |||
|
|||
// SkipUnregister returns if unregistering is skipped on shutdown. | |||
func (i *Lifecycler) UnregisterFromRing() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion(non-blocking,if-minor): I would rename this function to parallel SetUnregisterFromRing
. In my opinion UnregisterFromRing()
connotes that it will actually perform the operation to unregister from the ring, not return the config value.
func (i *Lifecycler) UnregisterFromRing() bool { | |
func (i *Lifecycler) GetUnregisterFromRing() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to Should
which read nicer to me than Get
.
Signed-off-by: Chris Marchbanks <[email protected]>
31edf47
to
1823579
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Ref #467 which I think this fixes, at least for the WAL restart-in-place case. |
I'll mention another downside: you lose redundancy. With replication-factor=3, if you turn off "extend writes" then do a rolling update, you are down to 2 copies of some data. If one of those is then lost then queries will be unstable due to #731. |
By keeping an ingester in the ring and not writing to additional
ingesters while it is being restarted, will stop small chunks from being
persisted to other ingesters. This will allow users running with a WAL
to avoid excess load and memory during a rolling restart.
I am curious what others think of this, if accepted, I think a similar
approach could be applied to the store-gateway (issue #2823). The
downside is that scaling a cortex cluster down while running with skip
unregister will involve manual forgetting of the ingester (or automation
to do this).
Fixes: #3304
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]