Skip to content

Allow ingesters to stay in the ring during restart #3305

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 23, 2020

Conversation

csmarchbanks
Copy link
Contributor

@csmarchbanks csmarchbanks commented Oct 9, 2020

By keeping an ingester in the ring and not writing to additional
ingesters while it is being restarted, will stop small chunks from being
persisted to other ingesters. This will allow users running with a WAL
to avoid excess load and memory during a rolling restart.

I am curious what others think of this, if accepted, I think a similar
approach could be applied to the store-gateway (issue #2823). The
downside is that scaling a cortex cluster down while running with skip
unregister will involve manual forgetting of the ingester (or automation
to do this).

Fixes: #3304

Checklist

  • Tests updated (some tests use old way, some use new way).
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pull-request-size pull-request-size bot added size/M and removed size/S labels Oct 9, 2020
@csmarchbanks csmarchbanks changed the title WIP: Allow ingesters to stay in the ring during restart Allow ingesters to stay in the ring during restart Oct 14, 2020
@csmarchbanks csmarchbanks marked this pull request as ready for review October 14, 2020 15:39
Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I've left some comments/suggestions. Can we also modify api/_index.md to document that /shutdown will unregister from the ring, even when unregistering is disabled?

@pull-request-size pull-request-size bot added size/L and removed size/M labels Nov 12, 2020
@csmarchbanks csmarchbanks force-pushed the no-extend-writes branch 4 times, most recently from e2b1a6a to 030d189 Compare November 12, 2020 22:38
@csmarchbanks
Copy link
Contributor Author

Thanks for the review! I implemented all of the suggestions - the rename did mean changing some tests around that were depending on SkipUnregister being default false.

Copy link
Contributor

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks. Let's wait for @pracucci's review as well, before merging, please.

By keeping an ingester in the ring and not writing to additional
ingesters while it is being restarted, keeps small series
from being created. This will allow users running with block storage or
a WAL to avoid excess load and memory during a rolling restart.

Signed-off-by: Chris Marchbanks <[email protected]>
@csmarchbanks
Copy link
Contributor Author

Friendly reminder for @pracucci to review as requested :)

Copy link
Contributor

@jtlisi jtlisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo some minor nits/suggestions

@@ -778,6 +781,16 @@ func (i *Lifecycler) SetFlushOnShutdown(flushOnShutdown bool) {
i.flushOnShutdown.Store(flushOnShutdown)
}

// SkipUnregister returns if unregistering is skipped on shutdown.
func (i *Lifecycler) UnregisterFromRing() bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion(non-blocking,if-minor): I would rename this function to parallel SetUnregisterFromRing. In my opinion UnregisterFromRing() connotes that it will actually perform the operation to unregister from the ring, not return the config value.

Suggested change
func (i *Lifecycler) UnregisterFromRing() bool {
func (i *Lifecycler) GetUnregisterFromRing() bool {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to Should which read nicer to me than Get.

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@pracucci pracucci merged commit 3f24943 into cortexproject:master Nov 23, 2020
@bboreham
Copy link
Contributor

bboreham commented Jan 4, 2021

Ref #467 which I think this fixes, at least for the WAL restart-in-place case.

@bboreham
Copy link
Contributor

bboreham commented Jan 4, 2021

I'll mention another downside: you lose redundancy.

With replication-factor=3, if you turn off "extend writes" then do a rolling update, you are down to 2 copies of some data. If one of those is then lost then queries will be unstable due to #731.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce ingester churn during rolling restart
5 participants