-
Notifications
You must be signed in to change notification settings - Fork 812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split cleaner cycle for active and deleted tenants #6112
Split cleaner cycle for active and deleted tenants #6112
Conversation
Signed-off-by: Alex Le <[email protected]>
Signed-off-by: Alex Le <[email protected]>
level.Error(c.logger).Log("msg", "failed to scan users on startup", "err", err.Error()) | ||
c.runsFailed.WithLabelValues(deletedStatus).Inc() | ||
c.runsFailed.WithLabelValues(activeStatus).Inc() | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why we don't return error is for cleaner to retry at loop
?
This looks weird. I don't know if other Cortex components do the same thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not intended to return nil
. Let me revisit the code and update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on current logic (https://github.com/cortexproject/cortex/blob/master/pkg/compactor/blocks_cleaner.go#L144), cleaner never returns error. If it failed, it would just wait for next tick and run it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior sounds weird tbh. But let's leave it as it is. I guess if there is any error connecting to object storage like credential issues, it expects compactor to fail from compaction, not from cleaner.
Signed-off-by: Alex Le <[email protected]>
Signed-off-by: Alex Le <[email protected]>
What this PR does:
Split cleaner cycle into two separated cycles. One is for active tenants to be cleaned for deleted blocks and update bucket index. Another cycle is for deleted tenants to be cleaned for any data upon deletion. The reason separating those two is that sometime cleaning data for deleted tenant could take a long time to fully delete all blocks if the tenant is large enough. This would block the rest of tenants to be cleaned or deleted which would result in delay bucket index update for active tenants. Separating could prevent this kind of blockage to happen
Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]