Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split cleaner cycle for active and deleted tenants #6112

Merged
merged 4 commits into from
Jul 23, 2024

Conversation

alexqyle
Copy link
Contributor

@alexqyle alexqyle commented Jul 23, 2024

What this PR does:

Split cleaner cycle into two separated cycles. One is for active tenants to be cleaned for deleted blocks and update bucket index. Another cycle is for deleted tenants to be cleaned for any data upon deletion. The reason separating those two is that sometime cleaning data for deleted tenant could take a long time to fully delete all blocks if the tenant is large enough. This would block the rest of tenants to be cleaned or deleted which would result in delay bucket index update for active tenants. Separating could prevent this kind of blockage to happen

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Alex Le <[email protected]>
level.Error(c.logger).Log("msg", "failed to scan users on startup", "err", err.Error())
c.runsFailed.WithLabelValues(deletedStatus).Inc()
c.runsFailed.WithLabelValues(activeStatus).Inc()
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why we don't return error is for cleaner to retry at loop?
This looks weird. I don't know if other Cortex components do the same thing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not intended to return nil. Let me revisit the code and update it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on current logic (https://github.com/cortexproject/cortex/blob/master/pkg/compactor/blocks_cleaner.go#L144), cleaner never returns error. If it failed, it would just wait for next tick and run it again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior sounds weird tbh. But let's leave it as it is. I guess if there is any error connecting to object storage like credential issues, it expects compactor to fail from compaction, not from cleaner.

alexqyle added 2 commits July 23, 2024 15:32
Signed-off-by: Alex Le <[email protected]>
Signed-off-by: Alex Le <[email protected]>
@danielblando danielblando merged commit 01f4d9d into cortexproject:master Jul 23, 2024
16 checks passed
@alexqyle alexqyle deleted the split-cleaner-loop branch July 24, 2024 00:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants