-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thanos planner component proposal #4458
Thanos planner component proposal #4458
Conversation
andrejbranch
commented
Jul 20, 2021
- I added CHANGELOG entry for this change.
- Change is not relevant to the end user.
Looks related to cortexproject/cortex#4272 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work! Thanks for this, sorry for late review - was on PTO
@thanos-io/thanos-maintainers PTAL 🤗
Some initial comments, otherwise it looks good. Let's indeed check Cortex plan - we are touching same part and looks like we want to solve same problem
|
||
## Goals | ||
|
||
* To separate planner into its own component and have compactors communicate with the planner to get the next available plan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the goals are behind this one. Would be nice outline our motivation here. I think it's prerquisite to scale horizontally compactor, especially within same tenant/external labels... no? 🤗 - which is essentially breaking from singleton model
CompactionPlan = 1 | ||
} | ||
``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we mention some alternatives?
I can think of one: Coordination free gossip-based planning and compaction
## Proposal | ||
 | ||
|
||
### Separate planner into its own component. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
### Separate planner into its own component. | |
### Step 1: Separate planner into its own component. |
### Separate planner into its own component. | ||
For the initial implementation a reasonable amount of the current planner code could be reused. One difference is the new planner will need to run plans for all tenants. The planner should maintain a priority queue of available plans with fair ordering between tenants. The planner should also be aware of which plans are currently running and be able to handle the failure of a compaction plan gracefully. After the completion of this proposal, planner can be updated to improve single tenant compaction performance, following the goals of the [compaction group concurrency pull request](https://github.com/thanos-io/thanos/pull/3807). | ||
|
||
### GRPC pubsub (bidirectional streaming) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
### GRPC pubsub (bidirectional streaming) | |
### Step 2: GRPC pubsub (bidirectional streaming) |
|
||
* Horizontally scaling compaction for a single tenant will be addressed following the completion of this proposal | ||
|
||
## Proposal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have some ### Risks
section where we can explain what we do to ensure Planner is not Single point of failure? (HA / Scalability)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice proposal 👍 As a community, agreeing upon an approach to refactoring and improving the compactor is long overdue.
Personally, I don't think implementing this as a separate Thanos component is the right approach. Critically, we would break existing users when they upgraded. Preferably, we would follow the pattern set by receive
.
By default it will run as a single instance, but can also be configured in separate routing
and ingesting
modes for advanced users. Scaling up compactor would be an advanced feature, so we could reasonably expect the user to understand the differences and be able to configure it accordingly.
The default behaviour would be to run this work in a configurable number of local worker goroutines so vertically scaling would bring performance benefits, but the advanced feature would be to offload to remote workers so they could horizontally scale workloads with an HPA.
With this in mind, I think there are two sub-problems we should solve in order:
- Make compactor scale vertically on one node by paralellizing compactions.
- Make compactor offload compaction tasks to workers.
|
||
## Goals | ||
|
||
* To separate planner into its own component and have compactors communicate with the planner to get the next available plan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is not a goal of the proposal per-se but how you propose satisfying the goal - reading this it sounds like the goal is to make compactor run faster? Or perhaps to increase block compaction throughput?
 | ||
|
||
### Separate planner into its own component. | ||
For the initial implementation a reasonable amount of the current planner code could be reused. One difference is the new planner will need to run plans for all tenants. The planner should maintain a priority queue of available plans with fair ordering between tenants. The planner should also be aware of which plans are currently running and be able to handle the failure of a compaction plan gracefully. After the completion of this proposal, planner can be updated to improve single tenant compaction performance, following the goals of the [compaction group concurrency pull request](https://github.com/thanos-io/thanos/pull/3807). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would the planner
prioritise plans? What would it be optimising for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do that on later stages - there are many ideas we want to do (:
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
It would be nice to revisit this one someday.
|
Is it possible for me to continue this work? I would like the planner part to be compatible with Cortex, so that both systems can enjoy the benefit it brings. While we flesh out whether a dedicated planner component is required, I believe both projects can already enjoy some improvement if the compactor is able to spit out parallelizable plans. |
Go for it @roystchiang ! Looks like it went stale. Feel free to take @andrejbranch work and open your own PR, we can close this one 🤗 Thanks! We definitely need help on this! |
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |