-
Notifications
You must be signed in to change notification settings - Fork 812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix queriers shuffle-sharding blast radius containment #3901
Fix queriers shuffle-sharding blast radius containment #3901
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job, looking forward to see the tests. My comments are mostly some grammar suggestions... if you find these annoying, let me know and I will stop. (Some may also be wrong, so... :))
pkg/scheduler/queue/user_queues.go
Outdated
querierConnections map[string]int | ||
// How long to wait before removing a querier which has got disconnected | ||
// but hasn't notified a graceful shutdown. | ||
forgetTimeout time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'd suggest to move forgetTimeout
to user_queues.go
, and pass only threshold to forgetDisconnectQueriers
. removeQuerierConnection
is also using this field, but only to know whether to keep disconnected querier or not. We could pass that information as a bool to removeQuerierConnection
. This would simplify queues
a bit, and allow for easier testing of forgetDisconnectQueriers
. I would also suggest passing time.Now()
as parameter to removeQuerierConnection
, to avoid having dependency on current time in queues
. WDYT?
2ccc0d8
to
03cb76f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me functionally. I had a question about alternative naming of the configuration value being added, but otherwise LGTM.
2732880
to
4ec324d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this and the changes.
CHANGELOG.md
Outdated
@@ -91,6 +91,7 @@ | |||
* `cortex_bucket_store_chunk_pool_returned_bytes_total` | |||
* [ENHANCEMENT] Alertmanager: load alertmanager configurations from object storage concurrently, and only load necessary configurations, speeding configuration synchronization process and executing fewer "GET object" operations to the storage when sharding is enabled. #3898 | |||
* [ENHANCEMENT] Blocks storage: Ingester can now stream entire chunks instead of individual samples to the querier. At the moment this feature must be explicitly enabled either by using `-ingester.stream-chunks-when-using-blocks` flag or `ingester_stream_chunks_when_using_blocks` (boolean) field in runtime config file, but these configuration options are temporary and will be removed when feature is stable. #3889 | |||
* [ENHANCEMENT] Query-frontend/scheduler: added querier forget delay (`-query-frontend.querier-forget-delay` and `-query-scheduler.querier-forget-delay`) to mitigate the blast radius in the event queriers crash because of a repeatedly sent "query of death" when shuffle-sharding is enabled. #3901 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cortex release 1.8.0 is now in progress. Could you please rebase master and move the CHANGELOG entry under the master / unreleased section?
68d9eca
to
8b915a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great work.
querierConnections map[string]int | ||
// How long to wait before removing a querier which has got disconnected | ||
// but hasn't notified about a graceful shutdown. | ||
forgetDelay time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: I would prefer to move forgetDelay
out of queues
struct, and pass forgetTime
(when to forget given querier, or zero time if immediately) to removeQuerierConnection
and only threshold
to forgetDisconnectedQueriers
. It's a small change, but helps to minimize change to the queues
struct, which is already tricky.
04220de
to
1a2582d
Compare
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
…ting a querier Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]>
Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]>
1a2582d
to
1ca9855
Compare
Signed-off-by: Marco Pracucci <[email protected]>
…#3901) * Added forget timeout support to queues Signed-off-by: Marco Pracucci <[email protected]> * Added notify shutdown rpc to query-frontend and query-scheduler proto Signed-off-by: Marco Pracucci <[email protected]> * Querier worker notifies shutdown to query-frontend/scheduler Signed-off-by: Marco Pracucci <[email protected]> * Log when query-frontend/scheduler receives a shutdown notification Signed-off-by: Marco Pracucci <[email protected]> * Added config option to configure the forget timeout Signed-off-by: Marco Pracucci <[email protected]> * Fixed re-connect while in forget waiting period Signed-off-by: Marco Pracucci <[email protected]> * Fixed unit tests Signed-off-by: Marco Pracucci <[email protected]> * Fixed GetNextRequestForQuerier() when a resharding happen after fogetting a querier Signed-off-by: Marco Pracucci <[email protected]> * Update pkg/frontend/v1/frontend.go Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]> * Update pkg/scheduler/queue/user_queues.go Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]> * Update pkg/scheduler/queue/user_queues.go Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]> * Update pkg/scheduler/scheduler.go Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]> * Update pkg/querier/worker/frontend_processor.go Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]> * Updated comment based on review feedback Signed-off-by: Marco Pracucci <[email protected]> * Updated comment based on review feedback Signed-off-by: Marco Pracucci <[email protected]> * Updated generated doc Signed-off-by: Marco Pracucci <[email protected]> * Added name to services Signed-off-by: Marco Pracucci <[email protected]> * Moved forgetCheckPeriod where it's used Signed-off-by: Marco Pracucci <[email protected]> * Added queues forget timeout unit tests Signed-off-by: Marco Pracucci <[email protected]> * Added RequestQueue unit test Signed-off-by: Marco Pracucci <[email protected]> * Renamed querier forget timeout into delay Signed-off-by: Marco Pracucci <[email protected]> * Added timeout to the notify shutdown notification Signed-off-by: Marco Pracucci <[email protected]> * Updated doc Signed-off-by: Marco Pracucci <[email protected]> * Added CHANGELOG entry Signed-off-by: Marco Pracucci <[email protected]> * Update pkg/scheduler/scheduler.go Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]> * Update pkg/frontend/v1/frontend.go Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]> * Updated doc Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]>
What this PR does:
As described in the #3571, when shuffle-sharding is enabled in the query-frontend/scheduler and a querier crashes, tenants are immediately resharded across the remaining queriers. This practically invalidates the assumption that shuffle-sharding can be used to contain the blast radius in case of a "poisoned query" on the read path: if a tenant repeatedly send a poisoned query over and over it has the ability to crash all queriers, and not just its shard.
In this PR I propose a solution to mitigate it, introducing a delay ("forget delay") between when a querier disconnects because of a crash and when a tenant's shard changes because of that.
To do it, a query-frontend/scheduler needs to know when a querier disconnects because of crash. I've introduced a "graceful shutdown notification" from the querier to query-frontend/scheduler: when a querier disconnects from the query-frontend/scheduler without sending such notification it means the querier crashed / abruptly terminated.
I've done some manual testing and looks working as expected.
Which issue(s) this PR fixes:
Fixes #3571
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]