Fix queriers shuffle-sharding blast radius containment #3901

pracucci · 2021-03-03T13:49:42Z

What this PR does:
As described in #3571, when shuffle-sharding is enabled in the query-frontend/scheduler and a querier crashes, tenants are immediately resharded across the remaining queriers. This practically invalidates the assumption that shuffle-sharding can be used to contain the blast radius of a "poisoned query" on the read path: a tenant that repeatedly sends a poisoned query can crash all queriers, not just those in its shard.
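To make the failure mode concrete, here is a minimal sketch of it. It is not the Cortex implementation: `shardFor`, `remove`, and `crashedWithImmediateReshard` are hypothetical helpers, and the tenant-seeded shuffle only stands in for whatever shard selection is actually used. It assumes that each poisoned query crashes the one querier that handles it, and that the shard is recomputed from the surviving queriers right away.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
	"sort"
)

// shardFor returns the queriers assigned to a tenant: a deterministic,
// tenant-seeded pick of shardSize queriers out of those currently known.
// Hypothetical helper for illustration only.
func shardFor(tenant string, queriers []string, shardSize int) []string {
	sorted := append([]string(nil), queriers...)
	sort.Strings(sorted)
	if shardSize >= len(sorted) {
		return sorted
	}
	h := fnv.New64a()
	h.Write([]byte(tenant))
	rnd := rand.New(rand.NewSource(int64(h.Sum64())))
	rnd.Shuffle(len(sorted), func(i, j int) { sorted[i], sorted[j] = sorted[j], sorted[i] })
	shard := sorted[:shardSize]
	sort.Strings(shard)
	return shard
}

// remove returns a copy of queriers without the given querier.
func remove(queriers []string, victim string) []string {
	out := make([]string, 0, len(queriers))
	for _, q := range queriers {
		if q != victim {
			out = append(out, q)
		}
	}
	return out
}

// crashedWithImmediateReshard simulates a tenant resending a poisoned query.
// Each send crashes one querier of the tenant's current shard, and the shard
// is recomputed from the survivors immediately (shardSize must be >= 1).
// It returns how many queriers crashed before none were left.
func crashedWithImmediateReshard(tenant string, queriers []string, shardSize int) int {
	crashed := 0
	for len(queriers) > 0 {
		shard := shardFor(tenant, queriers, shardSize)
		queriers = remove(queriers, shard[0])
		crashed++
	}
	return crashed
}

func main() {
	queriers := []string{"q1", "q2", "q3", "q4", "q5", "q6"}
	fmt.Println("queriers in the tenant's shard:", len(shardFor("tenant-a", queriers, 2)))
	fmt.Println("crashed with immediate resharding:", crashedWithImmediateReshard("tenant-a", queriers, 2))
}
```

In this model the shard size bounds the damage only as long as the shard stays fixed. Once crashed queriers are replaced in the shard straight away, every querier eventually receives the poisoned query.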

In this PR I propose a solution to mitigate it, introducing a delay ("forget delay") between when a querier disconnects because of a crash and when a tenant's shard changes because of that.

To do this, the query-frontend/scheduler needs to know when a querier disconnects because of a crash. I've introduced a "graceful shutdown notification" from the querier to the query-frontend/scheduler: when a querier disconnects from the query-frontend/scheduler without sending such a notification, it means the querier crashed or was abruptly terminated.

I've done some manual testing and it looks to be working as expected.

Which issue(s) this PR fixes:
Fixes #3571

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

pstibrany

Good job, looking forward to seeing the tests. My comments are mostly grammar suggestions... if you find these annoying, let me know and I will stop. (Some may also be wrong, so... :))

pkg/frontend/v1/frontend.go

pkg/scheduler/queue/user_queues.go

pstibrany · 2021-03-03T16:12:07Z

pkg/scheduler/queue/user_queues.go

-	querierConnections map[string]int
+	// How long to wait before removing a querier which has got disconnected
+	// but hasn't notified a graceful shutdown.
+	forgetTimeout time.Duration


Nit: I'd suggest moving forgetTimeout to user_queues.go, and passing only the threshold to forgetDisconnectQueriers. removeQuerierConnection also uses this field, but only to decide whether to keep the disconnected querier or not. We could pass that information as a bool to removeQuerierConnection. This would simplify queues a bit and make forgetDisconnectQueriers easier to test. I would also suggest passing time.Now() as a parameter to removeQuerierConnection, to avoid a dependency on the current time in queues. WDYT?

pkg/scheduler/scheduler.go

pkg/querier/worker/frontend_processor.go

ranton256

This makes sense to me functionally. I had a question about alternative naming of the configuration value being added, but otherwise LGTM.

docs/configuration/config-file-reference.md

ranton256

Thanks for doing this and the changes.

pstibrany · 2021-03-08T07:18:33Z

CHANGELOG.md

@@ -91,6 +91,7 @@
  * `cortex_bucket_store_chunk_pool_returned_bytes_total`
 * [ENHANCEMENT] Alertmanager: load alertmanager configurations from object storage concurrently, and only load necessary configurations, speeding configuration synchronization process and executing fewer "GET object" operations to the storage when sharding is enabled. #3898
 * [ENHANCEMENT] Blocks storage: Ingester can now stream entire chunks instead of individual samples to the querier. At the moment this feature must be explicitly enabled either by using `-ingester.stream-chunks-when-using-blocks` flag or `ingester_stream_chunks_when_using_blocks` (boolean) field in runtime config file, but these configuration options are temporary and will be removed when feature is stable. #3889
+* [ENHANCEMENT] Query-frontend/scheduler: added querier forget delay (`-query-frontend.querier-forget-delay` and `-query-scheduler.querier-forget-delay`) to mitigate the blast radius in the event queriers crash because of a repeatedly sent "query of death" when shuffle-sharding is enabled. #3901


Cortex release 1.8.0 is now in progress. Could you please rebase on master and move the CHANGELOG entry under the master / unreleased section?

pstibrany

LGTM, great work.

pkg/frontend/v1/frontend.go

pstibrany · 2021-03-08T15:31:57Z

pkg/scheduler/queue/user_queues.go

-	querierConnections map[string]int
+	// How long to wait before removing a querier which has got disconnected
+	// but hasn't notified about a graceful shutdown.
+	forgetDelay time.Duration


suggestion: I would prefer to move forgetDelay out of the queues struct, and pass forgetTime (when to forget the given querier, or zero time to forget it immediately) to removeQuerierConnection and only the threshold to forgetDisconnectedQueriers. It's a small change, but it helps to minimize changes to the queues struct, which is already tricky.

pkg/scheduler/scheduler.go


pracucci requested a review from pstibrany March 3, 2021 13:49

pull-request-size bot added the size/L label Mar 3, 2021

pracucci changed the title ~~Fix querier shuffle sharding~~ Fix querier shuffle-sharding blast radius containment Mar 3, 2021

pracucci changed the title ~~Fix querier shuffle-sharding blast radius containment~~ Fix queriers shuffle-sharding blast radius containment Mar 3, 2021

pracucci marked this pull request as draft March 3, 2021 13:50

pstibrany reviewed Mar 3, 2021

pracucci force-pushed the fix-querier-shuffle-sharding branch from 2ccc0d8 to 03cb76f Compare March 4, 2021 10:48

pull-request-size bot added size/XL and removed size/L labels Mar 4, 2021

ranton256 reviewed Mar 4, 2021

pracucci force-pushed the fix-querier-shuffle-sharding branch from 2732880 to 4ec324d Compare March 5, 2021 09:21

pracucci marked this pull request as ready for review March 5, 2021 09:49

pracucci requested a review from pstibrany March 5, 2021 09:50

ranton256 approved these changes Mar 6, 2021

pstibrany reviewed Mar 8, 2021

pracucci force-pushed the fix-querier-shuffle-sharding branch from 68d9eca to 8b915a5 Compare March 8, 2021 08:31

pstibrany approved these changes Mar 8, 2021

pracucci force-pushed the fix-querier-shuffle-sharding branch from 04220de to 1a2582d Compare March 8, 2021 16:11

pracucci and others added 12 commits March 9, 2021 08:30

Added forget timeout support to queues

85f8a15

Signed-off-by: Marco Pracucci <[email protected]>

Added notify shutdown rpc to query-frontend and query-scheduler proto

97a06ef

Signed-off-by: Marco Pracucci <[email protected]>

Querier worker notifies shutdown to query-frontend/scheduler

69766c5

Signed-off-by: Marco Pracucci <[email protected]>

Log when query-frontend/scheduler receives a shutdown notification

a7a9b31

Signed-off-by: Marco Pracucci <[email protected]>

Added config option to configure the forget timeout

c3833e8

Signed-off-by: Marco Pracucci <[email protected]>

Fixed re-connect while in forget waiting period

ad200c7

Signed-off-by: Marco Pracucci <[email protected]>

Fixed unit tests

ad49aaf

Signed-off-by: Marco Pracucci <[email protected]>

Fixed GetNextRequestForQuerier() when a resharding happen after fogetting a querier

ed4afa9

Signed-off-by: Marco Pracucci <[email protected]>

Update pkg/frontend/v1/frontend.go

c4d9312

Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]>

Update pkg/scheduler/queue/user_queues.go

5888f62

Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]>

Update pkg/scheduler/queue/user_queues.go

874b482

Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]>

Update pkg/scheduler/scheduler.go

8cdd60a

Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]>

pracucci and others added 14 commits March 9, 2021 08:30

Update pkg/querier/worker/frontend_processor.go

ee4130e

Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]>

Updated comment based on review feedback

182c979

Signed-off-by: Marco Pracucci <[email protected]>

Updated comment based on review feedback

dd358ed

Signed-off-by: Marco Pracucci <[email protected]>

Updated generated doc

2505912

Signed-off-by: Marco Pracucci <[email protected]>

Added name to services

b39bd26

Signed-off-by: Marco Pracucci <[email protected]>

Moved forgetCheckPeriod where it's used

326435a

Signed-off-by: Marco Pracucci <[email protected]>

Added queues forget timeout unit tests

47d1ce2

Signed-off-by: Marco Pracucci <[email protected]>

Added RequestQueue unit test

5b8b6a7

Signed-off-by: Marco Pracucci <[email protected]>

Renamed querier forget timeout into delay

cb30035

Signed-off-by: Marco Pracucci <[email protected]>

Added timeout to the notify shutdown notification

8cf703e

Signed-off-by: Marco Pracucci <[email protected]>

Updated doc

ab55057

Signed-off-by: Marco Pracucci <[email protected]>

Added CHANGELOG entry

42979eb

Signed-off-by: Marco Pracucci <[email protected]>

Update pkg/scheduler/scheduler.go

84160d8

Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]>

Update pkg/frontend/v1/frontend.go

1ca9855

Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Peter Štibraný <[email protected]>

pracucci force-pushed the fix-querier-shuffle-sharding branch from 1a2582d to 1ca9855 Compare March 9, 2021 07:30

Updated doc

d56be11

Signed-off-by: Marco Pracucci <[email protected]>

pracucci merged commit 1ddb423 into cortexproject:master Mar 9, 2021

pracucci deleted the fix-querier-shuffle-sharding branch March 9, 2021 08:43

qinxx108 mentioned this pull request Apr 9, 2021

querier can not de-register itself from query-frontend when normal shutdown #4064

Closed

2 tasks

bboreham mentioned this pull request Aug 24, 2021

When a querier is shutdown, all in-flight requests are cancelled grafana/tempo#826

Closed
