Add "wait ring stability" to store-gateway and fix cold start issue #4271

pracucci · 2021-06-09T16:53:43Z

What this PR does:
The store-gateway suffers from a cold start issue, which this PR should fix.

Let's say we have a disruption across all store-gateways and we restart all of them. At startup, each store-gateway is in the JOINING state in the ring and begins syncing blocks. Due to how the BlocksSync ring operation works, if all store-gateways are in the JOINING state (and they are during a cold start), each store-gateway replica syncs all the blocks, not just the blocks belonging to its own shard, because we extend the replication set for instances in the JOINING state.

This looks like an easy problem to solve, because we could just not extend the replication set for the JOINING state, but that opens up another issue. Let's say we have a pool of running store-gateways and we scale up. If the JOINING state doesn't extend the replication set, we can end up in a situation where the store-gateways previously holding a block all offload it while the new replicas are still loading it (JOINING state), so no store-gateway has that block and queries fail.

This PR tries to solve this problem by changing the logic as follows:

Load a block when it belongs to the store-gateway's shard
Unload a block when it no longer belongs to the store-gateway's shard, but only if at least 1 owner is ACTIVE in the ring (otherwise wait before offloading)
If checking the ring fails, the store-gateway keeps the previously loaded blocks instead of offloading them
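
To make the three rules above concrete, here is a minimal sketch of the per-block decision. It is not the code from this PR: `Owner`, `lookupFn` and `keepBlock` are hypothetical names, and the ring lookup is reduced to a function returning the owners of a block.

```go
package main

import (
	"errors"
	"fmt"
)

// InstanceState is a simplified stand-in for the ring instance state.
type InstanceState int

const (
	Joining InstanceState = iota
	Active
)

// Owner is a hypothetical view of one instance in a block's replication set.
type Owner struct {
	Addr  string
	State InstanceState
}

// lookupFn is a hypothetical ring lookup returning the owners of a block.
type lookupFn func(blockID string) ([]Owner, error)

// keepBlock reports whether the store-gateway at selfAddr should have the
// block loaded after this sync, following the three rules listed above.
func keepBlock(blockID, selfAddr string, loaded map[string]bool, lookup lookupFn) bool {
	owners, err := lookup(blockID)
	if err != nil {
		// Ring check failed: keep whatever was loaded before.
		return loaded[blockID]
	}

	anyActive := false
	for _, o := range owners {
		if o.Addr == selfAddr {
			// The block belongs to our shard: load (or keep) it.
			return true
		}
		if o.State == Active {
			anyActive = true
		}
	}

	// The block is not ours. Offload it only if some owner is ACTIVE,
	// otherwise keep a previously loaded block for now.
	if anyActive {
		return false
	}
	return loaded[blockID]
}

func main() {
	loaded := map[string]bool{"block-1": true}
	joiningOnly := func(string) ([]Owner, error) {
		return []Owner{{Addr: "sg-2", State: Joining}}, nil
	}
	failing := func(string) ([]Owner, error) {
		return nil, errors.New("ring unavailable")
	}

	// Scale-up case: the new owner is still JOINING.
	fmt.Println(keepBlock("block-1", "sg-1", loaded, joiningOnly))
	// Ring lookup failure case.
	fmt.Println(keepBlock("block-1", "sg-1", loaded, failing))
}
```

In both cases shown in `main` the previously loaded block is kept, which is the behaviour that avoids the "no store-gateway has that block" window during a scale-up.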

This logic change, in conjunction with the "wait ring stability" at startup, should solve the cold start issue. I've added unit tests to show it (they fail on master) and I've also manually tested it in a dev cluster, where it works as expected.
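
For readers unfamiliar with the idea, "wait ring stability" can be sketched roughly as follows: poll the ring topology and proceed only once it has stayed unchanged for a minimum duration, giving up after a maximum waiting time. This is an illustration under assumptions, not the implementation in this PR: `waitStable`, `simulate`, the `snapshot` callback and all durations are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitStable polls snapshot() until its result has stayed unchanged for at
// least minStability, or returns the context error once maxWaiting elapses.
// snapshot is a hypothetical function returning a comparable description of
// the ring topology (for example, a sorted list of instance IDs).
func waitStable(ctx context.Context, snapshot func() string, poll, minStability, maxWaiting time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, maxWaiting)
	defer cancel()

	last := snapshot()
	stableSince := time.Now()

	ticker := time.NewTicker(poll)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			cur := snapshot()
			if cur != last {
				// Topology changed: restart the stability window.
				last = cur
				stableSince = time.Now()
			} else if time.Since(stableSince) >= minStability {
				return nil
			}
		}
	}
}

// simulate runs waitStable against a fake ring. If flapping is true the
// topology changes on every poll, so stability is never reached.
func simulate(flapping bool) error {
	polls := 0
	snapshot := func() string {
		polls++
		if flapping || polls <= 3 {
			return fmt.Sprintf("topology-%d", polls)
		}
		return "topology-final"
	}
	return waitStable(context.Background(), snapshot,
		5*time.Millisecond, 30*time.Millisecond, 300*time.Millisecond)
}

func main() {
	fmt.Println("settling ring, error:", simulate(false))
	fmt.Println("flapping ring, error:", simulate(true))
}
```

The maximum waiting time matters: without it, a ring that keeps changing would block the initial sync forever. What the caller does after a timeout (fail or proceed anyway) is a separate design choice that this sketch leaves open.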

Which issue(s) this PR fixes:
Fixes #2827 #3570

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

pkg/storegateway/sharding_strategy.go

pracucci · 2021-06-09T17:02:50Z

pkg/storegateway/sharding_strategy.go

 	}
 }

 // Filter implements block.MetadataFilter.
+// This function is NOT safe for use by multiple goroutines concurrently.


It's not used concurrently. Not even Thanos filters are thread-safe. I've added the comment just to make it clear.

jtlisi

LGTM, modulo a changelog entry

jtlisi · 2021-06-09T18:42:02Z

pkg/storegateway/bucket_store_inmemory_server.go

@@ -30,6 +33,13 @@ func (s *bucketStoreSeriesServer) Send(r *storepb.SeriesResponse) error {
 		s.Warnings = append(s.Warnings, errors.New(r.GetWarning()))
 	}

+	if rawHints := r.GetHints(); rawHints != nil {


Is this change related?

Yes. It's just used by tests. I've added the ability to read hints too.

pkg/storegateway/sharding_strategy.go

bboreham · 2021-06-10T15:26:20Z

Does this help #3570 ?

pracucci · 2021-06-10T15:52:01Z

Does this help #3570 ?

Yes, I think so, because it no longer expands the replication set for instances in the JOINING state. The replication set is still expanded for unhealthy store-gateways, but if they restart quickly enough (before they're detected as unhealthy) then no extra pressure should be added to the other (healthy) store-gateways.

pstibrany

Nice work. Change makes sense to me, and PR looks great!

stevesg

Not an expert on the logic in this area but the code all looks good to me.

bboreham · 2021-06-14T09:18:45Z

Is the "Improved filterBlocksByRingSharding() logic" commit covered by the PR description? If not please add it.
Consider moving it to a separate PR, because I can't see the relationship, and once all the commits are squashed it will be harder to find.

pracucci · 2021-06-14T12:55:13Z

Is the "Improved filterBlocksByRingSharding() logic" commit covered by the PR description?

Good point. I've added this one to the PR description:

If checking the ring fails, the store-gateway keeps the previously loaded blocks instead of offloading them

amckinley · 2021-06-16T01:35:15Z

docs/blocks-storage/store-gateway.md

@@ -81,6 +81,14 @@ The store-gateway replication optionally supports [zone-awareness](../guides/zon
 2. Enable blocks zone-aware replication via the `-store-gateway.sharding-ring.zone-awareness-enabled` CLI flag (or its respective YAML config option). Please be aware this configuration option should be set to store-gateways, queriers and rulers.
 3. Rollout store-gateways, queriers and rulers to apply the new configuration

+### Waiting for stable ring at startup
+
+In the event of a cluster cold start or scale up of 2+ store-gateway instances at the same time we may end up in a situation where each new store-gateway instance starts at a slightly different time and thus each one runs the initial blocks sync based on a different state of the ring. For example, in case of a cold start, the first store-gateway joining the ring may load all blocks since the sharding logic runs based on the current state of the ring, which is 1 single store-gateway.


Technically, shouldn't this be "greater than or equal to replication_factor store-gateway instances at the same time"?

It depends. For a cold start, yes (because if you have a number of replicas <= RF then all replicas load all blocks). For the scale-up case, you may have RF=3 and scale up by +2, and this PR still improves things because the 2 new replicas will not load extra blocks that they will no longer need once they are both ACTIVE in the ring (after the initial sync is completed).
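
The "replicas <= RF" point can be illustrated with a toy ring. This is a deliberate simplification (one position per instance, no tokens, zones or health checks), and `owners` and `blocksOwnedBy` are hypothetical helpers, not Cortex functions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// owners returns the instances responsible for a block in a toy ring:
// hash the block ID to a starting position, then walk clockwise taking
// up to rf distinct instances.
func owners(blockID string, instances []string, rf int) []string {
	if len(instances) == 0 {
		return nil
	}
	h := fnv.New32a()
	h.Write([]byte(blockID))
	start := int(h.Sum32() % uint32(len(instances)))

	n := rf
	if n > len(instances) {
		n = len(instances)
	}
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, instances[(start+i)%len(instances)])
	}
	return out
}

// blocksOwnedBy counts how many of the given blocks an instance owns.
func blocksOwnedBy(instance string, blocks, instances []string, rf int) int {
	count := 0
	for _, b := range blocks {
		for _, o := range owners(b, instances, rf) {
			if o == instance {
				count++
				break
			}
		}
	}
	return count
}

func main() {
	blocks := make([]string, 100)
	for i := range blocks {
		blocks[i] = fmt.Sprintf("block-%d", i)
	}

	// Ring as seen early in a cold start: replicas <= RF.
	three := []string{"sg-1", "sg-2", "sg-3"}
	fmt.Println(blocksOwnedBy("sg-1", blocks, three, 3))

	// The same instance once the full ring is visible.
	ten := []string{"sg-1", "sg-2", "sg-3", "sg-4", "sg-5",
		"sg-6", "sg-7", "sg-8", "sg-9", "sg-10"}
	fmt.Println(blocksOwnedBy("sg-1", blocks, ten, 3))
}
```

With 3 instances and RF=3 every instance owns every block, so an initial sync computed against that early view of the ring loads everything. Computed against the full ring, the same instance owns only a fraction of the blocks.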

amckinley · 2021-06-16T01:35:56Z

docs/blocks-storage/store-gateway.template

@@ -81,6 +81,14 @@ The store-gateway replication optionally supports [zone-awareness](../guides/zon
 2. Enable blocks zone-aware replication via the `-store-gateway.sharding-ring.zone-awareness-enabled` CLI flag (or its respective YAML config option). Please be aware this configuration option should be set to store-gateways, queriers and rulers.
 3. Rollout store-gateways, queriers and rulers to apply the new configuration

+### Waiting for stable ring at startup
+
+In the event of a cluster cold start or scale up of 2+ store-gateway instances at the same time we may end up in a situation where each new store-gateway instance starts at a slightly different time and thus each one runs the initial blocks sync based on a different state of the ring. For example, in case of a cold start, the first store-gateway joining the ring may load all blocks since the sharding logic runs based on the current state of the ring, which is 1 single store-gateway.


Same as above.

pull-request-size bot added the size/XL label Jun 9, 2021

pracucci marked this pull request as draft June 9, 2021 16:56

jtlisi approved these changes Jun 9, 2021

pracucci force-pushed the improve-store-gateway-sync-logic branch from ee59803 to 883668a Compare June 10, 2021 16:06

pracucci marked this pull request as ready for review June 11, 2021 08:32

pracucci and others added 14 commits June 11, 2021 10:32

Add wait ring stability to store-gateway and improve sharding sync logic

69fd791

Signed-off-by: Marco Pracucci <[email protected]>

Fixed typo in comments

d5cb004

Signed-off-by: Marco Pracucci <[email protected]>

Added TestStoreGateway_InitialSyncWithWaitRingStability

e6c6a1f

Signed-off-by: Marco Pracucci <[email protected]>

Improved TestStoreGateway_InitialSyncWithWaitRingStability doc

084556f

Signed-off-by: Marco Pracucci <[email protected]>

Removed TestStoreGateway_BlocksSharding because superseeded by TestStoreGateway_InitialSyncWithWaitRingStability

1f0ca99

Signed-off-by: Marco Pracucci <[email protected]>

TestStoreGateway_InitialSyncWithWaitRingStability seed needs to be initialised and logged on a per test case basis, cause tests ordering is unstable

c06ce07

Signed-off-by: Marco Pracucci <[email protected]>

Added TestStoreGateway_BlocksSyncWithDefaultSharding_RingTopologyChangedAfterScaleUp

7841a07

Signed-off-by: Marco Pracucci <[email protected]>

Fix linter issue

43ca5f2

Signed-off-by: Marco Pracucci <[email protected]>

Update pkg/storegateway/sharding_strategy.go

827c8b9

Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Jacob Lisi <[email protected]>

Fix integration tests

4ca9139

Signed-off-by: Marco Pracucci <[email protected]>

Updated doc

6083de6

Signed-off-by: Marco Pracucci <[email protected]>

Fix TestGettingStartedWithGossipedRing integration test

68cacc1

Signed-off-by: Marco Pracucci <[email protected]>

Improved filterBlocksByRingSharding() logic

349afbb

Signed-off-by: Marco Pracucci <[email protected]>

Added CHANGELOG entry

05c4265

Signed-off-by: Marco Pracucci <[email protected]>

pracucci force-pushed the improve-store-gateway-sync-logic branch from 84b59ef to 05c4265 Compare June 11, 2021 08:32

Merge branch 'master' into improve-store-gateway-sync-logic

7c8af95

pstibrany approved these changes Jun 14, 2021

stevesg approved these changes Jun 14, 2021

pracucci merged commit 868898a into cortexproject:master Jun 14, 2021

pracucci deleted the improve-store-gateway-sync-logic branch June 14, 2021 12:55

pracucci mentioned this pull request Jun 15, 2021

Crash and restart of store-gateway puts extra pressure on other store-gateways #3570

Closed

amckinley reviewed Jun 16, 2021

jtlisi mentioned this pull request Jun 17, 2021

Store-gateway blocks resharding during rollout #2823

Closed
