
Ingester extend-writes with AZ-awareness expects replicas in all AZs rather than just quorum #4626

Closed
roystchiang opened this issue Jan 18, 2022 · 4 comments · Fixed by #4636

roystchiang (Contributor) commented Jan 18, 2022

Describe the bug
With extend-writes and AZ-aware replication enabled on the ingester ring, remote_write can fail when multiple ingesters fail in the same AZ.

Consider a cluster with 4 ingesters: ingester-A (az-1), ingester-B (az-1), ingester-C (az-2), and ingester-D (az-3). Ingester-A is in the LEAVING state, ingester-B is unhealthy due to an unclean shutdown (an OOM, for example), and ingester-C and ingester-D are healthy.

In https://github.com/cortexproject/cortex/blame/84f240e058eaa0e50889252f60ce72643b5a62c8/pkg/ring/ring.go#L387 we select all 4 ingesters, even though ingester-A and ingester-B are in the same AZ, because ingester-A is not in a healthy state.
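To illustrate that selection step, here is a minimal sketch (not the actual ring.Get code, and ignoring zone-awareness and token lookup) of how a LEAVING instance causes the replica set to grow past the replication factor:

```go
package main

import "fmt"

type instance struct {
	name  string
	state string // "ACTIVE", "LEAVING", "JOINING", ...
}

// replicaSet is a simplified sketch of the extend-writes behaviour described
// above: a LEAVING/JOINING instance stays in the set, but one extra instance
// is pulled in, so the set grows past the replication factor.
func replicaSet(ring []instance, rf int) []instance {
	need := rf
	var set []instance
	for _, ins := range ring {
		if len(set) >= need {
			break
		}
		set = append(set, ins)
		if ins.state == "LEAVING" || ins.state == "JOINING" {
			need++
		}
	}
	return set
}

func main() {
	ring := []instance{
		{"ingester-A", "LEAVING"}, // az-1
		{"ingester-B", "ACTIVE"},  // az-1, actually dead after an unclean shutdown
		{"ingester-C", "ACTIVE"},  // az-2
		{"ingester-D", "ACTIVE"},  // az-3
	}
	fmt.Println(len(replicaSet(ring, 3))) // 4: all ingesters are selected, two of them in az-1
}
```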

In https://github.com/cortexproject/cortex/blame/84f240e058eaa0e50889252f60ce72643b5a62c8/pkg/ring/replication_strategy.go#L36, since we pass in 4 ingesters, minSuccess is now (4/2) + 1 = 3. However, we only have 2 healthy instances, because the ingesters in az-1 are in a degraded state. This triggers https://github.com/cortexproject/cortex/blame/84f240e058eaa0e50889252f60ce72643b5a62c8/pkg/ring/replication_strategy.go#L54 and fails the write immediately.
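To make the arithmetic concrete, a simplified sketch of the check described above (illustrative only, not the actual replication-strategy code):

```go
package main

import "fmt"

// filterSketch mirrors the quorum check described above: the quorum is derived
// from the size of the extended replica set rather than the configured RF, and
// unhealthy instances are dropped only after the quorum has been computed.
func filterSketch(healthyCount, extendedSetSize, replicationFactor int) error {
	if extendedSetSize > replicationFactor {
		replicationFactor = extendedSetSize
	}
	minSuccess := replicationFactor/2 + 1 // (4 / 2) + 1 = 3

	// Only ingester-C and ingester-D survive the health filter: 2 < 3.
	if healthyCount < minSuccess {
		return fmt.Errorf("at least %d live replicas required, could only find %d", minSuccess, healthyCount)
	}
	return nil
}

func main() {
	// The write fails before a single sample is sent to an ingester.
	fmt.Println(filterSketch(2, 4, 3))
}
```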

There is another, similar issue when ingester-A is in the LEAVING state while ingester-B is in the ACTIVE state after an unclean shutdown and has not yet reached the heartbeat timeout.

In https://github.com/cortexproject/cortex/blame/84f240e058eaa0e50889252f60ce72643b5a62c8/pkg/ring/replication_strategy.go#L70, the distributor will require 3 successful ingester writes because minSuccess is 3 and the number of instances is also 3. The distributor will attempt to write to ingester-B, ingester-C, and ingester-D, and will fail since ingester-B is actually unavailable.
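Put differently (again an illustrative sketch, not Cortex code), the number of push failures the request can tolerate drops to zero:

```go
package main

import "fmt"

func main() {
	// Second scenario: ingester-A is filtered out as LEAVING, but ingester-B
	// still looks ACTIVE because its heartbeat has not timed out yet.
	instances := 3  // ingester-B, ingester-C, ingester-D
	minSuccess := 3 // quorum of the extended 4-instance set, as above

	maxFailures := instances - minSuccess
	fmt.Println(maxFailures) // 0: the failed push to ingester-B fails the whole request
}
```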

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex (SHA or version)
  2. Perform Write Operations
  3. Trigger an unclean shutdown for 1 ingester, and start shutting down another ingester in the same AZ
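For reference, this is roughly the configuration assumed above (a sketch; the flag names are written from memory and may differ across Cortex versions, so treat them as an assumption and verify against your build):

```
-distributor.replication-factor=3
-distributor.zone-awareness-enabled=true
-distributor.extend-writes=true        # default, shown for clarity
-ingester.availability-zone=az-1       # az-2 / az-3 on the other ingesters
```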

Expected behavior
I expect extend-writes to work with just a quorum of available ingesters, since extend-writes should be best-effort.

Environment:

  • Infrastructure: kubernetes
  • Deployment tool: helm

Storage Engine

  • Blocks
  • Chunks

Additional Context

roystchiang (Contributor, Author) commented:

Looking at the tests https://github.com/cortexproject/cortex/blob/master/pkg/ring/replication_strategy_test.go#L60-L78, it seems that failing is the expected behavior when we provide more instances than the replication factor.

While I understand that extend-writes tries to replicate to 1 more instance when a node is joining/leaving, is this the expected behavior?

alanprot (Member) commented Jan 25, 2022

Yeah.. I think extend-writes should indeed be best-effort; otherwise we would have a complete outage in the case of a zonal outage, while we do quorum reads with the original replication factor.

Even without AZ awareness enabled, it does not seem sensible to require 3 successful writes (assuming RF=3) when we already know that 1 of the 4 writes (RF + extend-writes) is going to fail. This means that a single failing ingester (other than the one that is shutting down or leaving) will make the write request fail.
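A quick sketch of the two ways the quorum could be derived (illustrative only; the second variant is what is being argued for here):

```go
package main

import "fmt"

func main() {
	rf := 3               // replication factor
	extendedSet := rf + 1 // one extra instance because a node is leaving/joining

	// Current behaviour: quorum derived from the extended set size.
	fmt.Println(extendedSet/2 + 1) // 3 successful writes required out of 4

	// Suggested behaviour: quorum derived from the original replication factor.
	fmt.Println(rf/2 + 1) // 2 successful writes required
}
```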

alvinlin123 (Contributor) commented:

@bboreham @pracucci I think your knowledge of the history behind the current defaultReplicationStrategy implementation would be beneficial here. Was there a particular reason why Cortex, with a replication factor of 3, would require 3 successful writes instead of 2 when extend-writes is in effect?

bboreham (Contributor) commented Feb 18, 2022

I reported broadly the same thing as a bug: #1290.

I think the idea is "when one of your ingesters is leaving, pick a different one", but implementing it by adding 1 to the replication set always confused me.
