
Ingester extend-writes with AZ-awareness expects replicas in all AZs rather than just quorum #4626

Closed
roystchiang opened this issue Jan 18, 2022 · 4 comments · Fixed by #4636

roystchiang (Contributor) commented Jan 18, 2022

Describe the bug
With extend-writes and AZ-aware replication enabled on the ingester ring, remote_write can fail when multiple ingesters fail in the same AZ.

Consider a cluster with 4 ingesters: ingester-A (az-1), ingester-B (az-1), ingester-C (az-2), and ingester-D (az-3). Ingester-A is in the LEAVING state, ingester-B is unhealthy due to an unclean shutdown (an OOM, for example), and ingester-C and ingester-D are healthy.

In https://github.com/cortexproject/cortex/blame/84f240e058eaa0e50889252f60ce72643b5a62c8/pkg/ring/ring.go#L387 we select all 4 ingesters, even though ingester-A and ingester-B are in the same AZ, because ingester-A is not in a healthy state.
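To illustrate that selection step, here is a minimal sketch (not the actual ring.Get code, and ignoring zone-awareness and token lookup) of how a LEAVING instance causes the replica set to grow past the replication factor:

```go
package main

import "fmt"

type instance struct {
	name  string
	state string // "ACTIVE", "LEAVING", "JOINING", ...
}

// replicaSet is a simplified sketch of the extend-writes behaviour described
// above: a LEAVING/JOINING instance stays in the set, but one extra instance
// is pulled in, so the set grows past the replication factor.
func replicaSet(ring []instance, rf int) []instance {
	need := rf
	var set []instance
	for _, ins := range ring {
		if len(set) >= need {
			break
		}
		set = append(set, ins)
		if ins.state == "LEAVING" || ins.state == "JOINING" {
			need++
		}
	}
	return set
}

func main() {
	ring := []instance{
		{"ingester-A", "LEAVING"}, // az-1
		{"ingester-B", "ACTIVE"},  // az-1, actually dead after an unclean shutdown
		{"ingester-C", "ACTIVE"},  // az-2
		{"ingester-D", "ACTIVE"},  // az-3
	}
	fmt.Println(len(replicaSet(ring, 3))) // 4: all ingesters are selected, two of them in az-1
}
```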

In https://github.com/cortexproject/cortex/blame/84f240e058eaa0e50889252f60ce72643b5a62c8/pkg/ring/replication_strategy.go#L36, since we pass in 4 ingesters, minSuccess is now (4/2) + 1 = 3. However, we only have 2 healthy instances, because the ingesters in az-1 are in a degraded state. This triggers https://github.com/cortexproject/cortex/blame/84f240e058eaa0e50889252f60ce72643b5a62c8/pkg/ring/replication_strategy.go#L54 and fails the write immediately.
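To make the arithmetic concrete, a simplified sketch of the check described above (illustrative only, not the actual replication-strategy code):

```go
package main

import "fmt"

// filterSketch mirrors the quorum check described above: the quorum is derived
// from the size of the extended replica set rather than the configured RF, and
// unhealthy instances are dropped only after the quorum has been computed.
func filterSketch(healthyCount, extendedSetSize, replicationFactor int) error {
	if extendedSetSize > replicationFactor {
		replicationFactor = extendedSetSize
	}
	minSuccess := replicationFactor/2 + 1 // (4 / 2) + 1 = 3

	// Only ingester-C and ingester-D survive the health filter: 2 < 3.
	if healthyCount < minSuccess {
		return fmt.Errorf("at least %d live replicas required, could only find %d", minSuccess, healthyCount)
	}
	return nil
}

func main() {
	// The write fails before a single sample is sent to an ingester.
	fmt.Println(filterSketch(2, 4, 3))
}
```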

There is another, similar issue when ingester-A is in the LEAVING state while ingester-B is in the ACTIVE state after an unclean shutdown and has not yet reached the heartbeat timeout.

In https://github.com/cortexproject/cortex/blame/84f240e058eaa0e50889252f60ce72643b5a62c8/pkg/ring/replication_strategy.go#L70, the distributor will require 3 successful ingester writes because minSuccess is 3 and the number of instances is also 3. The distributor will attempt to write to ingester-B, ingester-C, and ingester-D, and will fail since ingester-B is actually unavailable.
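Put differently (again an illustrative sketch, not Cortex code), the number of push failures the request can tolerate drops to zero:

```go
package main

import "fmt"

func main() {
	// Second scenario: ingester-A is filtered out as LEAVING, but ingester-B
	// still looks ACTIVE because its heartbeat has not timed out yet.
	instances := 3  // ingester-B, ingester-C, ingester-D
	minSuccess := 3 // quorum of the extended 4-instance set, as above

	maxFailures := instances - minSuccess
	fmt.Println(maxFailures) // 0: the failed push to ingester-B fails the whole request
}
```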

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex (SHA or version)
  2. Perform Write Operations
  3. Trigger an unclean shutdown for 1 ingester, and start shutting down another ingester in the same AZ
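For reference, this is roughly the configuration assumed above (a sketch; the flag names are written from memory and may differ across Cortex versions, so treat them as an assumption and verify against your build):

```
-distributor.replication-factor=3
-distributor.zone-awareness-enabled=true
-distributor.extend-writes=true        # default, shown for clarity
-ingester.availability-zone=az-1       # az-2 / az-3 on the other ingesters
```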

Expected behavior
I expect extend-writes to work with just a quorum of available ingesters, since extend-writes should be best-effort.

Environment:

  • Infrastructure: kubernetes
  • Deployment tool: helm

Storage Engine

  • Blocks
  • Chunks

Additional Context

roystchiang (Contributor, Author) commented:

Looking at the tests https://github.com/cortexproject/cortex/blob/master/pkg/ring/replication_strategy_test.go#L60-L78, it seems that failing is the expected behavior when we provide more instances than the replication factor.

While I understand that extend-writes tries to replicate to 1 more instance when a node is joining/leaving, is this the expected behavior?

alanprot (Member) commented Jan 25, 2022

Yeah.. I think extend-writes should indeed be best-effort; otherwise we would have a complete outage in the case of a zonal outage, while we do quorum reads with the original replication factor.

Even without AZ awareness enabled, it does not seem sensible to require 3 successful writes (assuming RF=3) when we already know that 1 of the 4 writes (RF + extend-writes) is going to fail. This means that a single failing ingester (other than the one that is shutting down or leaving) will make the write request fail.
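A quick sketch of the two ways the quorum could be derived (illustrative only; the second variant is what is being argued for here):

```go
package main

import "fmt"

func main() {
	rf := 3               // replication factor
	extendedSet := rf + 1 // one extra instance because a node is leaving/joining

	// Current behaviour: quorum derived from the extended set size.
	fmt.Println(extendedSet/2 + 1) // 3 successful writes required out of 4

	// Suggested behaviour: quorum derived from the original replication factor.
	fmt.Println(rf/2 + 1) // 2 successful writes required
}
```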

alvinlin123 (Contributor) commented:

@bboreham @pracucci I think your knowledge of the history behind the current defaultReplicationStrategy implementation would be beneficial here. Was there a particular reason why Cortex, with a replication factor of 3, would require 3 successful writes instead of 2 when extend-writes is in effect?

bboreham (Contributor) commented Feb 18, 2022

I reported broadly the same thing as a bug: #1290.

I think the idea is "when one of your ingesters is leaving, pick a different one", but implementing it by adding 1 to the replication set always confused me.
