Ingester extend-writes with AZ-awareness expects replicas in all AZs rather than just quorum #4626
Comments
Looking at the tests https://github.com/cortexproject/cortex/blob/master/pkg/ring/replication_strategy_test.go#L60-L78, it seems the write is expected to fail when we provide more instances than the replication factor. While I get that extend-writes tries to replicate to 1 more instance in case a node is joining/leaving, is this the expected behavior?
Yeah… I think even without AZ awareness enabled, it does not seem sensible to require 3 successful writes (assuming RF=3) when we already know that 1 of the 4 writes (RF + extend-writes) is going to fail. This means a single additional ingester failure (besides the one that is shutting down or leaving) will make the write request fail.
I reported broadly the same thing as a bug: #1290. I think the idea is "when one of your ingesters is leaving, pick a different one", but implementing it by adding 1 to the replication set always confused me.
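For reference, the quorum computation both comments are describing looks roughly like this (a simplified sketch of the logic in `replication_strategy.go`, not the verbatim Cortex code):

```go
package main

import "fmt"

// minSuccessfulWrites sketches the quorum math: when extend-writes grows
// the replica set beyond the replication factor, the quorum is computed
// over the extended set rather than over the configured RF.
func minSuccessfulWrites(numInstances, replicationFactor int) int {
	if numInstances > replicationFactor {
		replicationFactor = numInstances
	}
	return replicationFactor/2 + 1
}

func main() {
	// RF=3 plus one instance added by extend-writes: the quorum becomes
	// 3 out of 4, so one more failure among the remaining instances is
	// enough to fail the whole write.
	fmt.Println(minSuccessfulWrites(4, 3)) // prints 3
}
```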
Describe the bug
With extend-writes enabled alongside AZ-aware replication on the ingester ring, remote_write can fail when multiple ingesters fail in the same AZ.
Consider a cluster with 4 ingesters: ingester-A (az-1), ingester-B (az-1), ingester-C (az-2), and ingester-D (az-3). Ingester-A is in the `leaving` state, while ingester-B is in an `unhealthy` state due to an unclean shutdown (an OOM, for example); ingester-C and ingester-D are healthy.

In https://github.com/cortexproject/cortex/blame/84f240e058eaa0e50889252f60ce72643b5a62c8/pkg/ring/ring.go#L387 we will select all 4 ingesters, even though ingesters A and B are in the same AZ, because ingester-A is not in a healthy state.
In https://github.com/cortexproject/cortex/blame/84f240e058eaa0e50889252f60ce72643b5a62c8/pkg/ring/replication_strategy.go#L36, since we pass in 4 ingesters, minSuccess is now (4/2) + 1 = 3. However, we only have 2 healthy instances, because both ingesters in az-1 are degraded. This triggers https://github.com/cortexproject/cortex/blame/84f240e058eaa0e50889252f60ce72643b5a62c8/pkg/ring/replication_strategy.go#L54 and fails the write immediately.
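Plugging this scenario's numbers into that quorum rule (an illustrative, self-contained sketch; the struct and values here are mine, not Cortex types):

```go
package main

import "fmt"

type instance struct {
	name    string
	zone    string
	healthy bool // false covers both LEAVING and failed heartbeats here
}

func main() {
	// Scenario 1: extend-writes selected all four ingesters because
	// ingester-A is LEAVING; A and B (both az-1) cannot take writes.
	selected := []instance{
		{"ingester-A", "az-1", false}, // LEAVING
		{"ingester-B", "az-1", false}, // unhealthy after an unclean shutdown
		{"ingester-C", "az-2", true},
		{"ingester-D", "az-3", true},
	}

	rf := 3
	if len(selected) > rf {
		rf = len(selected) // quorum is computed over the extended set
	}
	minSuccess := rf/2 + 1 // (4/2)+1 = 3

	healthy := 0
	for _, ing := range selected {
		if ing.healthy {
			healthy++
		}
	}

	// healthy=2 < minSuccess=3, so the write is rejected up front,
	// mirroring the early-failure branch at replication_strategy.go#L54.
	fmt.Printf("healthy=%d minSuccess=%d failsImmediately=%v\n",
		healthy, minSuccess, healthy < minSuccess)
}
```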
There is another, similar issue: ingester-A is in the `leaving` state, while ingester-B is in the `active` state after an unclean shutdown and has not yet reached the heartbeat timeout.

In https://github.com/cortexproject/cortex/blame/84f240e058eaa0e50889252f60ce72643b5a62c8/pkg/ring/replication_strategy.go#L70, the distributor will require 3 successful ingester writes because minSuccess is 3 and instances is also 3. The distributor will attempt to write to ingester-B, ingester-C, and ingester-D, and will fail since ingester-B is actually unavailable.
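The resulting arithmetic for this second scenario (again just an illustration of the values the issue describes):

```go
package main

import "fmt"

func main() {
	// Scenario 2 (illustrative values): ingester-A is LEAVING and gets
	// filtered out, leaving B (ACTIVE but actually dead, heartbeat not
	// yet expired), C, and D as the write set.
	instances := 3
	minSuccess := 3 // still derived from the extended set of four
	maxFailures := instances - minSuccess

	// maxFailures == 0: the single failed write to ingester-B is enough
	// to fail the whole remote_write request.
	fmt.Println("maxFailures =", maxFailures)
}
```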
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I expect extend-writes to work with just a quorum of available ingesters, since extend-writes should be best-effort.
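One possible shape of that behavior (purely a sketch of this expectation, not a proposed patch against the actual Filter signature):

```go
// Sketch only, not an agreed design: anchor the quorum to the configured
// replication factor, so instances added by extend-writes raise the number
// of attempts but not the bar for success.
package ring

func minSuccessfulWrites(replicationFactor int) int {
	// Deliberately no "bump RF to len(instances)" step here: with RF=3,
	// minSuccess stays 2 even when extend-writes selects 4 instances.
	return replicationFactor/2 + 1
}
```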
Environment:
Storage Engine
Additional Context