
Revert "Let forward requests run until timeout (#2679) + style fixes. #2771


Closed
bwplotka wants to merge 2 commits

Conversation

@bwplotka (Member) commented Jun 16, 2020

Let's revert the commit in the path of the problematic handler we saw in production.

I think PR #2679 made sense: we want to replicate 3x, with 2x strict and the 3rd best effort. While it makes sense logically, there are a few reasons why this change could saturate receive, e.g.:

  • If there is a single slow writer, we have to open new connections and eventually run out of them.
  • We still use extra bandwidth because of the requests left running in the background.
  • We put more pressure on TSDBs (more concurrent appends).

Essentially, with this change we have "leaking" rate limiting (if there is any ;p), because we claim the request has ended while things are still processing.

Anyway, let's build an image from this branch, deploy it, and see if we can reproduce the issue. There are solid reasons why this change might contribute to the potential saturation.
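
For illustration, here is a minimal Go sketch of the trade-off being discussed. It is not the actual Thanos receive handler; replicate, forward, and cancelOnQuorum are made-up names. With cancelOnQuorum set to true (the behaviour this revert restores), the remaining in-flight forwards are cancelled as soon as quorum is reached; with false (#2679), they keep running until the parent context's deadline.

```go
package sketch

import (
	"context"
	"errors"
)

// forward stands in for a single per-replica remote-write call.
type forward func(ctx context.Context) error

// replicate fans a write out to all forwards and returns once `quorum`
// of them succeed.
func replicate(ctx context.Context, quorum int, forwards []forward, cancelOnQuorum bool) error {
	fctx, cancel := ctx, context.CancelFunc(func() {})
	if cancelOnQuorum {
		// Reverted behaviour: stragglers get cancelled once we return.
		fctx, cancel = context.WithCancel(ctx)
	}
	defer cancel()

	results := make(chan error, len(forwards))
	for _, f := range forwards {
		f := f
		go func() { results <- f(fctx) }()
	}

	ok, failed := 0, 0
	for range forwards {
		select {
		case err := <-results:
			if err != nil {
				failed++
				if failed > len(forwards)-quorum {
					return errors.New("write quorum cannot be reached")
				}
				continue
			}
			ok++
			if ok >= quorum {
				// Quorum reached: report success to the client. Stragglers are
				// either cancelled via fctx (revert) or left running (#2679).
				return nil
			}
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return errors.New("write quorum not reached")
}
```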

@bwplotka (Member, Author)

Let's NOT merge until we test this out.

@bwplotka (Member, Author)

We should have the revert-receive-cancel-2020-06-16-88f0440d image on Quay now.

@bwplotka bwplotka marked this pull request as draft June 16, 2020 06:23
@bwplotka bwplotka requested review from kakkoyun, brancz and squat June 16, 2020 06:23
@squat (Member) left a comment


I don't think this is what we want: with this change we'll again see high error rates even when 2 writes succeed and the third is still in flight. In other words, this is okay-ish for testing the theory (though we might confuse the source of errors), but it is not code we should merge.

@bwplotka (Member, Author)

I don't think error rates are the problem here - we can easily change the instrumentation to report those properly in a separate PR. It's better than not being able to handle the load (:

@squat (Member) commented Jun 16, 2020

Yes, of course we can change the instrumentation, but if we choose to go this route then I think we need to do that before merging the PR, since it will otherwise conflate real errors with expected cancellations.
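
For reference, distinguishing expected post-quorum cancellations from real failures in the metrics could look roughly like this. This is a hypothetical sketch: the metric names and the observeForward helper are made up, not existing Thanos code.

```go
package sketch

import (
	"context"
	"errors"

	"github.com/prometheus/client_golang/prometheus"
)

var (
	forwardErrors = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "receive_forward_errors_total",
		Help: "Forward requests that failed for reasons other than cancellation.",
	})
	forwardCancels = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "receive_forward_cancellations_total",
		Help: "Forward requests cancelled after write quorum was reached.",
	})
)

// observeForward counts a forward as an error only when it failed for a
// reason other than being cancelled after quorum was already reached.
func observeForward(err error) {
	switch {
	case err == nil:
		// Success; nothing to record here.
	case errors.Is(err, context.Canceled):
		forwardCancels.Inc() // expected once quorum is met, not a real error
	default:
		forwardErrors.Inc()
	}
}
```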

@bwplotka (Member, Author) commented Jun 16, 2020

How is merging this in its current state worse than not being able to use master at all? (:

@bwplotka (Member, Author)

But yeah, agreed - something to fix ASAP. Let's deploy this first and check.

@bwplotka (Member, Author)

Looks like it was the isolation fix after all, not this. Closing.

@bwplotka closed this Jun 16, 2020