Allow server to cancel when deadline expires #999
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This makes some assertions for timeout cases a little more lenient. The tests previously asserted that a server would return a "deadline exceeded" error when the client's deadline expires. However, not all gRPC servers behave this way. The gRPC Go reference implementation was recently updated to sometimes just cancel the stream instead of sending back an error response. This doesn't seem unreasonable, so instead of updating the "known failing" list of cases for the grpc-go server, this PR makes this behavior allowed (which means it will be allowed in any other implementation-under-test, too).
The test case structure lets us define other acceptable error codes for test cases, however we don't actually want to add "canceled" as generally acceptable for these tests. It's really only acceptable for the server (clients should always report "deadline exceeded" for these cases). So we need to update the test assertions logic to get that level of nuance.
If you look at the latest commit on main as well as all open Depend-a-bot PRs, you'll see that CI is red on these commits because of this behavior -- it seems like one (out of five) of the timeout tests fails with each run. Interestingly, they normally pass (I've yet to even see a run where more than one failed; they seem to always pass when I run the suite locally). The actual failures observed so far have been in "unary", "client-stream", and "half-duplex/bidi-stream" tests (not yet observed for "server-stream" or "full-fuplex/bidi-stream" tests).
The behavior causing CI to be flaky/red was introduced in #993, which pulled in the latest grpc-go, v1.72.0, which included grpc/grpc-go#8071 (which is the culprit for the new behavior).