-
Notifications
You must be signed in to change notification settings - Fork 49
Fix bug where client "forgets" it received CONNECT_ACK #60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The client was setting `state=CONNECT_PROCESSED` in two places: 1) Right writing the CONNECT message 2) When the CONNECT message's write-completion callback fired. Fix is to ONLY set `state=CONNECT_PROCESSED` in place 1. Bug would occur if the CONNECT_ACK was received before the CONNECT's write-complete callback fired. The state would go from CONNECT_ACK_PROCESSED back to CONNECT_PROCESSED.
I'm seeing the failure observed here in CI intermittently as well (Ubuntu20, clang10) and this seems to be the core of it though I don't know how to eyeball the preconditions here:
|
I've narrowed the mutex lock issue to this call: https://github.com/awslabs/aws-c-event-stream/blob/main/tests/event_stream_rpc_client_connection_test.c#L1174 via changing the callback handler to ensure they were unique. At the same time of this crash, the test function has returned already and is waiting for the server connection shutdown to be completed. (gdb info threads; gdb thread 1; gdb bt) |
The bump fixes two errors in aws-c-event-stream: * awslabs/aws-c-event-stream#55 Release lock before invoking callbacks. (Deadlocks seen) * awslabs/aws-c-event-stream#60 Fix bug where client "forgets" it received CONNECT_ACK Package-Manager: Portage-3.0.17, Repoman-3.0.2 Signed-off-by: Sven Eden <[email protected]>
The bump fixes two errors in aws-c-event-stream: * awslabs/aws-c-event-stream#55 Release lock before invoking callbacks. (Deadlocks seen) * awslabs/aws-c-event-stream#60 Fix bug where client "forgets" it received CONNECT_ACK Package-Manager: Portage-3.0.17, Repoman-3.0.2 Signed-off-by: Sven Eden <[email protected]>
The bump fixes two errors in aws-c-event-stream: * awslabs/aws-c-event-stream#55 Release lock before invoking callbacks. (Deadlocks seen) * awslabs/aws-c-event-stream#60 Fix bug where client "forgets" it received CONNECT_ACK Package-Manager: Portage-3.0.17, Repoman-3.0.2 Signed-off-by: Sven Eden <[email protected]> Signed-off-by: Joonas Niilola <[email protected]>
Client bug
Bug would manifest as a
AWS_ERROR_EVENT_STREAM_RPC_PROTOCOL_ERROR
if anAPPLICATION_MESSAGE
was sent immediately upon receiving aCONNECT_ACK
Turns out the client was setting
state=CONNECT_PROCESSED
in two places:Fix is to ONLY set
state=CONNECT_PROCESSED
in place 1.Bug would occur if the CONNECT_ACK was received before the CONNECT's write-complete callback fired. The state would go from CONNECT_ACK_PROCESSED back to CONNECT_PROCESSED.
This bug began manifesting when aws-c-io changed (PR# awslabs/aws-c-io#353) so that the write-completion callback never fired synchronously. This made it possible to receive a CONNECT_ACK before the CONNECT's write-completion callback fired. The write-completion callback always worked this way on windows though, so it is still righteous to fix the bug in aws-c-event-stream, rather than make further changes to aws-c-io.
Server bug
Server was only setting
state=CONNECT_ACK_PROCESSED
in one place, but it was setting it from the write-completion callback. Seems like a similar thing could happen:CONNECT_ACK_PROCESSED
STATE=CONNECT_ACK_PROCESSED
, but too late connection is already shutting down in errorFix is to set
state=CONNECT_ACK_PROCESSED
when writing the CONNECT_ACK message, not when the write completes.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.