Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ACS client not shutting down cleanly if there is a pending response from the server during shutdown. #61

Merged
merged 1 commit into from
Mar 31, 2025

Conversation

copybara-service[bot]
Copy link

Fix ACS client not shutting down cleanly if there is a pending response from the server during shutdown.

In AcsAgentClient, we have a restart_client_thread_ that tries to re-create the reactor when the RPC is being terminated. To know when the RPC is being terminated, we rely on the AcsAgentClientReactor::OnReadDone(bool ok), in which the boolean ok would indicate whether RPC is being closed. However, this does not differentiate whether the RPC is proactively cancelled by client or server. Ideally, we do not want to restart the client when the client itself wants to cancel the RPC.

Instead of passing a simple boolean ok to AcsAgentClient, we define a new enum RpcStatus, to tell AcsAgentClient whether the RPC is ok, canceled by client, or canceled by server. If it is canceled by client, then we should not trigger the RestartClient() function.

This change will allow us to destruct the reactor in AcsAgentClient::Shutdown(), instead of calling just reactor_->Cancel(). Currently, if we directly destruct reactor_ in Shutdown(), we will have a deadlock. when the reactor_ is being reset, the reactor_mtx_ is being held by Shutdown(), and in the meantime, ReactorReadCallback() will try to acquire the same mutex. With this change, in ReactorReadCallback(), when we know the RPC is canceled by client, we will not try to acquire the reactor_mtx_ lock and instead directly return. Then, we are able to cleanly destruct the reactor object in Shutdown() function.

The consequence of not destructing reactor_ in Shutdown() is that when the client is getting destructed after Shutdown() is called, read_callback_ will be destructed earlier than reactor_. If the RPC is not fully terminated in reactor_, reactor_ may try to invoke read_callback_, in this race condition, we could have access destructed field AcsAgentClient.

Copy link

google-cla bot commented Mar 27, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

…se from the server during shutdown.

In AcsAgentClient, we have a restart_client_thread_ that tries to re-create the reactor when the RPC is being terminated. To know when the RPC is being terminated, we rely on the `AcsAgentClientReactor::OnReadDone(bool ok)`, in which the boolean `ok` would indicate whether RPC is being closed. However, this does not differentiate whether the RPC is proactively cancelled by client or server. Ideally, we do not want to restart the client when the client itself wants to cancel the RPC.

Instead of passing a simple boolean ok to AcsAgentClient, we define a new enum `RpcStatus`, to tell AcsAgentClient whether the RPC is ok, canceled by client, or canceled by server. If it is canceled by client, then we should not trigger the RestartClient() function.

This change will allow us to destruct the reactor in AcsAgentClient::Shutdown(), instead of calling just reactor_->Cancel(). Currently, if we directly destruct reactor_ in Shutdown(), we will have a deadlock. when the reactor_ is being reset, the reactor_mtx_ is being held by Shutdown(), and in the meantime, ReactorReadCallback() will try to acquire the same mutex. With this change, in ReactorReadCallback(), when we know the RPC is canceled by client, we will not try to acquire the reactor_mtx_ lock and instead directly return. Then, we are able to cleanly destruct the reactor object in Shutdown() function.

The consequence of not destructing reactor_ in Shutdown() is that when the client is getting destructed after Shutdown() is called, read_callback_ will be destructed earlier than reactor_. If the RPC is not fully terminated in reactor_, reactor_ may try to invoke read_callback_, in this race condition, we could have access destructed field AcsAgentClient.

PiperOrigin-RevId: 742285551
@copybara-service copybara-service bot merged commit c88e0e7 into main Mar 31, 2025
@copybara-service copybara-service bot deleted the test_741021477 branch March 31, 2025 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

0 participants