Potential Deadlock #3545

tunds opened this issue Apr 22, 2025 · 11 comments

@tunds

tunds commented Apr 22, 2025

Summary

I've noticed that in some scenarios Apollo GraphQL subscriptions can deadlock when attempting to reconnect after a network disruption. This happens quite often when we are using our internal VPN (GlobalProtect): if there is a disconnect and the subscriptions attempt to reconnect, the application freezes.

When inspecting what is happening on the main thread, this function appears to be what causes the app to hang.

  public func connect() {
    serialQueue.sync {
      guard !self.isConnecting else { return }
      self.didDisconnect = false
      self.isConnecting = true
      self.createHTTPRequest()
    }
  }

This all starts from the attemptReconnectionIfDesired function.
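
For readers following along: one common way a `serialQueue.sync` call can hang (or trap) is when it is reached from code that is already running on that same serial queue. Whether that is what happens in Apollo's WebSocket code is not established in this thread. The sketch below only illustrates the general pattern and one possible guard against it. `ReconnectingConnection`, `onQueue`, and `handleDisconnect` are hypothetical names, not part of the Apollo iOS API.

```swift
import Dispatch

// Hypothetical stand-in for a connection object; NOT Apollo's WebSocket type.
final class ReconnectingConnection {
  private let serialQueue = DispatchQueue(label: "example.connection.serial")
  // Marker stored on the queue so code can tell whether it already runs on it.
  private let queueKey = DispatchSpecificKey<Bool>()
  private var isConnecting = false
  private var attempts = 0

  init() {
    serialQueue.setSpecific(key: queueKey, value: true)
  }

  // Runs `work` inline when already on serialQueue; otherwise hops onto it
  // synchronously. A plain serialQueue.sync here would never return (or would
  // trap) when the caller is already executing on serialQueue.
  private func onQueue<T>(_ work: () -> T) -> T {
    if DispatchQueue.getSpecific(key: queueKey) == true {
      return work()
    }
    return serialQueue.sync(execute: work)
  }

  var connectAttempts: Int {
    onQueue { self.attempts }
  }

  func connect() {
    onQueue {
      guard !self.isConnecting else { return }
      self.isConnecting = true
      self.attempts += 1
    }
  }

  // Simulates a reconnect path that is already on the queue and calls connect().
  func handleDisconnect() {
    onQueue {
      self.isConnecting = false
      self.connect() // re-entrant call, safe because of the guard in onQueue
    }
  }
}
```

This guard only covers direct re-entrancy on a single queue. A cycle in which two queues or threads wait on each other would still hang, so it is a diagnostic aid rather than a fix.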

Version

1.17.0

Steps to reproduce the behavior

  1. Run the app on a simulator or a real device
  2. Cause a disconnect, e.g. by having the VPN move between networks
  3. Cause SDK to attempt a reconnect
  4. Observe that the reconnect causes a hang

Logs

Anything else?

Image
tunds added the "bug" (Generally incorrect behavior) and "needs investigation" labels Apr 22, 2025
@calvincestari
Member

Hi @tunds - I think you're experiencing deadlocks in the websocket. We've released several versions recently to try to mitigate the bug: 1.15.3, 1.16.1, and 1.18.0.

Since you're on 1.17.0, I recommend updating to 1.18.0 and monitoring it. We've seen improved stability of the websocket since that release. For now I'm going to close this issue since I believe we have it resolved. If, however, you still notice the issue after updating to 1.18.0, please comment back here and we can re-open and investigate.

@tunds
Author

tunds commented Apr 25, 2025

Hi @calvincestari,

Unfortunately we're going to have to reopen this issue. We've updated to 1.18.0 and we're still seeing the issue that I have posted above.

@calvincestari
Member

@tunds - that's fine. You'll need to try to put together something that can reproduce the issue though. This has been an incredibly difficult issue to reliably reproduce, and the sample app we had no longer fails our test cases. If you have suggested changes for the websocket code we'll gladly review them.

calvincestari reopened this Apr 25, 2025
@tunds
Author

tunds commented Apr 25, 2025

Hi @calvincestari,

Yeah, I can imagine this isn't an easy one to fix at all since it's not really obvious. One thing I will point out is that it seems to occur quite often when we're using our company VPN together with a proxying tool, e.g. Proxyman. Sometimes the VPN disconnects and times out, causing the websocket to automatically retry connecting in the background while the proxying tool is in use.

So I wonder if the best way to try to reproduce this is to make the system change IP very frequently in a short amount of time, which might cause the reconnect code to fire rapidly?

It's more a question of working out the exact steps needed to trigger this scenario.
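
The idea of firing the reconnect path rapidly could be approximated in a test harness along the following lines. This is only a sketch under assumptions: `Reconnectable`, `stressReconnect`, and `CountingTarget` are hypothetical names introduced for illustration, and a real harness would need an adapter around whatever transport object the app actually uses. It runs many disconnect/connect cycles concurrently and reports whether they all finished before a timeout, which can flag a hang without freezing the test itself.

```swift
import Dispatch
import Foundation

// Hypothetical abstraction over "something that can reconnect".
protocol Reconnectable: AnyObject {
  func connect()
  func disconnect()
}

/// Runs `cycles` disconnect/connect pairs concurrently.
/// Returns true if they all completed within `timeout` seconds;
/// false suggests (but does not prove) a hang in the target.
func stressReconnect(_ target: Reconnectable, cycles: Int, timeout: TimeInterval) -> Bool {
  let done = DispatchSemaphore(value: 0)
  DispatchQueue.global().async {
    DispatchQueue.concurrentPerform(iterations: cycles) { _ in
      target.disconnect()
      target.connect()
    }
    done.signal()
  }
  // Wait on the calling thread with a deadline so a hang shows up as `false`.
  return done.wait(timeout: .now() + timeout) == .success
}

// Trivial thread-safe target used only to show how the harness is called.
final class CountingTarget: Reconnectable {
  private let lock = NSLock()
  private var count = 0

  var connects: Int {
    lock.lock(); defer { lock.unlock() }
    return count
  }

  func connect() {
    lock.lock(); count += 1; lock.unlock()
  }

  func disconnect() {}
}

let target = CountingTarget()
let finished = stressReconnect(target, cycles: 100, timeout: 5)
print(finished ? "all cycles completed" : "timed out, possible hang")
```

A harness like this exercises only in-process races. It does not simulate the VPN or proxy conditions described above, so a pass here would not rule the issue out.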

@calvincestari
Member

@tunds, we have another release coming out later today that includes a change for websocket deadlocks. Not sure if it'll fix your issue but it is in the same code and would be worth a try.

@tunds
Author

tunds commented Apr 30, 2025

@calvincestari Awesome news! I'll update you. Happy for you to close this issue, and then I can leave a comment for you to reopen it if we see the issue again. Unless you want to keep it open to monitor this?

@calvincestari
Member

We can leave this one open for now.

@tunds
Author

tunds commented May 2, 2025

Unfortunately, I have seen the same issue again on 1.21.0 :(

@AnthonyMDev
Contributor

We're going to be rewriting the entire web socket layer for 2.0 in the coming months. We've spent a lot of time chasing down race conditions in the existing web socket layer in the past few months, but in order to completely remove these race conditions and deadlocks, a rewrite is really necessary.

If you'd like to take a crack at a PR to address this, we're happy to take a look, but we likely won't be investing much more time into tracking these down and instead will focus our efforts on the rewrite.

@tunds
Author

tunds commented May 9, 2025

No worries, appreciate the effort by the team to look into this. Looking forward to 2.0.0 and the great things you bring 🫡
