Elements Connections can "Stall" on large bidirectional message exchange (eg in IBD). #1233

Closed
lolhill opened this issue Apr 17, 2023 · 1 comment

lolhill commented Apr 17, 2023

Summary of the conditions for the problem to occur and the outcome:

  1. A "large" message must be being sent by both peers to each other contemporaneously (this condition can occur in some circumstances in IBD).
  2. Where this occurs, both peers will "pause" receiving any socket buffer traffic until the full message is sent.
  3. That message must be large enough to crash the TCP recive window on both sides (See Further Comments below).
  4. Receiving would only be resumed when the message is sent - This condition never occurs because both sides crash the recv window before the full message can be sent (either way).
  5. The connection at this point is inactive and effectively "hung" until the inactivity timeout kicks in and the connection is tinned.
  6. It's entirely possible that after the connections is tinned, further/new connections have exactly the same issue (Further Comments Below).
  7. A quick review of the Bitcoin Core 22.1.1 code base suggests that, conceptually, the problem exists there. If the relevant conditions hit, the bug should hit there, believe.

Further Comments:

  1. "large" in this case isn't necessarily huge - depending on the network latency, TCP Windows Scaling and Windows values in play, a few hundred kilobytes should suffice.
  2. Although the message is approximately 2Mb in Elements testing, only a few hundred kb are actually sent before the Windows are closed (see the attached tcpdump). TCP Window sizes are sub-100kb.
  3. The application/message command line maxsendbuffer/maxrecvbuffer settings (and the related fPauseSend and fPauseRecv flags) are irrelevant to this issue - They can be set large, on both sides, and the problem will still occur. Setting them is not a workaround.
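
As a rough sketch of why a few hundred kilobytes suffices (assuming kernel socket buffers of the same order as the advertised windows): each direction of the connection can absorb only roughly the local kernel send buffer plus the remote receive window of unread data before send() stops making progress. With windows under 100 KB, that is a few hundred KB per direction, well short of the ~1.2 MB headers message seen in the log below, so both sides end up with data still queued to send, a zero remote window, and no reads being performed.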

A bit of extra detail/markers:

  1. A netstat -anp will reveal large, "stuck" Recv and Send Queues for the connection.

  2. A tcpdump will show TCP Zero Windows on both sides of the connection - ie confirming that nothing is being pulled off the receive buffers (See attached example).

  3. Testing simply requires hitting the conditions described above. Factors that may make the issue more likely are:

    • Increasing the network latency of the connection (e.g. using netem).
    • Reducing the TCP Window Scale Factor (or disabling TCP Window Scaling).

Expected behavior
The viable TCP connection is maintained (data is read from the socket receive buffer on at least one side of the connection to relieve the stall).

Actual behavior
The connection is "stalled" until the inactivity timeout is reached, when the connection is closed (but on re-establishment with the same client, it's entirely likely the issue may recur - See to reproduce below).

To reproduce
See comments above. The issue can be difficult to reproduce. One approach using release builds is to:

  1. Have a "client" approximately 1200 blocks behind the liquidv1 chain.
  2. Ensure high network latency between the client and a single configured peer to sync to (e.g. use netem to simulate ~150 ms of latency).
  3. Launch the client to proceed with IBD.

Note that this is one concrete example of the issue. The problem appears general.

System information
Tested/reproduced on Elements versions 21.0.2 to 22.1.1.
OSes: CentOS 8, Ubuntu 22.04.

Extra Information
Example debug log snippet (Elements 21.0.2 Release Build):

2022-09-16T14:53:53Z Added connection peer=0
2022-09-16T14:53:53Z sending version (110 bytes) peer=0
2022-09-16T14:53:53Z send version message: version 70016, blocks=2009311, us=[::]:0, peer=0
2022-09-16T14:53:54Z received: version (110 bytes) peer=0
2022-09-16T14:53:54Z sending wtxidrelay (0 bytes) peer=0
2022-09-16T14:53:54Z sending sendaddrv2 (0 bytes) peer=0
2022-09-16T14:53:54Z sending verack (0 bytes) peer=0
2022-09-16T14:53:54Z Leaving InitialBlockDownload (latching to false)
2022-09-16T14:53:54Z ProcessMessages: advertising address 188.29.65.218:7042
2022-09-16T14:53:54Z sending getaddr (0 bytes) peer=0
2022-09-16T14:53:54Z receive version message: /Elements Core:0.21.0.2/: version 70016, blocks=2010501, us=188.29.65.218:35788, peer=0
2022-09-16T14:53:54Z added time data, samples 2, offset +0 (+0 minutes)
2022-09-16T14:53:54Z received: wtxidrelay (0 bytes) peer=0
2022-09-16T14:53:54Z received: sendaddrv2 (0 bytes) peer=0
2022-09-16T14:53:54Z received: verack (0 bytes) peer=0
2022-09-16T14:53:54Z New outbound peer connected: version: 70016, blocks=2010501, peer=0 (full-relay)
2022-09-16T14:53:54Z sending sendheaders (0 bytes) peer=0
2022-09-16T14:53:54Z sending sendcmpct (9 bytes) peer=0
2022-09-16T14:53:54Z sending sendcmpct (9 bytes) peer=0
2022-09-16T14:53:54Z sending ping (8 bytes) peer=0
2022-09-16T14:53:54Z AdvertiseLocal: advertising address 188.29.65.218:7042
2022-09-16T14:53:54Z sending addrv2 (16 bytes) peer=0
2022-09-16T14:53:54Z initial getheaders (2009310) to peer=0 (startheight:2010501)
2022-09-16T14:53:54Z sending getheaders (1061 bytes) peer=0
2022-09-16T14:53:54Z sending feefilter (8 bytes) peer=0
2022-09-16T14:53:54Z received: sendheaders (0 bytes) peer=0
2022-09-16T14:53:54Z received: sendcmpct (9 bytes) peer=0
2022-09-16T14:53:54Z received: sendcmpct (9 bytes) peer=0
2022-09-16T14:53:54Z received: ping (8 bytes) peer=0
2022-09-16T14:53:54Z sending pong (8 bytes) peer=0
2022-09-16T14:53:54Z received: addrv2 (16 bytes) peer=0
2022-09-16T14:53:54Z received: getheaders (1061 bytes) peer=0
2022-09-16T14:53:54Z getheaders 2008444 to end from peer=0
2022-09-16T14:53:54Z sending headers (1272026 bytes) peer=0
2022-09-16T14:53:54Z tor: Error connecting to Tor control socket
2022-09-16T14:53:54Z tor: Not connected to Tor control port 127.0.0.1:9051, trying to reconnect

The connection is "stalled" at this point,.

Example pcap file (from a different execution, on Elements 22.1.1) showing the TCP windows closing on each side:

ElementsTCPWindowStall.zip

The key code is GenerateSelectSet and SocketSendData.
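
To make the mechanism concrete, below is a minimal sketch of the select-set decision that produces the stall. This is illustrative C++ only, not the actual Elements/Bitcoin Core source; the names send_queue_bytes and recv_paused simply stand in for the node's send-queue state and the fPauseRecv flag.

// Illustrative sketch only; the real logic lives in GenerateSelectSet and
// SocketSendData.
#include <cstddef>

struct SelectSet {
    bool recv{false};
    bool send{false};
};

SelectSet GenerateSelectSetPreFix(std::size_t send_queue_bytes, bool recv_paused)
{
    SelectSet set;
    if (send_queue_bytes > 0) {
        // Data is left over from a failed optimistic send: wait only for the
        // socket to become writable, and do NOT select for receive until the
        // send queue has drained.
        set.send = true;
    } else if (!recv_paused) {
        set.recv = true;
    }
    return set;
}

// If both peers sit in the send_queue_bytes > 0 branch while the remote
// receive windows are at zero, send() can never complete and neither side
// ever reads, so the connection stalls until the inactivity timeout.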

psgreco self-assigned this Apr 18, 2023
fanquake added a commit to bitcoin-core/gui that referenced this issue Aug 17, 2023
3388e52 Rework receive buffer pushback (Pieter Wuille)

Pull request description:

  See ElementsProject/elements#1233. There, it has been observed that if both sides of a P2P connection have a significant amount of data to send, a stall can occur, where both try to drain their own send queue before trying to receive. The same issue seems to apply to the current Bitcoin Core codebase, though I don't know whether it's a frequent issue for us.

  The core issue is that whenever our optimistic send fails to fully send a message, we do subsequently not even select() for receiving; if it then turns out that sending is not possible either, no progress is made at all. To address this, the solution used in this PR is to still select() for both sending and receiving when an optimistic send fails, but skip receiving if sending succeeded, and (still) doesn't fully drain the send queue.

  This is a significant reduction in how aggressive the "receive pushback" mechanism is, because now it will only mildly push back while sending progress is made; if the other side stops receiving entirely, the pushback disappears. I don't think that's a serious problem though:
  * We still have a pushback mechanism at the application buffer level (when the application receive buffer overflows, receiving is paused until messages in the buffer get processed; waiting on our own net_processing thread, not on the remote party).
  * There are cases where the existing mechanism is too aggressive; e.g. when the send queue is non-empty, but tiny, and can be sent with a single send() call. In that case, I think we'd prefer to still receive within the same processing loop of the network thread.

ACKs for top commit:
  ajtowns:
    ACK 3388e52
  naumenkogs:
    ACK 3388e52
  mzumsande:
    Tested ACK 3388e52

Tree-SHA512: 28960feb3cd2ff3dfb39622510da62472612f88165ea98fc9fb844bfcb8fa3ed3633f83e7bd72bdbbbd37993ef10181b2e1b34836ebb8f0d83fd1c558921ec17
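
For comparison, here is a minimal sketch of the reworked behaviour described in the pull request above. Again this is illustrative only, not the actual patch: both directions are now selected, and receiving is skipped only while a send attempt makes progress without fully draining the queue.

// Illustrative sketch of the reworked pushback; names are made up for the example.
struct SelectSetFixed {
    bool recv{false};
    bool send{false};
};

SelectSetFixed GenerateSelectSetPostFix(bool send_queue_nonempty, bool recv_paused)
{
    SelectSetFixed set;
    // Still select for send when there is queued data, but keep selecting for
    // receive as well, so a peer that stops receiving can no longer block our reads.
    set.send = send_queue_nonempty;
    set.recv = !recv_paused;
    return set;
}

// After handling the send event: only skip the receive path when the send made
// progress yet still left data queued (the milder pushback described above).
bool ShouldReceiveThisIteration(bool bytes_were_sent, bool send_queue_now_empty)
{
    return !(bytes_were_sent && !send_queue_now_empty);
}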

delta1 commented Sep 17, 2024

Thanks @lolhill for finding this. This will be fixed by the upstream change bitcoin/bitcoin#27981.
