Description
Synapse takes forever to send large responses. In some cases, just sending the response takes longer than processing the request and encoding the JSON (`encode_json_response`) combined.
Examples
98s to process the request and `encode_json_response`, but the request isn't finished sending until 484s (~8 minutes), which leaves ~6.5 minutes of dead time. The response size is 36MB.
Jaeger trace: 4238bdbadd9f3077.json

59s to process, but the request doesn't finish until 199s. The response size is 36MB.
Jaeger trace: 2149cc5e59306446.json

I've come across this before and it's not a new thing. For example, in #13620 (original issue) I described it as the "mystery gap at the end after we encode the JSON response (`encode_json_response`)", but I had never seen it be this egregious.
It can also happen for small requests: 2s to process, but the request doesn't finish until 5s. The response size is 120KB.
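
To put those numbers in perspective, the effective throughput during the gap between finishing the encode and finishing the send works out to only tens to a few hundred KB/s, which is very slow for a server-to-client transfer. Rough arithmetic, using the figures above:

```python
# Rough effective-throughput arithmetic for the examples above
# (times and sizes are copied from the traces; nothing here is newly measured).
examples = [
    # (label, response size in bytes, seconds to process, seconds to finish)
    ("36MB response (trace 4238bdbadd9f3077)", 36 * 1024 * 1024, 98, 484),
    ("36MB response (trace 2149cc5e59306446)", 36 * 1024 * 1024, 59, 199),
    ("120KB response", 120 * 1024, 2, 5),
]

for label, size, processed, finished in examples:
    dead_time = finished - processed
    rate_kib_per_s = size / dead_time / 1024
    print(f"{label}: {dead_time}s of dead time, ~{rate_kib_per_s:.0f} KB/s effective send rate")
```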

Investigation
@kegsay pointed out `_write_bytes_to_request`, which runs after `encode_json_response` and has comments like "Write until there's backpressure telling us to stop." that definitely hint at some areas of interest.
synapse/synapse/http/server.py, lines 869 to 873 in 03937a1
The JSON serialization is done in a background thread because it can otherwise block the reactor for many seconds. This part seems normal and fast (no problem there).
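
For reference, the off-reactor encoding pattern looks roughly like the sketch below. This is a minimal illustration of the general idea, not Synapse's actual implementation (which, per the PRs listed further down, encodes iteratively):

```python
# Minimal sketch of the "encode JSON off the reactor thread" pattern
# (illustrative only, not Synapse's actual code).
import json

from twisted.internet.threads import deferToThread


def _encode(json_object: dict) -> bytes:
    # json.dumps of a multi-megabyte response can take many seconds, so it
    # must not run on the reactor thread.
    return json.dumps(json_object).encode("utf-8")


async def encode_json_response_sketch(json_object: dict) -> bytes:
    # deferToThread returns a Deferred that fires with the encoded bytes once
    # a thread-pool worker finishes; Deferreds are awaitable in a coroutine.
    return await deferToThread(_encode, json_object)
```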
But we also use `_ByteProducer` to send the bytes down to the client. Using a producer ensures we can send all of the bytes to the client without hitting a 60s timeout (see context in the comments below; a rough sketch of the producer pattern follows the PR list further down).
synapse/synapse/http/server.py, lines 883 to 889 in d40bc27
This logic was added in:
- Iteratively encode JSON responses matrix-org/synapse#8013
- Switch the JSON byte producer from a pull to a push producer matrix-org/synapse#8116
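
To make the producer mechanics concrete, here is a rough sketch of a Twisted push producer in the same shape as `_ByteProducer` (illustrative only; the class name, chunk size, and structure are assumptions, not Synapse's actual code). The transport calls `pauseProducing` when its send buffer fills up and `resumeProducing` once it drains, so the write loop can only go as fast as the client connection accepts data:

```python
# Hedged sketch of a push producer: write chunks until the transport applies
# backpressure (pauseProducing), then wait for resumeProducing to continue.
from typing import Iterator

from twisted.internet.interfaces import IPushProducer
from twisted.web.server import Request
from zope.interface import implementer

CHUNK_SIZE = 64 * 1024  # illustrative value, not Synapse's actual chunk size


@implementer(IPushProducer)
class ChunkedByteProducer:
    def __init__(self, request: Request, data: bytes) -> None:
        self._request = request
        self._chunks: Iterator[bytes] = (
            data[i : i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)
        )
        self._paused = False
        request.registerProducer(self, streaming=True)
        self.resumeProducing()

    def pauseProducing(self) -> None:
        # Called by the transport when its send buffer is full.
        self._paused = True

    def resumeProducing(self) -> None:
        # Called by the transport when the buffer has drained; keep writing
        # until we're paused again or run out of data.
        self._paused = False
        while not self._paused:
            chunk = next(self._chunks, None)
            if chunk is None:
                self._request.unregisterProducer()
                self._request.finish()
                return
            self._request.write(chunk)

    def stopProducing(self) -> None:
        # The client went away; drop the remaining data.
        self._paused = True
        self._chunks = iter(())
```

If the client (or an intermediary) reads slowly, the producer sits paused for most of the transfer, which would look exactly like the dead time in the traces above.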
Some extra time is expected since we're cooperating with the reactor instead of blocking it, but it seems like something isn't tuned optimally (chunk size, the producer starting/stopping too often, etc.).
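
One way to test the "starting/stopping too much" theory would be to count how often the transport pauses and resumes the producer while a big response is going out. A hypothetical instrumentation helper (this does not exist in Synapse) could look like:

```python
# Hypothetical helper: wrap a push producer's pause/resume callbacks to see
# how often the transport applies backpressure during a large response.
import logging
import time

logger = logging.getLogger(__name__)


def instrument_producer(producer):
    """Patch a producer instance in place to log pause/resume churn."""
    stats = {"pauses": 0, "resumes": 0, "started": time.monotonic()}

    original_pause = producer.pauseProducing
    original_resume = producer.resumeProducing

    def pause() -> None:
        stats["pauses"] += 1
        original_pause()

    def resume() -> None:
        stats["resumes"] += 1
        elapsed = time.monotonic() - stats["started"]
        logger.info(
            "producer paused %d times, resumed %d times in %.1fs",
            stats["pauses"], stats["resumes"], elapsed,
        )
        original_resume()

    producer.pauseProducing = pause
    producer.resumeProducing = resume
    return stats
```

If the counts turn out to be huge (or each pause lasts a long time), that would point at the chunk size / backpressure interaction rather than the JSON encoding.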