Skip to content
This repository was archived by the owner on Apr 26, 2024. It is now read-only.
This repository was archived by the owner on Apr 26, 2024. It is now read-only.

cache sync results for a while after the request completes #3880

Closed
@ara4n

Description

@ara4n

Both @lampholder and @Ralith have reported similar sounding problems where initial /sync takes ~5x longer to complete from Riot than it should.

@lampholder mentioned that he tried calling initial /sync directly from curl against his HS (bypassing nginx) without a clientside timeout and it took 3 mins, but when called by Riot (which times out reqs every 90s or so to avoid network glitches wedging a request) it takes ~15 mins. Meanwhile, @Ralith (on an underpowered server with bad IO) takes 15 mins(!) to generate the sync when called directly, but >1h when called from Riot :/

I haven't seen logs from either, but my theory is that if the client times out the /sync request after the server has generated the response, but before it has sent it to the client, then the next time the client retries, the /sync will be calculated from scratch, and the failure mode may continue.

It looks like the current behaviour on develop caches sync requests via response_cache which blocks additional requests for the same sync parameters until any ongoing one completes, and then they all return with the same result:

res = yield self.response_cache.wrap(
sync_config.request_key,
self._wait_for_sync_for_user,
sync_config, since_token, timeout, full_state,
)
defer.returnValue(res)
.

However, on the matrix-org-hotfixes branch @richvdh changed it to cache sync responses for an additional 2 minutes, so that any retrying clients will get the cached copy rather than starting from scratch: 0ca2857. This is similar to the old /initialSync snapshot_cache from #457 which never seems to have made it over to /sync.

I'm not seeing any additional /sync response cache?

If my theory is correct, it looks like this hotfix needs to be ported back into develop, and sytest fixed so that it doesn't fail if initial /sync returns stale data (perhaps we could add a null filter on or something as a cachebuster).

Another explanation for this could be if Twisted is correctly cancelling the ongoing /sync when the client goes away, thus throwing away the cached response. The same fix would mitigate that, however.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-PerformancePerformance, both client-facing and admin-facingz-p2(Deprecated Label)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions