Sync current token may be ahead of event stream cache position using workers #14158
Description
Based on my initial work investigating sync cache races in: #14154
Sync makes use of the event stream cache to determine whether a room has changed between the since & current tokens. This is then used to limit the number of rooms for which events are queried in get_room_events_stream_for_rooms. After discovering the cache invalidation races above I added a quick log line for this: beeper/synapse@62497db (after beeper/synapse@5297155).
And it logs! Only a very small handful of occurrences over the last ~5 days and the position difference has so far been 1 every time. I suspect this may also occur against other stream caches but have not confirmed.
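To make the failure mode concrete, here is a minimal sketch of the kind of check involved. It is not the code from the commits linked above: ToyStreamCache, cache_lag and the room IDs are hypothetical names made up for illustration, and the real cache in Synapse is more involved.

```python
import logging
from typing import Dict

logger = logging.getLogger(__name__)


class ToyStreamCache:
    """Hypothetical stand-in for a stream change cache (not Synapse's class)."""

    def __init__(self, current_pos: int) -> None:
        # Newest stream position this worker's cache has been told about.
        self.current_pos = current_pos
        # room_id -> stream position of the last known change in that room.
        self._last_change: Dict[str, int] = {}

    def entity_has_changed(self, room_id: str, pos: int) -> None:
        # Called when a change arrives (e.g. over replication).
        self._last_change[room_id] = pos
        self.current_pos = max(self.current_pos, pos)

    def has_entity_changed(self, room_id: str, since: int) -> bool:
        # Only knows about changes the cache has already received.
        return self._last_change.get(room_id, 0) > since


def cache_lag(cache: ToyStreamCache, current_token: int) -> int:
    """Return how far the sync's current token is ahead of the cache."""
    lag = max(0, current_token - cache.current_pos)
    if lag:
        logger.warning(
            "Sync current token %d is ahead of cache position %d (diff %d)",
            current_token,
            cache.current_pos,
            lag,
        )
    return lag


if __name__ == "__main__":
    cache = ToyStreamCache(current_pos=10)
    # An event at position 11 exists in room "!b", but this worker's cache
    # has not heard about it yet, so the room looks unchanged.
    print(cache_lag(cache, current_token=11))
    print(cache.has_entity_changed("!b", since=10))
```

In this toy setup a sync from since=10 to current=11 would skip room "!b" entirely, which is the gap described here.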
The worry here is that an event sent within the gap may be missed from an incremental sync, which is especially bad because the user will never see or know about the event unless they re-init sync (or the client backfills it?).
One solution to this is to implement a waiting mechanism on StreamCache so a worker can wait for the cache to catch up with the current token for a given sync before fetching data. Because this is super rare, and the position difference is tiny even when it happens, this would probably have negligible impact on sync performance and would provide a shield against cache invalidation races over replication.
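A rough sketch of what such a waiting mechanism could look like is below. This is only an illustration under stated assumptions: it uses asyncio rather than Twisted (which Synapse actually uses), and WaitableStreamCache, advance, wait_for_position and the timeout value are all hypothetical names, not an existing Synapse API.

```python
import asyncio


class WaitableStreamCache:
    """Hypothetical cache position tracker that callers can wait on.

    Create instances from inside a running event loop.
    """

    def __init__(self, current_pos: int) -> None:
        self._pos = current_pos
        self._cond = asyncio.Condition()

    async def advance(self, new_pos: int) -> None:
        # Called when the cache learns of a newer position (e.g. via replication).
        async with self._cond:
            if new_pos > self._pos:
                self._pos = new_pos
                self._cond.notify_all()

    async def wait_for_position(self, token: int, timeout: float = 1.0) -> bool:
        """Wait until the cache has reached `token`.

        Returns True once caught up, False if the timeout expires first.
        Returns immediately when the cache is already at or past `token`.
        """

        async def _wait() -> None:
            async with self._cond:
                await self._cond.wait_for(lambda: self._pos >= token)

        try:
            await asyncio.wait_for(_wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False


async def demo() -> bool:
    cache = WaitableStreamCache(current_pos=10)
    # The sync's current token is 11, one ahead of the cache.
    waiter = asyncio.ensure_future(cache.wait_for_position(11))
    await asyncio.sleep(0)   # let the waiter start blocking
    await cache.advance(11)  # replication catches up
    return await waiter      # caught up, safe to query the cache


if __name__ == "__main__":
    asyncio.run(demo())
```

The timeout is there so a sync request never blocks indefinitely; what to do on timeout (for example, falling back to querying all rooms) is a separate design decision not covered here.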