This repository was archived by the owner on Apr 26, 2024. It is now read-only.
Sync race with get rooms for user cache invalidation over replication #14154
Open
Description
Over the last few weeks we have started seeing syncs that are missing just-joined rooms. This led me to dig into how sync works, and I identified a few cache invalidation race conditions. My understanding is as follows:
- sync calls notifier.wait_for_events
- the notifier waits for events relevant to the user's rooms (it gets the room list, which may be cached!)
- this is handled as events come in over replication
- before that handler, process_replication_rows is called, which the cache database store processes first, and then again for the event
- the _invalidate_caches_for_event call does NOT invalidate rooms for user; that is left to the state invalidations over replication, but these are:
- sent after the event over replication
- unrelated to the sync handling/notifier process, which is driven only by the events
- so between the event replication and the state replication, there is a window in which a sync may be notified about events while the get rooms for user cache is still stale (not yet invalidated)
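The ordering problem described in the list above can be reproduced in a few lines. This is only a toy sketch, not Synapse code: `ToyWorker`, `on_event_row`, `on_state_invalidation` and `simulate_join` are hypothetical names, and the "database" is a plain dict. It assumes, as described above, that the event row wakes the sync before the state invalidation for the rooms-for-user cache arrives.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorker:
    """Hypothetical stand-in for a sync worker; not Synapse's real classes."""

    db_rooms: dict  # source of truth: user_id -> set of room_ids
    rooms_cache: dict = field(default_factory=dict)

    def get_rooms_for_user(self, user_id):
        # Cached lookup: only reads the "database" on a cache miss.
        if user_id not in self.rooms_cache:
            self.rooms_cache[user_id] = frozenset(self.db_rooms.get(user_id, ()))
        return self.rooms_cache[user_id]

    def sync(self, user_id):
        # A sync only considers rooms returned by the (possibly cached) lookup.
        return sorted(self.get_rooms_for_user(user_id))

    def on_event_row(self, user_id):
        # Event replication wakes the waiting sync, but (as in the issue)
        # does not invalidate the rooms-for-user cache.
        return self.sync(user_id)

    def on_state_invalidation(self, user_id):
        # The state invalidation arrives later and finally drops the entry.
        self.rooms_cache.pop(user_id, None)


def simulate_join():
    user = "@alice:example.org"
    db = {user: {"!old:example.org"}}
    worker = ToyWorker(db_rooms=db)

    worker.sync(user)                  # warm the cache before the join
    db[user].add("!new:example.org")   # the join is persisted elsewhere

    during_window = worker.on_event_row(user)  # notified, cache still stale
    worker.on_state_invalidation(user)         # invalidation arrives late
    after_invalidation = worker.sync(user)
    return during_window, after_invalidation
```

Calling `simulate_join()` returns the room list seen by the sync woken inside the window (missing the just-joined room) and the list seen after the invalidation lands (including it).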
I then confirmed my suspicions by adding a log line: beeper/synapse@1346af1, which successfully identified occurrences of this. I will now submit two different PRs to address this specific issue:
Metadata
Assignees
Labels
- defects related to /sync
- Problems related to running Synapse in Worker Mode (or replication)
- Affects or can be seen by some users regularly or most users rarely
- Major functionality / product severely impaired, no satisfactory workaround.
- Bugs, crashes, hangs, security vulnerabilities, or other reported issues.