This repository was archived by the owner on Apr 26, 2024. It is now read-only.
Federation catch-up is not necessarily correct due to the sharding of the event stream #15260
Open
In the `destinations` table, we have `last_successful_stream_ordering`, which contains the `stream_ordering` of the most recent PDU that was successfully sent to that destination.
However, the event stream has since been sharded (after federation catch-up was implemented, I believe), so I think there is now a small race condition: PDUs may not be produced in `stream_ordering` order, yet they all get enqueued for transmission to the same destination.
The race condition means that we might record that we successfully transmitted PDUs up to stream ordering `x` when in fact the PDU at `x - 1` (or an earlier one) was not transmitted, and it won't be transmitted during catch-up later either.
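
To make the failure mode concrete, here is a minimal simulation of the race described above. It is a sketch under assumptions, not Synapse's actual code: `Destination`, `send_pdu`, `catch_up` and `simulate` are hypothetical names, the high-water mark is assumed to only ever move forward, and catch-up is assumed to resend only PDUs with a `stream_ordering` strictly greater than `last_successful_stream_ordering`.

```python
from dataclasses import dataclass, field


@dataclass
class Destination:
    """Hypothetical stand-in for one row of the destinations table."""

    last_successful_stream_ordering: int = 0
    received: list = field(default_factory=list)
    reachable: bool = True


def send_pdu(dest: Destination, stream_ordering: int) -> bool:
    """Try to send one PDU; on success, advance the high-water mark."""
    if not dest.reachable:
        return False
    dest.received.append(stream_ordering)
    # Assumption: the mark never moves backwards.
    dest.last_successful_stream_ordering = max(
        dest.last_successful_stream_ordering, stream_ordering
    )
    return True


def catch_up(dest: Destination, all_pdus: list) -> None:
    """Resend only PDUs strictly newer than the high-water mark."""
    dest.reachable = True
    for stream_ordering in sorted(all_pdus):
        if stream_ordering > dest.last_successful_stream_ordering:
            send_pdu(dest, stream_ordering)


def simulate():
    dest = Destination()
    pdus = [1, 2]
    # With a sharded event stream, the PDU at stream ordering 2 can be
    # enqueued for sending before the PDU at stream ordering 1.
    send_pdu(dest, 2)        # succeeds, mark becomes 2
    dest.reachable = False   # destination goes down
    send_pdu(dest, 1)        # fails, mark stays at 2
    catch_up(dest, pdus)     # nothing is > 2, so PDU 1 is never resent
    missing = sorted(set(pdus) - set(dest.received))
    return dest.last_successful_stream_ordering, missing
```

In this model `simulate()` returns `(2, [1])`: the recorded mark claims everything up to 2 was delivered, while the PDU at stream ordering 1 is silently lost.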