Faster joins: handle total failure to sync state

Currently, if we try every server in the room and are unable to sync state from any of them, we give up. This leaves the room stuck in the "partial state" state, and any C-S requests for state in that room will time out indefinitely.

It's not entirely clear what we should do in this case:
 * Giving up isn't the right thing to do if there's a temporary network outage.
 * Retrying indefinitely is also not the right thing to do if we can reach all homeservers and they all claim they don't have the state we want.

https://github.com/matrix-org/synapse/blob/7c6b2204d143550d81e5bf9612c4e69fe0866b4c/synapse/handlers/federation.py#L1594-L1610

	if attempt == len(destinations) - 1:
	    # We have tried every remote server for this event. Give up.
	    # TODO(faster_joins) giving up isn't the right thing to do
	    #   if there's a temporary network outage. retrying
	    #   indefinitely is also not the right thing to do if we can
	    #   reach all homeservers and they all claim they don't have
	    #   the state we want.
	    #   https://github.com/matrix-org/synapse/issues/13000
	    logger.error(
	        "Failed to get state for %s at %s from %s because %s, "
	        "giving up!",
	        room_id,
	        event,
	        destination,
	        e,
	    )
	    raise
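
One way to tell the two cases above apart is to record why each destination failed and only give up when every server gave a definitive answer. The following is a minimal standalone sketch of that idea, not Synapse code: `TransientFailure`, `DefinitiveFailure`, `Outcome`, and `try_all_destinations` are hypothetical names invented for illustration, and how real federation errors would map onto the two exception types is left open.

```python
import enum
from typing import Callable, Sequence


class Outcome(enum.Enum):
    SUCCESS = "success"
    RETRY_LATER = "retry_later"  # at least one server was unreachable
    GIVE_UP = "give_up"  # every server answered and none had the state


class TransientFailure(Exception):
    """Hypothetical: the destination could not be reached (e.g. a timeout)."""


class DefinitiveFailure(Exception):
    """Hypothetical: the destination responded but does not have the state."""


def try_all_destinations(
    destinations: Sequence[str],
    fetch_state: Callable[[str], None],
) -> Outcome:
    """Try each destination once and classify the overall result."""
    saw_transient = False
    for destination in destinations:
        try:
            fetch_state(destination)
            return Outcome.SUCCESS
        except TransientFailure:
            # A network problem tells us nothing about whether this
            # server has the state, so a later retry may still succeed.
            saw_transient = True
        except DefinitiveFailure:
            # This server has told us it cannot help; move on.
            continue
    return Outcome.RETRY_LATER if saw_transient else Outcome.GIVE_UP
```

Under this sketch, a caller would reschedule the resync (presumably with some backoff) on `RETRY_LATER`, and only treat `GIVE_UP` as final. What should then happen to the room that is still in partial state remains the open question of this issue.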
