ra_server_proc: Fix handling of local query replies #517
Why
The reply to a local query was formatted as a reply tuple. However, the code that built it was called in two different contexts:

- When the local query was executed immediately, the reply tuple was interpreted by `gen_statem`, so the reply was sent by the local Ra server process. This is exactly what we wanted.
- When the local query execution depended on a condition and might have to be delayed, it could either be executed right away (and the reply tuple was interpreted by `gen_statem`), or be delayed, in which case the reply tuple could be interpreted as a Ra effect at the time of execution.

Unfortunately, the same tuple has a very different meaning depending on who interprets it: `gen_statem` will send the reply regardless of the Raft state of the Ra server process, whereas the handling of a Ra effect depends on that state.

The delayed query was always processed by the Ra server that received it, regardless of its state. This means that if the Ra server is not a leader and the reply tuple is interpreted as a Ra effect, the caller will never get an answer.
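The difference between the two interpretations can be sketched with a toy model. This is not Ra's code: the module name, the `gen_statem` / `ra_effect` labels and the `sent` / `dropped` return values are hypothetical, and `From` is a plain pid rather than a real `gen_statem` caller reference. The tuple uses the `{reply, From, Reply}` shape of a `gen_statem` reply action, and the leader-only behaviour of the effect is modelled from the description above.

```erlang
-module(reply_interpretation_sketch).
-export([interpret/3, demo/0]).

%% Toy model only. Interpreter is a hypothetical label (gen_statem | ra_effect)
%% and RaftState is leader | follower.

%% As a gen_statem action, the reply is sent whatever the Raft state is.
interpret(gen_statem, _RaftState, {reply, From, Reply}) ->
    From ! {answer, Reply},
    sent;
%% As a Ra effect (modelled from the description), only a leader replies.
interpret(ra_effect, leader, {reply, From, Reply}) ->
    From ! {answer, Reply},
    sent;
interpret(ra_effect, _NotLeader, {reply, _From, _Reply}) ->
    %% The caller never receives an answer and eventually times out.
    dropped.

demo() ->
    Tuple = {reply, self(), query_result},
    [interpret(gen_statem, follower, Tuple),
     interpret(ra_effect, leader, Tuple),
     interpret(ra_effect, follower, Tuple)].
```

In this model, only the last call in `demo/0` (a delayed query on a non-leader) leaves the caller without an answer, which is the failing case described above.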
This led to some timeouts in Khepri and nasty bugs in RabbitMQ. In particular, it caused `peer_discovery_classic_config_SUITE` to fail quite frequently in RabbitMQ CI.

How
First, the patch ensures the reply tuple is always interpreted as a Ra effect.

Then, it sets the `{member, ...}` replier option in the reply effect to the Ra server that executes the query. This way, the reply is always emitted, and by the correct Ra server, regardless of its Raft state.
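A minimal sketch of why the replier option helps, assuming the reply effect carries the replier as a fourth element, `{reply, From, Reply, {member, ServerId}}`. The module, the `should_reply/3` helper and the server id used in `demo/0` are hypothetical and only illustrate the decision; they are not Ra's implementation.

```erlang
-module(reply_member_sketch).
-export([should_reply/3, demo/0]).

%% Hypothetical helper: does the server Self, currently in RaftState,
%% emit the reply described by Effect?

%% Without a replier option, only a leader replies (as described above).
should_reply(_Self, leader, {reply, _From, _Reply}) -> true;
should_reply(_Self, _NotLeader, {reply, _From, _Reply}) -> false;
%% With {member, Self}, the named member replies whatever its Raft state.
should_reply(Self, _AnyState, {reply, _From, _Reply, {member, Self}}) -> true;
should_reply(_Self, _AnyState, {reply, _From, _Reply, {member, _Other}}) -> false.

demo() ->
    Self = {my_cluster, node()},   %% hypothetical server id
    From = make_ref(),             %% stand-in for the caller reference
    Old = {reply, From, ok},
    New = {reply, From, ok, {member, Self}},
    [should_reply(Self, follower, Old),
     should_reply(Self, follower, New),
     should_reply(Self, leader, New)].
```

Under these assumptions, a follower that executes the delayed query drops the old three-element effect but emits the reply once the effect names it as the member.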