Description
Observed behavior
When two subscribers in the same queue group are connected to two different nodes of a leaf cluster, only one of them receives messages if and only if those messages arrive from a hub cluster to which the leaf cluster is connected. The second subscriber (initially idle) starts receiving messages once the primary is terminated. This is reversible: restarting the primary against its original node makes it take over again.
Expected behavior
All subscribers in the same queue group in a leaf cluster are treated equally.
Server and client version
Server 2.10.21
nats CLI - compiled from main
Host environment
Windows/Mac/Linux - reproduced by customer, Borja and me
Steps to reproduce
To reproduce deterministically, all leaf node connections and nats-cli connections will connect to specific nodes.
- Create HUB cluster with nodes HUB1, HUB2, HUB3
- Create LEAF cluster with nodes LEAF1, LEAF2, LEAF3
- Connect leaf nodes such that LEAF1-->HUB1, LEAF2-->HUB2, LEAF3-->HUB3
- Start queue group listeners
nats --server LEAF1 sub --queue=q1 foo
nats --server LEAF2 sub --queue=q1 foo
- Publish messages to HUB1
nats --server HUB1 pub foo Hello
//Only the subscriber on LEAF2 will receive messages
- Publish messages to HUB3
nats --server HUB3 pub foo Hello
//Both subscribers receive messages
Tentative explanation: The "prefer local cluster listeners in work queues" logic is not leaf node aware. When a message is published such that both listeners are at the same "distance" (routing hops) from the publisher, the work queue load balancing works. When a message is published such that the listeners are at different "distances" from the publisher, the work queue load balancing fails.
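To make the hypothesis concrete, here is a toy model (not the nats-server implementation) of a selector that only considers the queue group members with the fewest routing hops. The subscriber names `sub_a` and `sub_b`, the hop counts, and the `deliver` function are all made up for illustration. They do not map onto specific LEAF nodes and do not try to predict which subscriber wins in the real setup.

```python
import random


def deliver(distances, n_messages, seed=0):
    """Toy selector: each message goes to a random member among those
    with the smallest hop count. Hypothetical model, not NATS code."""
    rng = random.Random(seed)
    nearest = min(distances.values())
    # Only members at the minimum distance are ever considered.
    candidates = sorted(name for name, hops in distances.items() if hops == nearest)
    counts = {name: 0 for name in distances}
    for _ in range(n_messages):
        counts[rng.choice(candidates)] += 1
    return counts


# Same distance for both members: load is shared between them.
equal = deliver({"sub_a": 2, "sub_b": 2}, 1000)

# Different distances: the farther member is never selected.
unequal = deliver({"sub_a": 1, "sub_b": 2}, 1000)

print("equal distances:  ", equal)
print("unequal distances:", unequal)  # {'sub_a': 1000, 'sub_b': 0}
```

If the real selection logic behaves like this model, the fix would presumably need to treat all queue group members inside the leaf cluster as equally "local", regardless of which leaf node connection the message arrived on.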