Queue Groups on leaf clusters not balancing correctly when messages are routed in from hub cluster [v2.10.21] #5972

@roeschter

Observed behavior

When two subscribers in a queue group are connected to two different nodes in a leaf cluster, only one of them receives messages if (and only if) those messages arrive from a hub cluster to which the leaf cluster is connected. The second subscriber (initially idle) starts receiving messages once the primary is terminated. This is reversible: restart the primary against its original node and it takes over again.

Expected behavior

All subscribers in the same queue group on a leaf cluster are treated equally.

Server and client version

Server 2.10.21
Nats - compiled from main

Host environment

Windows/Mac/Linux - reproduced by customer, Borja and me

Steps to reproduce

To reproduce deterministically, all leaf node connections and nats-cli connections will connect to specific nodes.

  1. Create HUB cluster with nodes HUB1, HUB2, HUB3
  2. Create LEAF cluster with nodes LEAF1, LEAF2, LEAF3
  3. Connect leaf nodes such that LEAF1-->HUB1, LEAF2-->HUB2, LEAF3-->HUB3
  4. Start queue group listeners:
     nats --server LEAF1 sub --queue=q1 foo
     nats --server LEAF2 sub --queue=q1 foo
  5. Publish messages to HUB1:
     nats --server HUB1 pub foo Hello
     Result: only the subscriber on LEAF2 receives messages.
  6. Publish messages to HUB3:
     nats --server HUB3 pub foo Hello
     Result: both subscribers receive messages.

Tentative explanation: the "prefer local cluster listeners in work queues" logic is not leaf node aware. When a message is published such that both listeners are at the same "distance" (routing hops) from the publisher, work queue load balancing works. When a message is published such that the listeners are at different "distances" from the publisher, work queue load balancing fails.
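
To make the "distance" argument concrete, here is a minimal Python sketch that counts routing hops from each publisher node to the two subscriber nodes. It is only a toy model of the topology from the steps above, not of the server's actual selection logic: the full-mesh routes inside each cluster and the helper names (`build_links`, `hops`, `subscriber_distances`) are assumptions made for illustration.

```python
from collections import deque

# Assumed topology: full-mesh routes inside each cluster, plus the
# three leaf node connections LEAF1-->HUB1, LEAF2-->HUB2, LEAF3-->HUB3.
HUB = ["HUB1", "HUB2", "HUB3"]
LEAF = ["LEAF1", "LEAF2", "LEAF3"]


def build_links():
    """Return an undirected adjacency map for the assumed topology."""
    links = {node: set() for node in HUB + LEAF}

    def connect(a, b):
        links[a].add(b)
        links[b].add(a)

    # Routes between every pair of nodes within a cluster.
    for cluster in (HUB, LEAF):
        for i, a in enumerate(cluster):
            for b in cluster[i + 1:]:
                connect(a, b)

    # One leaf node connection per leaf/hub pair.
    for leaf, hub in zip(LEAF, HUB):
        connect(leaf, hub)
    return links


def hops(links, source):
    """Breadth-first search: minimum hop count from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for peer in links[node]:
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    return dist


def subscriber_distances(publisher, subscribers=("LEAF1", "LEAF2")):
    """Hop count from the publisher's node to each queue subscriber's node."""
    dist = hops(build_links(), publisher)
    return {s: dist[s] for s in subscribers}


if __name__ == "__main__":
    for publisher in ("HUB1", "HUB3"):
        print(publisher, subscriber_distances(publisher))
```

Under these assumptions the two subscribers are at unequal distances from HUB1 (1 and 2 hops) and at equal distances from HUB3 (2 hops each), which matches the pattern of the failing and working cases. The sketch does not predict which subscriber the server picks in the unequal case.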

Labels: defect, triage