improve load balancing at higher deployment query volume #933

Open
@Theodus

Description

For high-volume deployments, the gateway tends to pick multiple indexers (often 3) to maximize performance. The main problem with this behavior is a higher risk of unnecessarily overloading indexers. Above some volume threshold, indexer-selection should instead load-balance requests across the indexers that would otherwise all be included in the selected set.

The first challenge is detecting high volume on a deployment. Here's a rough design:

The gateway should track query volume per subgraph deployment. This would likely be a parking_lot::RwLock<HashMap<DeploymentId, AtomicUsize>>. Inserting into the map should be relatively infrequent, and updating an existing entry only requires a read lock. The atomic counter is incremented by the number of indexers selected, and decremented by one each time an indexer request completes. A deployment is in a "high volume" state when this counter is above some threshold n, meaning there are more than approximately n indexer requests outstanding concurrently.
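
A minimal sketch of the counter described above. To stay dependency-free it uses std::sync::RwLock rather than parking_lot::RwLock, and the names VolumeTracker, start, finish, and the u64-backed DeploymentId are hypothetical stand-ins, not existing gateway types:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

/// Hypothetical stand-in for the gateway's deployment identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DeploymentId(pub u64);

/// Tracks the approximate number of outstanding indexer requests per deployment.
/// Assumption: std::sync::RwLock is used here instead of parking_lot::RwLock.
pub struct VolumeTracker {
    outstanding: RwLock<HashMap<DeploymentId, AtomicUsize>>,
    threshold: usize,
}

impl VolumeTracker {
    pub fn new(threshold: usize) -> Self {
        Self {
            outstanding: RwLock::new(HashMap::new()),
            threshold,
        }
    }

    /// Record that requests were sent to `selected` indexers for `deployment`.
    pub fn start(&self, deployment: DeploymentId, selected: usize) {
        {
            // Fast path: the entry already exists, so a read lock is enough.
            let map = self.outstanding.read().unwrap();
            if let Some(counter) = map.get(&deployment) {
                counter.fetch_add(selected, Ordering::Relaxed);
                return;
            }
        }
        // Slow path: first request seen for this deployment, take the write lock.
        let mut map = self.outstanding.write().unwrap();
        map.entry(deployment)
            .or_insert_with(|| AtomicUsize::new(0))
            .fetch_add(selected, Ordering::Relaxed);
    }

    /// Record that a single indexer request for `deployment` has completed.
    pub fn finish(&self, deployment: DeploymentId) {
        let map = self.outstanding.read().unwrap();
        if let Some(counter) = map.get(&deployment) {
            // Saturating decrement, so an unmatched `finish` cannot wrap around.
            let _ = counter.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |n| {
                Some(n.saturating_sub(1))
            });
        }
    }

    /// Current number of outstanding indexer requests (0 if never seen).
    pub fn outstanding(&self, deployment: DeploymentId) -> usize {
        let map = self.outstanding.read().unwrap();
        map.get(&deployment)
            .map(|counter| counter.load(Ordering::Relaxed))
            .unwrap_or(0)
    }

    /// True when the counter is strictly above the configured threshold.
    pub fn is_high_volume(&self, deployment: DeploymentId) -> bool {
        self.outstanding(deployment) > self.threshold
    }
}
```

Relaxed ordering is probably sufficient here because the counter is only an approximate load signal and does not guard other data. Entries are never removed in this sketch, so a real implementation would also need some eviction for deployments that go idle.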

There are multiple potential approaches for what to do when we detect high volume on a deployment. Here's a list in increasing order of difficulty, which might also be a reasonable order of iterations to work through until we hit "good enough for now":

  1. When the deployment is "high volume", call indexer_selection::select with a limit of 1 instead of 3.
  2. Add a parameter to indexer_selection::select that acts as a cost to including additional indexers in the selected set. This value should increase at higher volume.
  3. Use a proper load-balancing algorithm between the selected indexers, see this for inspiration: https://samwho.dev/load-balancing/.
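
As a rough illustration of options 1 and 3, here is a sketch under stated assumptions. It does not call indexer_selection::select, since its real signature isn't shown here. The names selection_limit, Candidate, and least_loaded are hypothetical, and "least outstanding requests" is just one possible load-balancing policy, not necessarily the one we'd settle on:

```rust
/// Option 1: choose the limit to pass to indexer selection.
/// Under normal volume keep the current limit of 3; under high volume use 1.
pub fn selection_limit(outstanding: usize, threshold: usize) -> usize {
    if outstanding > threshold {
        1
    } else {
        3
    }
}

/// Hypothetical view of a selected indexer and its in-flight request count.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub indexer: &'static str,
    pub outstanding: usize,
}

/// Option 3 (one possible policy): pick the candidate with the fewest
/// outstanding requests. Ties go to the earliest candidate in the slice.
pub fn least_loaded(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().min_by_key(|c| c.outstanding)
}
```

Option 2 would sit between these two: rather than a hard switch from 3 to 1, the limit would shrink gradually as the cost of each additional indexer grows with volume.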
