Implement Cross-Cluster Rate Limiting in Gateway #692

vishalya · 2025-05-29T00:04:22Z

Motivation

The gateway distributes queries across multiple Trino clusters, making it the ideal place to enforce query limits per user or user group. This ensures fair usage and prevents overloading any single cluster.

Enhancement Proposal

1. Central State for Gateway

The gateway currently lacks a central state due to its distributed nature.
Introduce Redis (or compatible forks like Valkey) for fast, centralized state management.
Configuration for this shared state will be provided via a config file.
The central state will be abstracted to allow future migration to other storage systems.
If the central configuration is missing, the gateway will operate as before.
It should be extendable to store cluster statistics for use by routers.

2. Rate Limiting Configuration

Configuration will follow or closely resemble the Trino resource group schema.
Store configurations in a database.
Refresh configurations at regular intervals.
Load configurations into the central state for efficient access.

3. Rate Limiting Logic

Support request bursts to maintain a good user experience.
Limit queries based on:
- User
- User group
- Query types

RoeyoOgen · 2025-05-29T12:58:59Z

Great addition to the project!

A few comments:

I'd love for an expansion of the "state" that's being saved, could you provide an example JSON for what this might look like?
Maybe add a flow diagram to where this will sit in the flow of the query. For instance, it could be before the routing logic (and thus save redundant calls to the router) Or alternatively, be after as we want to rate-limit per cluster.

andythsu · 2025-06-03T14:18:24Z

@RoeyoOgen

We are thinking to make this generic enough so that each company can store whatever they want to the db choice they desire (not necessarily Redis). For now we are thinking to provide an interface like

interface CentralCache<T> {
    save(key): void
    delete(): void
    get(): T
}

This will help abstract the underlying logic and help modularize the code.

It's a bit hard to decide where to put this logic because as you mentioned, we could put it before or after the routing logic. I was thinking to maybe just leave it for users to decide. I guess the first cut of this issue can just focus on getting CentralCache as a module into gateway, then we can decide what to do with this.

LMK your thoughts

EdenKik · 2025-06-03T18:41:47Z

Hey,
Happy to see motivation for this feature :)

A few questions:

Regarding the query flow mentioned above, do you think applying rate limiting after the routing logic is safe? Since clients could potentially send queries directly to different Trino cluster, wouldn't enforcing limits before routing be more effective?
It would be great to support additional query limit parameters, such as limiting by IP address or by query source (Superset, Python client, etc.). These would be very beneficial for our use cases.

Also was wondering, which rate limiting algorithm are you considering for this? From my previous research, the token bucket algorithm seems like it suites good for this use case

RoeyoOgen · 2025-06-03T18:50:40Z

@andythsu
Regarding:

We are thinking to make this generic enough so that each company can store whatever they want to the db choice they desire (not necessarily Redis). For now we are thinking to provide an interface like

I don't seem to understand how this would work.

Wouldn't the "state" need to be the same for everyone? How will you make this customisable?

Also, does this approach expand towards a system similar to the routing rule system where limitations can use cached items as parameters?

Perhaps after @vishalya will add some sort of flow / pseudocode it will be clearer

andythsu · 2025-06-03T19:29:00Z

I don't think the "state" needs to be the same for everyone.... eventually you could come up with your own routing rule that decides what to do by looking at the cache. Also, it is impossible to satisfy everyone's requirements in Redis. Some may want a specific format of the key, some may want a different eviction strategy... etc. So I think enforcing the same state for everyone is a bit hard

RoeyoOgen · 2025-06-04T03:51:31Z

how do you plan to make the state customisable then? where will a user configure his cache properties?

andythsu · 2025-06-04T13:32:20Z

I'm still thinking about it. Do you have any ideas on that?

RoeyoOgen · 2025-06-05T03:41:59Z

Not really :( , we can maybe hold a dedicated Google-meet to further discuss this and the other points above

andythsu · 2025-06-05T20:09:18Z

Yep. I think as starting point we can start a thread in trino-gateway channel, and then discuss the points during bi-weekly sync meetings. Here's the thread https://trinodb.slack.com/archives/C059KUNPTSP/p1749154145837889

oneonestar · 2025-06-09T01:54:37Z

The resource consumption for each query could vary by a lot. Do you plan
to use query runtime information for the rate limiting logic? For example,
a query run for 2hr is heavier than 10 queries running for 10s.
Without monitoring the actual resource consumption for the running
queries could result in a non-fair rate limit.
Is the rate limiting system planned to work with Trino resource group or
replacing it?
Are multiple rate limiting algorithms being supported? It'll be difficult
to share a common configuration for different routing algorithms. We
should make it flexible when designing the configuration.

vishalya · 2025-06-09T14:24:12Z

Thanks for all the feedback, I'll bring these points up in this week's dev sync, and we can dive deeper into our scope and ideas there.

andythsu · 2025-06-11T18:41:42Z

Here's the architectural diagram for the rate limiting module. We will follow similar structure as Trino's resource groups

vishalya assigned vishalya and andythsu Jun 2, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement Cross-Cluster Rate Limiting in Gateway #692

Implement Cross-Cluster Rate Limiting in Gateway #692

vishalya commented May 29, 2025

RoeyoOgen commented May 29, 2025

Uh oh!

andythsu commented Jun 3, 2025

Uh oh!

EdenKik commented Jun 3, 2025 •

edited

Loading

Uh oh!

RoeyoOgen commented Jun 3, 2025 •

edited

Loading

Uh oh!

andythsu commented Jun 3, 2025

Uh oh!

RoeyoOgen commented Jun 4, 2025

Uh oh!

andythsu commented Jun 4, 2025

Uh oh!

RoeyoOgen commented Jun 5, 2025

Uh oh!

andythsu commented Jun 5, 2025

Uh oh!

oneonestar commented Jun 9, 2025

Uh oh!

vishalya commented Jun 9, 2025

Uh oh!

andythsu commented Jun 11, 2025

Uh oh!

Implement Cross-Cluster Rate Limiting in Gateway #692

Implement Cross-Cluster Rate Limiting in Gateway #692

Comments

vishalya commented May 29, 2025

Motivation

Enhancement Proposal

1. Central State for Gateway

2. Rate Limiting Configuration

3. Rate Limiting Logic

RoeyoOgen commented May 29, 2025

Uh oh!

andythsu commented Jun 3, 2025

Uh oh!

EdenKik commented Jun 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RoeyoOgen commented Jun 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

andythsu commented Jun 3, 2025

Uh oh!

RoeyoOgen commented Jun 4, 2025

Uh oh!

andythsu commented Jun 4, 2025

Uh oh!

RoeyoOgen commented Jun 5, 2025

Uh oh!

andythsu commented Jun 5, 2025

Uh oh!

oneonestar commented Jun 9, 2025

Uh oh!

vishalya commented Jun 9, 2025

Uh oh!

andythsu commented Jun 11, 2025

Uh oh!

EdenKik commented Jun 3, 2025 •

edited

Loading

RoeyoOgen commented Jun 3, 2025 •

edited

Loading