Allow for (subgraph) response size limits and traffic shaping to save the router from OOM due to large responses #6999

Open
frittentheke opened this issue Mar 13, 2025 · 0 comments

frittentheke commented Mar 13, 2025

Is your feature request related to a problem? Please describe.

We recently had a case of huge responses from one particular subgraph causing the Router to OOM.
To be exact, the subgraph responded with an "errors": [] array that was megabytes in size.

While there are lots of settings for traffic shaping and for limiting (client) requests,
I found no way to configure a limit on subgraph response sizes that could serve as a circuit breaker for such cases.

In essence, this feature request is for just another dimension (like max_depth and max_height) along which the resources used by an individual request can be limited.

Describe the solution you'd like

I'd like to be able to set a limit on the size of an individual subgraph response that the router will parse and then compile into the client response, in order to cap the maximum memory required per original request.
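
As a rough illustration of such a circuit breaker, the following Rust sketch reads a subgraph response body but stops as soon as it exceeds a byte limit, instead of buffering the whole body first. It uses only the standard library; the function `read_capped`, the `ReadError` type and the `limit` parameter are hypothetical and are not part of the Router's actual code or configuration.

```rust
use std::io::Read;

/// Error for a (hypothetical) capped read of a subgraph response body.
#[derive(Debug, PartialEq)]
pub enum ReadError {
    /// The body contained more than `limit` bytes.
    TooLarge { limit: u64 },
    /// Any underlying I/O error, stringified for simplicity.
    Io(String),
}

/// Reads at most `limit` bytes from `body`. Reading stops after `limit + 1`
/// bytes, so memory use stays bounded by roughly `limit` no matter how large
/// the real body is; an oversized body yields `ReadError::TooLarge`.
pub fn read_capped<R: Read>(body: R, limit: u64) -> Result<Vec<u8>, ReadError> {
    let mut buf = Vec::new();
    // Take one byte more than the limit to distinguish "exactly limit" from "over limit".
    body.take(limit.saturating_add(1))
        .read_to_end(&mut buf)
        .map_err(|e| ReadError::Io(e.to_string()))?;
    if buf.len() as u64 > limit {
        return Err(ReadError::TooLarge { limit });
    }
    Ok(buf)
}
```

A router built this way could turn `ReadError::TooLarge` into a logged rejection of the subgraph fetch rather than parsing megabytes of JSON; the key design point is that the check happens while reading, before the oversized body is fully held in memory.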

This does not necessarily have to be a limit per individual subgraph response; it could also be a configurable maximum per request the router processes, so that a few requests cannot fill all of the memory.
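
A per-request maximum could be modelled as a byte budget shared by all subgraph fetches that belong to one client request. The sketch below assumes a hypothetical `RequestBudget` type (not an existing Router API) that would be created per client request, shared between fetches (for example via `Arc`), and consulted before accepting more response bytes.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical byte budget shared by all subgraph fetches of one client request.
pub struct RequestBudget {
    limit: usize,
    used: AtomicUsize,
}

impl RequestBudget {
    pub fn new(limit: usize) -> Self {
        Self { limit, used: AtomicUsize::new(0) }
    }

    /// Atomically reserves `bytes`. Returns false and reserves nothing if the
    /// reservation would push total usage above `limit`.
    pub fn try_reserve(&self, bytes: usize) -> bool {
        self.used
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |used| {
                used.checked_add(bytes).filter(|&total| total <= self.limit)
            })
            .is_ok()
    }

    /// Bytes reserved so far.
    pub fn used(&self) -> usize {
        self.used.load(Ordering::Acquire)
    }
}
```

With a 1000-byte budget, a first subgraph response of 600 bytes fits, a second one of 500 bytes is rejected, and a 400-byte one still fits exactly. Using `fetch_update` keeps concurrent fetches from jointly overshooting the limit.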

Certainly there should be a log message indicating that requests were dropped / rejected because their memory use exceeded the limit, as with all other request limits.

It might also make sense to indicate to the client that the response is larger than the router allows, maybe using HTTP status 413 (https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/413)?

Describe alternatives you've considered

An alternative would be some sort of overload protection that kicks in when the router's memory usage approaches a certain threshold, similar to the overload manager built into Envoy: https://www.envoyproxy.io/docs/envoy/latest/configuration/operations/overload_manager/overload_manager

While this might be beneficial in any case, as lots of small in-flight requests can also drive a router into OOM, it tackles a different problem from the one we had: there, a single malfunctioning subgraph was the troublemaker, not the number of concurrent requests per se.
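
The threshold idea behind such overload protection can be sketched as a mapping from memory pressure to an action. The enum, the function and the two thresholds below are hypothetical and only loosely inspired by the concept of Envoy's overload manager; they do not mirror Envoy's actual configuration, and how `used_bytes` is measured (e.g. from allocator statistics) is left open.

```rust
/// Hypothetical actions a router could take under memory pressure.
#[derive(Debug, PartialEq)]
pub enum OverloadAction {
    /// Pressure is low; serve normally.
    Normal,
    /// Pressure is elevated; e.g. disable optional work or lower limits.
    ReduceLoad,
    /// Pressure is critical; reject new requests until it drops.
    ShedNewRequests,
}

/// Maps current memory use to an action, given two pressure thresholds in 0.0..=1.0.
pub fn overload_action(used_bytes: u64, max_bytes: u64, reduce_at: f64, shed_at: f64) -> OverloadAction {
    if max_bytes == 0 {
        // No usable memory budget configured: be conservative.
        return OverloadAction::ShedNewRequests;
    }
    let pressure = used_bytes as f64 / max_bytes as f64;
    if pressure >= shed_at {
        OverloadAction::ShedNewRequests
    } else if pressure >= reduce_at {
        OverloadAction::ReduceLoad
    } else {
        OverloadAction::Normal
    }
}
```

Note that this only reacts to aggregate memory use; it would not by itself stop one oversized subgraph response from being buffered.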

Being unable to limit the memory footprint of handling a single (of potentially many concurrent) requests makes it hard to determine how much memory the router requires at "full throttle", i.e. with all connections / threads / workers / ... busy.
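
With a per-request cap in place, the sizing question becomes simple arithmetic: worst-case memory is roughly a baseline plus the number of concurrent requests times the per-request cap. The function below is a hypothetical helper with made-up example numbers; real usage would also include allocator and JSON-parsing overhead (a parsed value can be larger than its raw bytes), caches, and so on.

```rust
/// Rough upper bound on router memory when every in-flight request hits its cap.
/// Returns None if the computation overflows u64.
pub fn worst_case_memory_bytes(baseline: u64, max_concurrent: u64, per_request_cap: u64) -> Option<u64> {
    max_concurrent.checked_mul(per_request_cap)?.checked_add(baseline)
}
```

For example, a 512 MiB baseline with 200 concurrent requests capped at 8 MiB each gives 512 + 200 × 8 = 2112 MiB as the worst case. Without a per-request cap, no such bound exists.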

Additional context

There are some related issues and feature requests I found
