Skip to content

Redis connection and connection pool maintainance (TCP keepalive, redis ping) #6855

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
frittentheke opened this issue Feb 24, 2025 · 3 comments

Comments

@frittentheke
Copy link

Is your feature request related to a problem? Please describe.

When using a Redis for distributed caching (https://www.apollographql.com/docs/graphos/routing/performance/caching/distributed#redis-url-configuration), there currently is are no options in regards to keep- alive (be it Redis PING or simply TCP keepalives).

See this section of code which is compiling the various configuration options into a common client_config object:
https://github.com/apollographql/router/blob/dev/apollo-router/src/cache/redis.rs#L149-L185

Especially when considering a larger connection pool, connections might remain idle for quite a while, causing stateful devices (firewalls, connection proxies such as a service mesh) to time out the connections.

The issue is solved for SQL connection poolers for years by simply sending e.g. select(1) after the connection was idle for some time to ensure it stays open and is healthy (and stateful connection handling components are happy).

Describe the solution you'd like

The underlying fred.rs library (https://github.com/aembke/fred.rs) does have options to configure and enable tcp keepalives:

https://github.com/aembke/fred.rs/blob/46b2f8e822a0746fc1067a68a7293f6cb4fadf6c/src/types/config.rs#L309

It would be awesome to have these settings available for configuration via the router config yaml.

Describe alternatives you've considered

There also some discussion around dynamic pool maintenance at aembke/fred.rs#330 with an example implementation. There connections are scaled up and down, but there also is some idle tracking implemented by @aembke at https://github.com/aembke/fred.rs/blob/02662f10e054ae672d0e33ec4574f28afef0685d/examples/dynamic_pool.rs#L106 to scale down.

This could also be leveraged to apply e.g. a Redis PING keep-alive mechanism to the connections within the minimum scale.

Additional context

Add any other context or screenshots about the feature request here.

@aembke
Copy link

aembke commented Feb 24, 2025

For what it's worth, there's also an interface that pings the server on an interval

@frittentheke
Copy link
Author

frittentheke commented Feb 24, 2025

Thanks @aembke that's exactly what I was looking for when raising this issue! Just needs Apollo Router to allow using this :-)

@jeffutter
Copy link

👋 Have you experienced issues that could be resolved by this in practice?

I suspect we're seeing something like this. If it helps, the errors we start seeing look like this:

fred-jzS7TxLEYD: Ending reader task from <hostname> due to None
Client disconnected with error: Redis Error - kind: Canceled, details: Canceled.

Followed by a constant stream of:

Client disconnected with error: Redis Error - kind: IO, details: Os { code: 104, kind: ConnectionReset, message: \"Connection reset by peer\" }

We also observe the redis query cache hit rate drop to 0 when this happens.

I wonder if you are seeing similar behavior?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants