Description
Expected Behavior
Feast server deployed via Feast Operator should consistently pass readiness probe checks and respond promptly on the /health endpoint, especially under light traffic.
Current Behavior
The readiness probe fails frequently. When testing manually (running kubectl exec into the pod and calling curl on /health), the response is usually fast but occasionally takes up to 10 seconds, which causes the intermittent probe failures.
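The manual check above can be scripted so that slow responses are counted rather than spotted by eye. This is a minimal sketch with two assumptions. First, the feature server listens on localhost:6566, which is the `feast serve` default; the port the operator exposes may differ. Second, any response slower than 1 second counts as a probe failure, which matches the Kubernetes default probe `timeoutSeconds`; the actual probe settings may be different.

```python
import time
import urllib.request
from typing import List, Optional

# Assumption: the feature server is reachable here (e.g. from inside the pod);
# 6566 is the `feast serve` default port, adjust if your deployment differs.
HEALTH_URL = "http://localhost:6566/health"


def measure_health(url: str = HEALTH_URL, samples: int = 60,
                   interval: float = 1.0,
                   timeout: float = 15.0) -> List[Optional[float]]:
    """Return one latency (seconds) per request, or None if it errored/timed out."""
    latencies: List[Optional[float]] = []
    for _ in range(samples):
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                resp.read()
            latencies.append(time.perf_counter() - start)
        except OSError:  # URLError, HTTPError and socket timeouts are all OSErrors
            latencies.append(None)
        time.sleep(interval)
    return latencies


def summarize(latencies: List[Optional[float]],
              slow_threshold: float = 1.0) -> dict:
    """Count failed and slow requests; slow_threshold should match the probe timeout."""
    ok = [x for x in latencies if x is not None]
    return {
        "total": len(latencies),
        "failed": len(latencies) - len(ok),
        "slow": sum(1 for x in ok if x > slow_threshold),
        "max": max(ok) if ok else None,
    }
```

Calling `print(summarize(measure_health()))` inside the pod samples /health once per second for a minute. A non-zero "slow" count reproduces the probe failures without waiting for Kubernetes to report them.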
Feast is deployed on GKE via feast-operator (v0.49). The online store is configured to use Redis. Traffic is very low (estimated < 60 requests/minute), and CPU/memory usage appears stable for both Feast and Redis pods.
We were using /push to write to a BigQuery PushSource but stopped a few weeks ago. Since then, the readiness probe failures have stopped completely.
This suggests that the /push operation may be blocking the main thread and causing /health to respond slowly.
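The suspected mechanism can be illustrated without Feast. Feast's Python feature server is built on FastAPI, where `async def` handlers share one event loop per worker. If such a handler calls a synchronous client directly, such as a BigQuery write, every other request on that worker waits until the call returns, and that includes /health. Whether Feast's /push handler actually does this is the open question. The sketch below only demonstrates the general effect, with `time.sleep` standing in for the blocking write. It also shows a common remedy: offloading the call with `asyncio.to_thread` (Python 3.9+). All function names are illustrative.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def blocking_push(duration: float) -> None:
    # Stand-in for a synchronous client call (e.g. an offline-store write) made
    # directly inside an async handler: time.sleep blocks the entire event loop.
    time.sleep(duration)


async def offloaded_push(duration: float) -> None:
    # Same work, moved to a worker thread; the event loop stays free meanwhile.
    await asyncio.to_thread(time.sleep, duration)


async def health() -> str:
    return "ok"


async def health_latency(push: Callable[[float], Awaitable[None]],
                         duration: float = 0.5) -> float:
    """Start a push, then a health check, and time how long the check takes."""
    start = time.perf_counter()
    push_task = asyncio.create_task(push(duration))
    health_task = asyncio.create_task(health())
    await health_task
    latency = time.perf_counter() - start
    await push_task  # let the push finish before returning
    return latency


async def main() -> None:
    blocked = await health_latency(blocking_push)
    offloaded = await health_latency(offloaded_push)
    print(f"/health latency with blocking push:  {blocked:.3f}s")
    print(f"/health latency with offloaded push: {offloaded:.3f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

With a 0.5 second push, the first figure comes out at roughly 0.5 seconds and the second close to zero. A multi-second write blocking the loop in the same way would be consistent with the up-to-10-second /health responses observed.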
Steps to reproduce
- Deploy Feast v0.49 via feast-operator on GKE.
- Configure online store to Redis.
- Push data through the /push endpoint to a PushSource backed by BigQuery.
- Query the /get-online-features endpoint.
- Observe intermittent readiness probe failures due to /health occasionally taking up to 10 seconds.
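The steps above can be approximated with a small load script that pushes rows in a background thread while timing /health from the main thread. This is a sketch only, and several names in it are hypothetical placeholders: the server address, the push source name (`my_push_source`) and the column names. The request body assumes the feature server's /push JSON shape, with `push_source_name`, `df` (each column name mapped to a list of values) and `to`. It uses `to="offline"` because the pushes in this report went to BigQuery.

```python
import json
import threading
import time
import urllib.request
from typing import Dict, List, Optional

# Assumptions: feature server reachable at BASE_URL; PUSH_SOURCE and the row
# columns in push_loop are hypothetical and must match your own PushSource.
BASE_URL = "http://localhost:6566"
PUSH_SOURCE = "my_push_source"


def build_push_payload(rows: List[Dict[str, object]],
                       push_source_name: str = PUSH_SOURCE,
                       to: str = "offline") -> bytes:
    """Encode rows column-wise into a /push body; all rows must share keys."""
    columns: Dict[str, List[object]] = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    body = {"push_source_name": push_source_name, "df": columns, "to": to}
    return json.dumps(body).encode("utf-8")


def post_push(payload: bytes, timeout: float = 60.0) -> None:
    req = urllib.request.Request(
        BASE_URL + "/push", data=payload,
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()


def probe_health(timeout: float = 15.0) -> Optional[float]:
    """Latency of one /health call in seconds, or None on error/timeout."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(BASE_URL + "/health", timeout=timeout) as resp:
            resp.read()
    except OSError:
        return None
    return time.perf_counter() - start


def push_loop(stop: threading.Event, interval: float = 1.0) -> None:
    """Send one single-row push per interval until stop is set."""
    while not stop.is_set():
        row = {"driver_id": 1001, "conv_rate": 0.5,  # hypothetical columns
               "event_timestamp": time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())}
        try:
            post_push(build_push_payload([row]))
        except OSError as exc:
            print("push failed:", exc)
        stop.wait(interval)


def run(duration: float = 120.0, push: bool = True) -> List[Optional[float]]:
    """Probe /health once per second for `duration` seconds, optionally while pushing."""
    stop = threading.Event()
    pusher = threading.Thread(target=push_loop, args=(stop,), daemon=True)
    if push:
        pusher.start()
    latencies: List[Optional[float]] = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        latencies.append(probe_health())
        time.sleep(1.0)
    stop.set()
    if push:
        pusher.join()
    return latencies
```

Run `run(push=True)` and `run(push=False)`, then compare the share of `None` or multi-second latencies. This indicates whether /health slowdowns track /push activity.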
Specifications
- Version: v0.49
- Platform: GKE
- Subsystem: Feast Operator, Redis (Online Store)
Possible Solution
- Investigate whether /push to BigQuery is blocking the main thread, which could delay /health responses.
- Consider enabling multi-threading or separating worker/server logic when deploying via Feast Operator.