
Readiness probe failures when using PushSource to BigQuery – /health responds slowly under low traffic #5494

Open
@HJ29

Description


Expected Behavior

Feast server deployed via Feast Operator should consistently pass readiness probe checks and respond promptly on the /health endpoint, especially under light traffic.

Current Behavior

The readiness probe frequently fails. Upon manual testing (kubectl exec into the pod and curl /health), the response is usually fast but occasionally takes up to 10 seconds, leading to intermittent probe failures.

Feast is deployed on GKE via feast-operator (v0.49). The online store is configured to use Redis. Traffic is very low (estimated < 60 requests/minute), and CPU/memory usage appears stable for both Feast and Redis pods.

We were using /push to write to a BigQuery PushSource, but stopped doing so a few weeks ago. Since then, the readiness probe failures have stopped completely.

This suggests that the /push operation may be blocking the server's main thread, so /health requests that arrive during a push have to wait until it finishes.
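To illustrate the suspected mechanism, here is a minimal sketch using only the Python standard library. It assumes the server handles requests on a single asyncio event loop and that the push handler performs a synchronous write; neither has been confirmed for the Feast server. `blocking_push`, `health`, and `simulate` are hypothetical names, and `time.sleep` stands in for the BigQuery write.

```python
import asyncio
import time


async def health() -> str:
    # Stand-in for the /health handler: does no real work.
    return "ok"


async def blocking_push(seconds: float) -> None:
    # Hypothetical stand-in for a synchronous BigQuery write made
    # directly inside an async handler. time.sleep blocks the whole
    # event loop, so no other coroutine can run until it returns.
    time.sleep(seconds)


async def timed_health(arrival: float) -> float:
    # Latency is measured from the moment the request "arrived".
    await health()
    return time.perf_counter() - arrival


async def simulate(push_seconds: float) -> float:
    arrival = time.perf_counter()
    # The push is scheduled first, so it runs (and blocks) first.
    push_task = asyncio.create_task(blocking_push(push_seconds))
    health_task = asyncio.create_task(timed_health(arrival))
    latency = await health_task
    await push_task
    return latency
```

Calling `asyncio.run(simulate(0.2))` returns a health latency of roughly the push duration, even though the health handler itself does nothing.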

Image

feast-log-1.txt

Steps to reproduce

  1. Deploy Feast v0.49 via feast-operator on GKE.
  2. Configure online store to Redis.
  3. Use PushSource to ingest data to BigQuery.
  4. Call the /get-online-features endpoint.
  5. Observe intermittent readiness probe failures due to /health occasionally taking up to 10 seconds.
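The last step can be made measurable with a small polling script, sketched below under a few assumptions: the URL `http://localhost:6566/health` is a guess at the in-pod address and should be replaced with the real one, and the 1-second threshold matches the Kubernetes default probe `timeoutSeconds`, which may differ from what the operator configures. The helper names are hypothetical.

```python
import time
import urllib.request
from typing import Callable, List


def fetch_health(url: str, timeout: float = 15.0) -> int:
    # Issue one GET and return the HTTP status code.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def sample_latencies(fetch: Callable[[], object], samples: int,
                     interval: float) -> List[float]:
    # Call fetch() repeatedly and record how long each call takes.
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        fetch()
        latencies.append(time.perf_counter() - start)
        time.sleep(interval)
    return latencies


def slow_count(latencies: List[float], threshold: float) -> int:
    # Number of samples that would have exceeded the probe timeout.
    return sum(1 for value in latencies if value > threshold)


def main() -> None:
    url = "http://localhost:6566/health"  # assumed address, adjust as needed
    latencies = sample_latencies(lambda: fetch_health(url),
                                 samples=60, interval=1.0)
    print(f"max latency: {max(latencies):.2f}s")
    print(f"samples over 1s: {slow_count(latencies, threshold=1.0)}")
```

Running `main()` while /push traffic is active, and again with it stopped, would show whether the slow samples correlate with pushes.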

Specifications

  • Version: v0.49
  • Platform: GKE
  • Subsystem: Feast Operator, Redis (Online Store)

Possible Solution

  • Investigate whether /push to BigQuery is blocking the main thread, which could delay /health responses.
  • Consider enabling multi-threading or separating worker/server logic when deploying via Feast Operator.
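One possible shape for the multi-threading suggestion is to hand the blocking write to a worker thread so the event loop stays free. The sketch below uses `asyncio.to_thread` (Python 3.9+) and again assumes an asyncio-based server; it is not taken from the Feast codebase. `write_batch` is a hypothetical stand-in for the synchronous BigQuery write.

```python
import asyncio
import time


def write_batch(seconds: float) -> None:
    # Hypothetical stand-in for the synchronous BigQuery write.
    time.sleep(seconds)


async def push_offloaded(seconds: float) -> None:
    # Run the blocking call in a worker thread; the event loop
    # keeps serving other coroutines while the thread is busy.
    await asyncio.to_thread(write_batch, seconds)


async def health() -> str:
    return "ok"


async def timed_health(arrival: float) -> float:
    await health()
    return time.perf_counter() - arrival


async def simulate_offloaded(push_seconds: float) -> float:
    arrival = time.perf_counter()
    push_task = asyncio.create_task(push_offloaded(push_seconds))
    health_task = asyncio.create_task(timed_health(arrival))
    latency = await health_task
    await push_task
    return latency
```

With the write offloaded, `asyncio.run(simulate_offloaded(0.3))` returns a health latency far below the push duration, because the health coroutine no longer waits behind the write.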
