
With the resourceUtil strategy, traffic transfer does not occur during an update when deployment hits quota limit #203

Closed as not planned
@AyushSawant18588

Description


In a K8s cluster with a single GPU, we initially apply an isvc with replicas set to 1:

NAME                                                      READY   STATUS    RESTARTS   AGE
deploy1-predictor-00001-deployment-868df87c79-6k4sx   2/2     Running   0          79s

Then we update the isvc replicas to 2.
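For context, the update above corresponds to a manifest along these lines (a minimal sketch, not the actual isvc from this report; the name `deploy1` comes from the pod listings, while the model spec, storage URI, and resource limits are placeholders, assuming KServe's `v1beta1` API where `minReplicas`/`maxReplicas` on the predictor control the replica count):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: deploy1
spec:
  predictor:
    minReplicas: 2   # changed from 1 by the update described above
    maxReplicas: 2
    model:          # placeholder model spec for illustration only
      modelFormat:
        name: sklearn
      storageUri: gs://example-bucket/model
      resources:
        limits:
          nvidia.com/gpu: "1"   # each replica requests the single GPU
```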

After this single update, we observe 2 revisions:
Initial state: Revision 1
After update: Revision 2

Transitions observed:

  • All pods of Revision 1 terminated instantly, but traffic remained directed to Revision 1.
  • One pod of revision 2 starts running while the other stays pending, since only one GPU is available. Traffic is still directed to revision 1.
NAME                                                      READY   STATUS    RESTARTS   AGE
deploy1-predictor-00002-deployment-5b7d9c4f7-4fwz8    1/2     Running   0          21s
deploy1-predictor-00002-deployment-5b7d9c4f7-w9slk    0/2     Pending   0          21s

Traffic:
        Latest Revision:  false
        Percent:          0
        Revision Name:    deploy1-predictor-00001
        Latest Revision:  true
        Percent:          100
        Revision Name:    deploy1-predictor-00001
  • If an inference request is sent in this state (before all containers in the revision 2 pod are running), Knative spawns a revision 1 pod, because the route still points to revision 1.
NAME                                                      READY   STATUS    RESTARTS        AGE
deploy1-predictor-00001-deployment-f4cd9f5c4-g6g7h    0/2     Pending   0               3m21s
deploy1-predictor-00002-deployment-587c5d876c-w9slk   0/2     Pending   0               4m5s
deploy1-predictor-00002-deployment-587c5d876c-4fwz8   2/2     Running   4 (2m24s ago)   4m6s

Traffic:
        Latest Revision:  false
        Percent:          0
        Revision Name:    deploy1-predictor-00001
        Latest Revision:  true
        Percent:          100
        Revision Name:    deploy1-predictor-00001
  • Since the GPU is held by the revision 2 pod, the revision 1 pod is stuck in a Pending state.
  • At this point, even though the GPU is in use and one revision 2 pod is Running, inference requests fail, and the state does not correct itself.

This behavior occurs during an update of an isvc with replicas set to 1, deployed in a cluster with no spare resources.
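The scenario above can be reproduced with commands along these lines (a sketch; the isvc name `deploy1` and the manifest filename are illustrative, and the patch path assumes `minReplicas`/`maxReplicas` live on `spec.predictor` in KServe's `v1beta1` API):

```shell
# Apply the initial isvc with 1 replica (manifest not shown in this report).
kubectl apply -f isvc-deploy1.yaml

# Update replicas from 1 to 2, triggering the new revision.
kubectl patch inferenceservice deploy1 --type merge \
  -p '{"spec":{"predictor":{"minReplicas":2,"maxReplicas":2}}}'

# Watch pods of both revisions, and the traffic split reported in the status.
kubectl get pods -w
kubectl describe inferenceservice deploy1
```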

Is this the expected behavior for the scenario mentioned above?

Labels: lifecycle/stale (denotes an issue or PR that has remained open with no activity and has become stale)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions