Description
In a K8s cluster with 1 GPU, we initially apply an isvc with replicas set to 1.
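A minimal sketch of roughly what the isvc spec looks like; the model format, storage URI, and resource layout are assumptions, and only the single-GPU request and the replica count of 1 are taken from the scenario:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: deploy1
spec:
  predictor:
    minReplicas: 1                          # replicas pinned to 1 initially
    maxReplicas: 1
    model:
      modelFormat:
        name: sklearn                       # assumed model format
      storageUri: gs://example-bucket/model # placeholder
      resources:
        limits:
          nvidia.com/gpu: "1"               # requests the cluster's single GPU

With that applied, the predictor pod comes up: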
NAME READY STATUS RESTARTS AGE
deploy1-predictor-00001-deployment-868df87c79-6k4sx 2/2 Running 0 79s
Then we update isvc replicas to 2.
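A sketch of how that update might be applied; whether it was done by patching minReplicas/maxReplicas or by editing the spec directly is an assumption. KServe maps these fields to Knative autoscaling annotations on the revision template, which is presumably why the change produces a new revision:

kubectl patch inferenceservice deploy1 --type merge \
  -p '{"spec": {"predictor": {"minReplicas": 2, "maxReplicas": 2}}}'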
With this single update, we notice 2 revisions:
Initial state: Revision 1
After update: Revision 2
Transitions observed:
- All pods of Revision 1 were terminated instantly, but traffic remained directed to Revision 1.
- Only one pod of Revision 2 starts running, since only one GPU is available, while the other pod stays Pending; traffic is still directed to Revision 1 (see the scheduling check after the traffic output below).
NAME READY STATUS RESTARTS AGE
deploy1-predictor-00002-deployment-5b7d9c4f7-4fwz8 1/2 Running 0 21s
deploy1-predictor-00002-deployment-5b7d9c4f7-w9slk 0/2 Pending 0 21s
Traffic:
Latest Revision: false
Percent: 0
Revision Name: deploy1-predictor-00001
Latest Revision: true
Percent: 100
Revision Name: deploy1-predictor-00001
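To confirm that the second Revision 2 pod is Pending because no GPU is free, its scheduling events can be checked; the pod name is taken from the listing above, and the exact event wording depends on the scheduler and device-plugin versions:

kubectl describe pod deploy1-predictor-00002-deployment-5b7d9c4f7-w9slk | grep -A 5 Events
# expected: a FailedScheduling event along the lines of "Insufficient nvidia.com/gpu"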
- If an inference request is sent in this state (before all containers in a Revision 2 pod are running), Knative spawns a new pod for Revision 1, because the route still targets it (see the route check after the traffic output below).
NAME READY STATUS RESTARTS AGE
deploy1-predictor-00001-deployment-f4cd9f5c4-g6g7h 0/2 Pending 0 3m21s
deploy1-predictor-00002-deployment-587c5d876c-w9slk 0/2 Pending 0 4m5s
deploy1-predictor-00002-deployment-587c5d876c-4fwz8 2/2 Running 4 (2m24s ago) 4m6s
Traffic:
Latest Revision: false
Percent: 0
Revision Name: deploy1-predictor-00001
Latest Revision: true
Percent: 100
Revision Name: deploy1-predictor-00001
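Assuming the default Serverless deployment mode, where KServe creates a Knative Service named deploy1-predictor, the routing above can be cross-checked against the Knative Route and Configuration; the latest *ready* revision presumably still resolves to Revision 1 because Revision 2 never becomes fully ready:

kubectl get route deploy1-predictor -o jsonpath='{.status.traffic}'
kubectl get configuration deploy1-predictor -o jsonpath='{.status.latestReadyRevisionName}'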
- Since the GPU is held by the Revision 2 pod, the Revision 1 pod is stuck in the Pending state.
- At this stage, even though the GPU is in use and one Revision 2 pod is Running, inference requests fail (a sketch of such a request is shown below), and the state does not correct itself.
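For completeness, a sketch of how such a failing request might be sent; the ingress gateway, host resolution, and input file are placeholders and assume an Istio-based setup, and the exact error returned was not captured here:

# resolve the ingress IP and the isvc host (placeholders; adjust to the cluster)
INGRESS=$(kubectl -n istio-system get svc istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
HOST=$(kubectl get isvc deploy1 -o jsonpath='{.status.url}' | sed 's|^https\?://||')
curl -v -H "Host: ${HOST}" "http://${INGRESS}/v1/models/deploy1:predict" -d @input.json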
This behavior happens during an update of an isvc that has replicas set to 1 and is deployed in a cluster with no spare GPU resources.
Is this the expected behavior for the scenario mentioned above?