With the resourceUtil strategy, graceful traffic transfer does not occur during consecutive update requests. #202

Closed as not planned
@AyushSawant18588

Description

In a K8s cluster with 2 GPUs, we initially apply an InferenceService (isvc) with replicas set to 1.
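A minimal sketch of the kind of isvc applied is shown below; the model format and storageUri are placeholders, and it assumes the replica count is pinned via minReplicas/maxReplicas on the predictor. The parts that matter for this report are the single replica and the one-GPU request per pod.

# Sketch only: modelFormat and storageUri are placeholders.
kubectl apply -f - <<EOF
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: test1
spec:
  predictor:
    minReplicas: 1
    maxReplicas: 1
    model:
      modelFormat:
        name: huggingface              # placeholder model format
      storageUri: pvc://models/example # placeholder model location
      resources:
        limits:
          nvidia.com/gpu: "1"          # each predictor pod needs one GPU
EOF

The predictor pod comes up: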

NAME                                                READY   STATUS    RESTARTS   AGE
test1-predictor-00001-deployment-775867dd96-44g92   2/2     Running   0          5m14s

Initial State
Replicas: 1

1st Update: replicas 1 -> 2
Immediately issue a 2nd Update: replicas 2 -> 1
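
For reproduction, the two updates can be issued back to back with kubectl patch; this assumes the replica count is driven by minReplicas/maxReplicas on the predictor (adjust to however the isvc actually pins its replicas):

# 1st update: scale the predictor 1 -> 2
kubectl patch isvc test1 --type merge -p '{"spec":{"predictor":{"minReplicas":2,"maxReplicas":2}}}'

# 2nd update, issued immediately afterwards: scale 2 -> 1
kubectl patch isvc test1 --type merge -p '{"spec":{"predictor":{"minReplicas":1,"maxReplicas":1}}}'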

With these 2 consecutive updates, we end up with 3 revisions:
Initial State - Revision 1
After 1st Update: Revision 2
After 2nd Update: Revision 3

Transitions observed:

  • All pods of Revision 1 were terminated instantly, but traffic remained directed to Revision 1.
  • Revision 2 and Revision 3 start to deploy. The Revision 2 pods get the resources first: they claim both GPUs and start running, while the Revision 3 pod stays Pending. Traffic is still directed to Revision 1:
NAME                                                READY   STATUS    RESTARTS   AGE
test1-predictor-00002-deployment-79cd96bd78-j44gp   1/2     Running   0          5s
test1-predictor-00002-deployment-79cd96bd78-ttxrj   1/2     Running   0          4s
test1-predictor-00003-deployment-7dd4575f4-ppsv8    0/2     Pending   0          1s

Traffic info:
Traffic:
        Latest Revision:  true
        Percent:          100
        Revision Name:    test1-predictor-00001
  • In the above state, any inference request triggers Knative to spawn a new Revision 1 pod, because the route still sends 100% of traffic to Revision 1:
Traffic:
        Latest Revision:  true
        Percent:          100
        Revision Name:    test1-predictor-00001

NAME                                                READY   STATUS    RESTARTS      AGE
test1-predictor-00001-deployment-6c7d788877-phff5   0/2     Pending   0             2m16s
test1-predictor-00002-deployment-6f7f4f68c5-2chrb   2/2     Running   0             3m4s
test1-predictor-00002-deployment-6f7f4f68c5-8hvl5   2/2     Running   4 (82s ago)   3m4s
test1-predictor-00003-deployment-76f5959878-td4rl   0/2     Pending   0             3m
  • Since both GPUs are held by the Revision 2 pods, the new Revision 1 pod is stuck in Pending.
  • At this stage, even though both GPUs are in use and the Revision 2 pods are Running, inference requests fail, and the system does not recover on its own (see the inspection commands below).
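
For reference, the route and revision state can be checked directly on the Knative resources; the name test1-predictor follows the default KServe naming of <isvc-name>-predictor:

# Show the traffic block from the Knative Service status (matches the Traffic output above)
kubectl get ksvc test1-predictor -o jsonpath='{.status.traffic}'

# List the revisions created by the consecutive updates
kubectl get revisions -l serving.knative.dev/service=test1-predictor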

Expectation
On consecutive updates, traffic should be transferred gracefully between revisions: the last pod of the older revision should be terminated only after a pod of the new revision is up and the traffic route has been shifted to the new revision.

The rollout should not get stuck; it should reach the final state where only the Revision 3 pod is running:

NAME                                                READY   STATUS    RESTARTS   AGE
test1-predictor-00003-deployment-76f5959878-td4rl   2/2     Running   0          5m14s
