Description
In a K8s cluster with 1 GPU, we initially apply an isvc with replicas set to 1.
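A minimal sketch of roughly what the isvc spec looks like; the model format, storage URI, and resource layout are assumptions, and only the single-GPU request and the replica count of 1 are taken from the scenario:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: deploy1
spec:
  predictor:
    minReplicas: 1                          # replicas pinned to 1 initially
    maxReplicas: 1
    model:
      modelFormat:
        name: sklearn                       # assumed model format
      storageUri: gs://example-bucket/model # placeholder
      resources:
        limits:
          nvidia.com/gpu: "1"               # requests the cluster's single GPU

With that applied, the predictor pod comes up: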
NAME READY STATUS RESTARTS AGE
deploy1-predictor-00001-deployment-868df87c79-6k4sx 2/2 Running 0 79s
Then we update isvc replicas to 2.
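A sketch of how that update might be applied; whether it was done by patching minReplicas/maxReplicas or by editing the spec directly is an assumption. KServe maps these fields to Knative autoscaling annotations on the revision template, which is presumably why the change produces a new revision:

kubectl patch inferenceservice deploy1 --type merge \
  -p '{"spec": {"predictor": {"minReplicas": 2, "maxReplicas": 2}}}'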
With this single update, we notice 2 revisions:
Initial state: Revision 1
After update: Revision 2
Transitions observed:
- All pods of Revision 1 were terminated instantly, but traffic remained directed to Revision 1.
- Only one pod of Revision 2 starts running, since only one GPU is available, while the other pod stays Pending; traffic is still directed to Revision 1 (see the scheduling check after the traffic output below).
NAME READY STATUS RESTARTS AGE
deploy1-predictor-00002-deployment-5b7d9c4f7-4fwz8 1/2 Running 0 21s
deploy1-predictor-00002-deployment-5b7d9c4f7-w9slk 0/2 Pending 0 21s
Traffic:
Latest Revision: false
Percent: 0
Revision Name: deploy1-predictor-00001
Latest Revision: true
Percent: 100
Revision Name: deploy1-predictor-00001
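To confirm that the second Revision 2 pod is Pending because no GPU is free, its scheduling events can be checked; the pod name is taken from the listing above, and the exact event wording depends on the scheduler and device-plugin versions:

kubectl describe pod deploy1-predictor-00002-deployment-5b7d9c4f7-w9slk | grep -A 5 Events
# expected: a FailedScheduling event along the lines of "Insufficient nvidia.com/gpu"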
- If an inference request is sent in this state (before all containers in a Revision 2 pod are running), Knative spawns a new pod for Revision 1, because the route still targets it (see the route check after the traffic output below).
NAME READY STATUS RESTARTS AGE
deploy1-predictor-00001-deployment-f4cd9f5c4-g6g7h 0/2 Pending 0 3m21s
deploy1-predictor-00002-deployment-587c5d876c-w9slk 0/2 Pending 0 4m5s
deploy1-predictor-00002-deployment-587c5d876c-4fwz8 2/2 Running 4 (2m24s ago) 4m6s
Traffic:
Latest Revision: false
Percent: 0
Revision Name: deploy1-predictor-00001
Latest Revision: true
Percent: 100
Revision Name: deploy1-predictor-00001
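Assuming the default Serverless deployment mode, where KServe creates a Knative Service named deploy1-predictor, the routing above can be cross-checked against the Knative Route and Configuration; the latest *ready* revision presumably still resolves to Revision 1 because Revision 2 never becomes fully ready:

kubectl get route deploy1-predictor -o jsonpath='{.status.traffic}'
kubectl get configuration deploy1-predictor -o jsonpath='{.status.latestReadyRevisionName}'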
- Since the GPU is held by the Revision 2 pod, the Revision 1 pod is stuck in the Pending state.
- At this stage, even though the GPU is in use and one Revision 2 pod is Running, inference requests fail (a sketch of such a request is shown below), and the state does not correct itself.
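For completeness, a sketch of how such a failing request might be sent; the ingress gateway, host resolution, and input file are placeholders and assume an Istio-based setup, and the exact error returned was not captured here:

# resolve the ingress IP and the isvc host (placeholders; adjust to the cluster)
INGRESS=$(kubectl -n istio-system get svc istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
HOST=$(kubectl get isvc deploy1 -o jsonpath='{.status.url}' | sed 's|^https\?://||')
curl -v -H "Host: ${HOST}" "http://${INGRESS}/v1/models/deploy1:predict" -d @input.json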
This behavior happens during an update of an isvc that has replicas set to 1 and is deployed in a cluster with no spare GPU resources.
Is this the expected behavior for the scenario mentioned above?