2. Your cluster's Istio Ingress gateway must be [network accessible](https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/).
3. You can find the [code samples](https://github.com/kserve/kserve/tree/master/docs/samples/v1beta1/transformer/collocation) in the kserve repository.
## Collocation with custom container

### Deploy the InferenceService
Since the predictor and the transformer are in the same pod, they need to listen on different ports to avoid conflicts. The `Transformer` is configured to listen on port 8080 (REST) and 8081 (gRPC), while the `Predictor` listens on port 8085 (REST). The `Transformer` calls the `Predictor` on port 8085 via a local socket. Deploy the `InferenceService` using the command below.

Note that a readiness probe is specified in the transformer container. This is due to a limitation of Knative. You can provide the `--enable_predictor_health_check` argument to let the transformer container check the predictor's health as well, which ensures that both containers are healthy before the isvc is marked as ready.
```yaml
cat <<EOF | kubectl apply -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
# ...
spec:
  # ...
        image: kserve/image-transformer:latest
        args:
        - --model_name=mnist
        - --predictor_protocol=v1
        - --http_port=8080
        - --grpc_port=8081
        - --predictor_host=localhost:8085  # predictor listening port
        - --enable_predictor_health_check
        ports:
        - containerPort: 8080
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /v1/models/mnist
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
  # ...
EOF
```

The storage uri should only be present in the predictor container. If it is specified in the transformer container, the isvc creation will fail.
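As an illustration of that constraint, a custom predictor container typically receives its model location through the `STORAGE_URI` environment variable; the image and bucket URI below are hypothetical placeholders, not values from the samples:

```yaml
- name: kserve-container                  # predictor container
  image: pytorch/torchserve:latest        # illustrative image
  env:
    - name: STORAGE_URI                   # the storage uri lives only here,
      value: gs://my-bucket/models/mnist  # never in the transformer container
```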
!!! Note
    In Serverless mode, specifying ports for the predictor will result in isvc creation failure, as specifying multiple ports is not supported by Knative. Due to this limitation, the predictor cannot be exposed outside the cluster. For more info, see the [knative discussion on multiple ports](https://github.com/knative/serving/issues/8471).
!!! Tip
    Check the [Transformer documentation](../torchserve_image_transformer/#transformer-specific-commandline-arguments) for the list of arguments that can be passed to the transformer container.
### Check InferenceService status
```bash
kubectl get isvc custom-transformer-collocation
```

!!! Note
    If your DNS contains `svc.cluster.local`, then the `InferenceService` is not exposed through Ingress. You need to [configure DNS](https://knative.dev/docs/install/yaml-install/serving/install-serving-with-yaml/#configure-dns) or [use a custom domain](https://knative.dev/docs/serving/using-a-custom-domain/) in order to expose the `isvc`.

### Run a prediction
Prepare the [inputs](https://github.com/kserve/kserve/blob/master/docs/samples/v1beta1/transformer/collocation/input.json) for the inference request. Copy the following JSON into a file named `input.json`.

Now, [determine the ingress IP and ports](../../../../get_started/first_isvc.md#4-determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.
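With those variables set, a request can be sent as in the sketch below. This follows the usual KServe v1 REST pattern (`/v1/models/<name>:predict`) and assumes the isvc is named `custom-transformer-collocation` with model name `mnist`, as in this example:

```shell
# Resolve the service hostname from the InferenceService status URL
SERVICE_HOSTNAME=$(kubectl get inferenceservice custom-transformer-collocation \
  -o jsonpath='{.status.url}' | cut -d "/" -f 3)

# Send the prediction request through the ingress gateway (v1 protocol)
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" \
  -d @./input.json \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/mnist:predict"
```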
## Collocation with ServingRuntime

You can also define the collocation in the `ServingRuntime` and use it in the `InferenceService`. This is useful when you want to use the same transformer for multiple models.
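As a rough sketch of the idea (not the exact manifest from the samples — the runtime name, model format, and predictor image here are hypothetical), a collocated `ServingRuntime` lists both containers in its `containers` section:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: collocated-runtime            # hypothetical name
spec:
  supportedModelFormats:
    - name: pytorch                   # hypothetical model format
      version: "1"
  containers:
    - name: kserve-container          # the predictor (model server)
      image: pytorch/torchserve:latest  # illustrative image
    - name: transformer-container     # the collocated transformer
      image: kserve/image-transformer:latest
      args:
        - --model_name=mnist          # set per model
        - --predictor_host=localhost:8085
        - --http_port=8080
```

An `InferenceService` can then reference this runtime through `spec.predictor.model.runtime`, reusing the same transformer container for every model served by it.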
### Transformer specific commandline arguments

- `--predictor_protocol`: The protocol used to communicate with the predictor. The available values are "v1", "v2" and "grpc-v2". The default value is "v1".
- `--predictor_use_ssl`: Whether to use SSL when communicating with the predictor. The default value is "false".
- `--predictor_request_timeout_seconds`: The timeout in seconds for requests sent to the predictor. The default value is 600 seconds.
- `--predictor_request_retries`: The number of retries for requests sent to the predictor. The default value is 0.
- `--enable_predictor_health_check`: If set, the transformer performs a readiness check for the predictor in addition to its own health check. Disabled by default.
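Taken together, a transformer container's `args` might combine these flags as in this illustrative fragment (the specific values below are hypothetical, not recommendations):

```yaml
args:
- --model_name=mnist
- --predictor_host=localhost:8085
- --predictor_protocol=v2                  # speak the v2 protocol to the predictor
- --predictor_request_timeout_seconds=300  # fail requests slower than 5 minutes
- --predictor_request_retries=2            # retry failed predictor calls twice
- --enable_predictor_health_check
```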