
Commit 79654fc

Add documentation for transformer collocation with runtime
Enhance transformer documentation

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
1 parent d818489 commit 79654fc

File tree

  • docs/modelserving/v1beta1/transformer

2 files changed: +263 -10 lines changed


docs/modelserving/v1beta1/transformer/collocation/README.md

+256 -10
@@ -13,13 +13,16 @@ KServe by default deploys the Transformer and Predictor as separate services, al
2. Your cluster's Istio Ingress gateway must be [network accessible](https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/).
3. You can find the [code samples](https://github.com/kserve/kserve/tree/master/docs/samples/v1beta1/transformer/collocation) in the kserve repository.

## Collocation with custom container
### Deploy the InferenceService

Since the predictor and the transformer are in the same pod, they need to listen on different ports to avoid conflicts. The `Transformer` is configured to listen on port 8080 (REST) and 8081 (gRPC),
while the `Predictor` listens on port 8085 (REST). The `Transformer` calls the `Predictor` on port 8085 via a local socket.
Deploy the `InferenceService` using the command below.

Note that a readiness probe is specified in the transformer container; this is due to a limitation of Knative. You can provide the `--enable_predictor_health_check` argument to let the transformer container check the predictor's health as well, which ensures that both containers are healthy before the isvc is marked as ready.

```yaml
cat <<EOF | kubectl apply -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
@@ -52,13 +55,20 @@ spec:
        image: kserve/image-transformer:latest
        args:
          - --model_name=mnist
          - --predictor_protocol=v1
          - --http_port=8080
          - --grpc_port=8081
          - --predictor_host=localhost:8085 # predictor listening port
          - --enable_predictor_health_check
        ports:
          - containerPort: 8080
            protocol: TCP
        readinessProbe:
          httpGet:
            path: /v1/models/mnist
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
@@ -82,15 +92,15 @@ EOF
predictor. The storage uri should only be present in this container. If it is specified in the transformer
container, the isvc creation will fail.

!!! Note
    In Serverless mode, specifying ports for the predictor will result in isvc creation failure, as specifying multiple ports
    is not supported by Knative. Due to this limitation, the predictor cannot be exposed outside the cluster.
    For more info, see the [knative discussion on multiple ports](https://github.com/knative/serving/issues/8471).

!!! Tip
    Check the [Transformer documentation](../torchserve_image_transformer/#transformer-specific-commandline-arguments) for the list of arguments that can be passed to the transformer container.

### Check InferenceService status
```bash
kubectl get isvc custom-transformer-collocation
```
@@ -101,14 +111,13 @@ kubectl get isvc custom-transformer-collocation
    ```

!!! Note
    If your DNS contains `svc.cluster.local`, then the `InferenceService` is not exposed through Ingress. You need to [configure DNS](https://knative.dev/docs/install/yaml-install/serving/install-serving-with-yaml/#configure-dns)
    or [use a custom domain](https://knative.dev/docs/serving/using-a-custom-domain/) in order to expose the `isvc`.

## Run a prediction
Prepare the [inputs](https://github.com/kserve/kserve/blob/master/docs/samples/v1beta1/transformer/collocation/input.json) for the inference request. Copy the following JSON into a file named `input.json`.

Now, [determine the ingress IP and ports](../../../../get_started/first_isvc.md#4-determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.

```bash
SERVICE_NAME=custom-transformer-collocation
@@ -143,3 +152,240 @@ curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" -d $I
    * Connection #0 to host localhost left intact
    {"predictions":[2]}
    ```
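
Since both containers run in the same pod, you can verify the collocation by listing the containers of the pod backing the `InferenceService`. A minimal sketch, assuming the standard `serving.kserve.io/inferenceservice` label that KServe adds to predictor pods (in Serverless mode, Knative's `queue-proxy` sidecar will also appear in the list):

```bash
# List the container names of the first pod backing the InferenceService;
# both kserve-container and transformer-container should appear.
kubectl get pod -l serving.kserve.io/inferenceservice=custom-transformer-collocation \
  -o jsonpath='{range .items[0].spec.containers[*]}{.name}{"\n"}{end}'
```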

## Collocation with Runtime
### Deploy the InferenceService

Since the predictor and the transformer are in the same pod, they need to listen on different ports to avoid conflicts. The `Transformer` is configured to listen on port 8080 (REST) and 8081 (gRPC),
while the `Predictor` listens on port 8085 (REST). The `Transformer` calls the `Predictor` on port 8085 via a local socket.
Deploy the `InferenceService` using the command below.

Note that a readiness probe is specified in the transformer container; this is due to a limitation of Knative. You can provide the `--enable_predictor_health_check` argument to let the transformer container check the predictor's health as well, which ensures that both containers are healthy before the isvc is marked as ready.

```yaml
cat <<EOF | kubectl apply -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: transformer-collocation
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
        limits:
          cpu: 1
          memory: 1Gi
    containers:
      - name: transformer-container # Do not change the container name
        image: kserve/image-transformer:latest
        args:
          - --model_name=mnist
          - --predictor_protocol=v1
          - --http_port=8080
          - --grpc_port=8081
          - --predictor_host=localhost:8085 # predictor listening port
          - --enable_predictor_health_check # transformer checks for predictor health before marking itself as ready
        ports:
          - containerPort: 8080
            protocol: TCP
        readinessProbe:
          httpGet:
            path: /v1/models/mnist
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 1
            memory: 1Gi
EOF
```

!!! success "Expected output"
    ```{ .bash .no-copy }
    $ inferenceservice.serving.kserve.io/transformer-collocation created
    ```

### Check InferenceService status
```bash
kubectl get isvc transformer-collocation
```
!!! success "Expected output"
    ```{ .bash .no-copy }
    NAME                      URL                                                   READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                        AGE
    transformer-collocation   http://transformer-collocation.default.example.com   True           100                              transformer-collocation-predictor-00001   133m
    ```

!!! Note
    If your DNS contains `svc.cluster.local`, then the `InferenceService` is not exposed through Ingress. You need to [configure DNS](https://knative.dev/docs/install/yaml-install/serving/install-serving-with-yaml/#configure-dns)
    or [use a custom domain](https://knative.dev/docs/serving/using-a-custom-domain/) in order to expose the `isvc`.
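
If the `InferenceService` does not become ready, you can inspect each collocated container separately. A sketch, assuming the standard `serving.kserve.io/inferenceservice` pod label and the conventional `kserve-container` name for the model server container:

```bash
# Transformer logs (the container name is fixed to transformer-container)
kubectl logs -l serving.kserve.io/inferenceservice=transformer-collocation -c transformer-container
# Predictor logs (the model server container is typically named kserve-container)
kubectl logs -l serving.kserve.io/inferenceservice=transformer-collocation -c kserve-container
```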

### Run a prediction
Prepare the [inputs](https://github.com/kserve/kserve/blob/master/docs/samples/v1beta1/transformer/collocation/input.json) for the inference request. Copy the following JSON into a file named `input.json`.

Now, [determine the ingress IP and ports](../../../../get_started/first_isvc.md#4-determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.

```bash
SERVICE_NAME=transformer-collocation
MODEL_NAME=mnist
INPUT_PATH=@./input.json
SERVICE_HOSTNAME=$(kubectl get inferenceservice $SERVICE_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
```
You can use `curl` to send the inference request as:
```bash
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" -d $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict
```

!!! success "Expected output"
    ```{ .bash .no-copy }
    * Trying 127.0.0.1:8080...
    * Connected to localhost (127.0.0.1) port 8080 (#0)
    > POST /v1/models/mnist:predict HTTP/1.1
    > Host: transformer-collocation.default.example.com
    > User-Agent: curl/7.85.0
    > Accept: */*
    > Content-Type: application/json
    > Content-Length: 427
    >
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < content-length: 19
    < content-type: application/json
    < date: Sat, 02 Dec 2023 09:13:16 GMT
    < server: istio-envoy
    < x-envoy-upstream-service-time: 315
    <
    * Connection #0 to host localhost left intact
    {"predictions":[2]}
    ```

## Defining Collocation in ServingRuntime

You can also define the collocation in the `ServingRuntime` and use it in the `InferenceService`. This is useful when you want to reuse the same transformer for multiple models; an example of reusing the runtime for a second model is sketched at the end of this section.

### Create ServingRuntime

```yaml
cat <<EOF | kubectl apply -f -
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: pytorch-collocation
spec:
  annotations:
    prometheus.kserve.io/port: "8080"
    prometheus.kserve.io/path: "/metrics"
  supportedModelFormats:
    - name: pytorch
      version: "1"
      autoSelect: true
      priority: 1
  protocolVersions:
    - v1
  containers:
    - name: kserve-container
      image: pytorch/torchserve:0.9.0-cpu
      args:
        - torchserve
        - --start
        - --model-store=/mnt/models/model-store
        - --ts-config=/mnt/models/config/config.properties
      env:
        - name: "TS_SERVICE_ENVELOPE"
          value: "{% raw %}{{.Labels.serviceEnvelope}}{% endraw %}"
      securityContext:
        runAsUser: 1000 # User ID is not defined in the Dockerfile, so we need to set it here to run as non-root
        allowPrivilegeEscalation: false
        privileged: false
        runAsNonRoot: true
        capabilities:
          drop:
            - ALL
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
        limits:
          cpu: "1"
          memory: 2Gi

    - name: transformer-container # Do not change the container name
      image: kserve/image-transformer:latest
      args:
        - --model_name={% raw %}{{.Labels.modelName}}{% endraw %}
        - --predictor_protocol=v1
        - --http_port=8080
        - --grpc_port=8081
        - --predictor_host=localhost:8085 # predictor listening port
        - --enable_predictor_health_check # transformer checks for predictor health before marking itself as ready
      ports:
        - containerPort: 8080
          protocol: TCP
      readinessProbe:
        httpGet:
          path: /v1/models/{% raw %}{{.Labels.modelName}}{% endraw %}
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
        limits:
          cpu: 1
          memory: 1Gi
EOF
```

!!! note
    Do not specify ports for the predictor in the serving runtime for Serverless deployment; this is not supported by Knative.
    For more info, see the [knative discussion on multiple ports](https://github.com/knative/serving/issues/8471).

!!! success "Expected output"
    ```{ .bash .no-copy }
    $ servingruntime.serving.kserve.io/pytorch-collocation created
    ```
### Create InferenceService

```yaml
cat <<EOF | kubectl apply -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: transformer-collocation-runtime
  labels:
    modelName: mnist
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
      runtime: pytorch-collocation
    containers:
      - name: transformer-container # Do not change the container name
        image: kserve/image-transformer:latest
        resources: # You can override the serving runtime values
          requests:
            cpu: 200m
            memory: 512Mi
          limits:
            cpu: 1
            memory: 1Gi
EOF
```
387+
388+
!!! success "Expected output"
389+
```{ .bash .no-copy }
390+
$ inferenceservice.serving.kserve.io/transformer-collocation-runtime created
391+
```
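
Because the transformer arguments in the runtime are templated on the InferenceService labels, the same `pytorch-collocation` runtime can be reused for other models. A minimal sketch; the name, label value, and storage URI are placeholders for another TorchServe model archive that you would supply yourself:

```yaml
cat <<EOF | kubectl apply -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: another-model-collocation   # hypothetical second InferenceService reusing the runtime
  labels:
    modelName: another-model        # substituted for the modelName label in the runtime's args template
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: gs://<your-bucket>/<your-torchserve-model>  # placeholder, replace with your model archive
      runtime: pytorch-collocation
EOF
```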

docs/modelserving/v1beta1/transformer/torchserve_image_transformer/README.md

+7
@@ -363,3 +363,10 @@ time serializing and deserializing `3*32*32` shape tensor and with gRPC it is tr
# from gRPC v2 predictor log
2023-01-09 07:27:52.171 79711 root INFO [__call__():128] requestId: , preprocess_ms: 0.067949295, explain_ms: 0, predict_ms: 51.237106323, postprocess_ms: 0.049114227
```

## Transformer Specific Commandline Arguments
- `--predictor_protocol`: The protocol used to communicate with the predictor. The available values are "v1", "v2", and "grpc-v2". The default value is "v1".
- `--predictor_use_ssl`: Whether to use SSL when communicating with the predictor. The default value is "false".
- `--predictor_request_timeout_seconds`: The timeout in seconds for requests sent to the predictor. The default value is 600 seconds.
- `--predictor_request_retries`: The number of retries for requests sent to the predictor. The default value is 0.
- `--enable_predictor_health_check`: The transformer will perform a readiness check for the predictor in addition to its own health check. It is disabled by default.
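
For example, these flags are passed as container arguments to the transformer, as in the collocation examples above; a minimal sketch (the flag values here are illustrative, not defaults):

```yaml
containers:
  - name: transformer-container
    image: kserve/image-transformer:latest
    args:
      - --model_name=mnist
      - --predictor_protocol=v2                  # talk to the predictor over the v2 (Open Inference) REST protocol
      - --predictor_request_timeout_seconds=300  # fail predictor requests that take longer than 5 minutes
      - --predictor_request_retries=2            # retry failed predictor requests twice
      - --enable_predictor_health_check          # mark the transformer ready only when the predictor is healthy
```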
