
Add documentation for transformer collocation with runtime #464


Open
sivanantha321 wants to merge 1 commit into main from collocation-runtime-docs

Conversation

sivanantha321 (Member)

Enhance transformer documentation

"Fixes kserve/kserve#4343"

Proposed Changes


netlify bot commented Apr 9, 2025

Deploy Preview for elastic-nobel-0aef7a failed.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 79654fc |
| 🔍 Latest deploy log | https://app.netlify.com/sites/elastic-nobel-0aef7a/deploys/67f63465229208000864ccf8 |

Enhance transformer documentation

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
@sivanantha321 force-pushed the collocation-runtime-docs branch from 59c24d5 to 79654fc on April 9, 2025 08:48

Since the predictor and the transformer are in the same pod, they need to listen on different ports to avoid conflicts. `Transformer` is configured to listen on ports 8080 (REST) and 8081 (gRPC),
while `Predictor` listens on port 8085 (REST). `Transformer` calls `Predictor` on port 8085 via the local socket.
Deploy the `InferenceService` using the below command.

Note that, readiness probe is specified in the transformer container. This due to the limitation of Knative. You can provide `--enable_predictor_health_check` argument to allow the transformer container to check the predictor health as well. This will make sure that both the containers are healthy before the isvc is marked as ready.
Contributor

Suggested change
Note that, readiness probe is specified in the transformer container. This due to the limitation of Knative. You can provide `--enable_predictor_health_check` argument to allow the transformer container to check the predictor health as well. This will make sure that both the containers are healthy before the isvc is marked as ready.
Note that, readiness probe is specified in the transformer container. This is due to the limitation of Knative. You can provide `--enable_predictor_health_check` argument to allow the transformer container to check the predictor health as well. This will make sure that both the containers are healthy before the isvc is marked as ready.
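
For reference, here is a minimal sketch of what the collocated `InferenceService` described in this documentation could look like. The transformer image, `--model_name=mnist`, `--predictor_protocol=v1`, the 8080/8081/8085 port split, the container port, the readiness probe, and `--enable_predictor_health_check` come from this PR; the resource name, storage URI, probe path, and the `--http_port`/`--grpc_port`/`--predictor_host` flags are assumptions added for illustration, not the exact manifest from the docs.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: transformer-collocation                  # name is an assumption
spec:
  predictor:
    model:                                       # predictor resolved through a serving runtime
      modelFormat:
        name: pytorch
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1  # example URI, an assumption
      # The runtime's container is configured (outside this snippet) to listen on 8085 (REST).
    containers:
      # Transformer collocated in the same pod: REST on 8080, gRPC on 8081.
      - name: transformer-container
        image: kserve/image-transformer:latest
        args:
          - --model_name=mnist
          - --predictor_protocol=v1
          - --http_port=8080
          - --grpc_port=8081
          - --predictor_host=localhost:8085      # calls the predictor over the local socket
          - --enable_predictor_health_check      # also gate readiness on the predictor's health
        ports:
          - containerPort: 8080
            protocol: TCP
        readinessProbe:                          # lives on the transformer container due to the Knative limitation
          httpGet:
            path: /v1/models/mnist               # probe path is an assumption
            port: 8080
```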

@@ -52,13 +55,20 @@ spec:
image: kserve/image-transformer:latest
args:
- --model_name=mnist
- --protocol=v1 # protocol of the predictor; used for converting the input to specific protocol supported by the predictor
- --predictor_protocol=v1
Contributor

May we keep the comment or use v2?
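
If the inline comment is kept, as the reviewer suggests, the argument could look roughly like this; the comment wording and the `v2` alternative are illustrative only:

```yaml
args:
  - --model_name=mnist
  # Protocol spoken by the predictor; the transformer converts incoming requests
  # into this protocol. Use v2 if the predictor serves the Open Inference Protocol.
  - --predictor_protocol=v1
```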

ports:
- containerPort: 8080
protocol: TCP
readinessProbe:
Contributor

What about adding liveness too?
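
A sketch of what adding a liveness probe next to the existing readiness probe might look like; the probe paths, delays, and thresholds are assumptions and are not part of this PR:

```yaml
ports:
  - containerPort: 8080
    protocol: TCP
readinessProbe:
  httpGet:
    path: /v1/models/mnist      # probe path is an assumption
    port: 8080
livenessProbe:                  # the reviewer's suggestion; not in the PR
  httpGet:
    path: /                     # assumes the server's root endpoint reports liveness
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```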

### Deploy the InferenceService

Since the predictor and the transformer are in the same pod, they need to listen on different ports to avoid conflicts. `Transformer` is configured to listen on ports 8080 (REST) and 8081 (gRPC),
while `Predictor` listens on port 8085 (REST). `Transformer` calls `Predictor` on port 8085 via the local socket.
Contributor

Maybe also mention which is the default port for gRPC for predictors.


Contributor

Suggested change
Deploy the `Inferenceservice` using the below command.
Deploy the `Inferenceservice` using the following command:
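
The deploy command itself is not visible in this view; a sketch, assuming the manifest is saved as `transformer-collocation.yaml` and the `InferenceService` is named `transformer-collocation` (both names are assumptions):

```bash
# Apply the collocated InferenceService manifest.
kubectl apply -f transformer-collocation.yaml

# Watch until READY is True; both containers must pass their probes first.
kubectl get inferenceservice transformer-collocation -w
```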


Note that, readiness probe is specified in the transformer container. This due to the limitation of Knative. You can provide `--enable_predictor_health_check` argument to allow the transformer container to check the predictor health as well. This will make sure that both the containers are healthy before the isvc is marked as ready.
Contributor

duplicated note?


!!! note
Do not specify ports for the predictor in the serving runtime for Serverless deployment. This is not supported by Knative.
For more info see, [knative discussion on multiple ports](https://github.com/knative/serving/issues/8471).
Contributor

Suggested change
For more info see, [knative discussion on multiple ports](https://github.com/knative/serving/issues/8471).
For more information, please take a look at [knative discussion on multiple ports](https://github.com/knative/serving/issues/8471).
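
To illustrate the note, a rough sketch of a serving runtime for the collocated predictor that deliberately declares no `ports`; the runtime name, image, and args are assumptions, and only the "no ports for Serverless" point comes from the note above:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: torchserve-collocation             # name is an assumption
spec:
  supportedModelFormats:
    - name: pytorch
      version: "1"
  containers:
    - name: kserve-container
      image: pytorch/torchserve-kfs:latest # image is an assumption
      args:
        - torchserve
        - --start
        - --model-store=/mnt/models/model-store
      # No `ports` are declared here: in Serverless mode Knative allows only one
      # container port per pod, and that one is used by the transformer (8080).
```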

- name: "TS_SERVICE_ENVELOPE"
value: "{% raw %}{{.Labels.serviceEnvelope}}{% endraw %}"
securityContext:
runAsUser: 1000 # User ID is not defined in the Dockerfile, so we need to set it here to run as non-root
Contributor

I think k8s adds it by default, ideally, we need to run a random uid, no?
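
One possible reading of this comment, shown as a sketch rather than the PR's actual change: require a non-root user without pinning a specific UID, so the platform can assign an arbitrary one.

```yaml
securityContext:
  runAsNonRoot: true   # require non-root without hard-coding 1000; the platform picks the UID
```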

Successfully merging this pull request may close these issues.

Preprocessing in runtime? Multiple ports issue
2 participants