
Add documentation for transformer collocation with runtime #464


Open
sivanantha321 wants to merge 1 commit into main from collocation-runtime-docs

Conversation

sivanantha321 (Member)

Enhance transformer documentation

"Fixes kserve/kserve#4343"

Proposed Changes


netlify bot commented Apr 9, 2025

Deploy Preview for elastic-nobel-0aef7a failed.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 79654fc |
| 🔍 Latest deploy log | https://app.netlify.com/sites/elastic-nobel-0aef7a/deploys/67f63465229208000864ccf8 |

Enhance transformer documentation

Signed-off-by: Sivanantham Chinnaiyan <[email protected]>
@sivanantha321 force-pushed the collocation-runtime-docs branch from 59c24d5 to 79654fc on April 9, 2025 08:48

Since the predictor and the transformer are in the same pod, they need to listen on different ports to avoid conflicts. `Transformer` is configured to listen on ports 8080 (REST) and 8081 (gRPC),
while `Predictor` listens on port 8085 (REST). `Transformer` calls `Predictor` on port 8085 via the local socket.
Deploy the `InferenceService` using the below command.

Note that, readiness probe is specified in the transformer container. This due to the limitation of Knative. You can provide `--enable_predictor_health_check` argument to allow the transformer container to check the predictor health as well. This will make sure that both the containers are healthy before the isvc is marked as ready.
Contributor

Suggested change
Note that, readiness probe is specified in the transformer container. This due to the limitation of Knative. You can provide `--enable_predictor_health_check` argument to allow the transformer container to check the predictor health as well. This will make sure that both the containers are healthy before the isvc is marked as ready.
Note that, readiness probe is specified in the transformer container. This is due to the limitation of Knative. You can provide `--enable_predictor_health_check` argument to allow the transformer container to check the predictor health as well. This will make sure that both the containers are healthy before the isvc is marked as ready.
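
For reference, here is a minimal sketch of what the collocated `InferenceService` described in this documentation could look like. The transformer image, `--model_name=mnist`, `--predictor_protocol=v1`, the 8080/8081/8085 port split, the container port, the readiness probe, and `--enable_predictor_health_check` come from this PR; the resource name, storage URI, probe path, and the `--http_port`/`--grpc_port`/`--predictor_host` flags are assumptions added for illustration, not the exact manifest from the docs.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: transformer-collocation                  # name is an assumption
spec:
  predictor:
    model:                                       # predictor resolved through a serving runtime
      modelFormat:
        name: pytorch
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1  # example URI, an assumption
      # The runtime's container is configured (outside this snippet) to listen on 8085 (REST).
    containers:
      # Transformer collocated in the same pod: REST on 8080, gRPC on 8081.
      - name: transformer-container
        image: kserve/image-transformer:latest
        args:
          - --model_name=mnist
          - --predictor_protocol=v1
          - --http_port=8080
          - --grpc_port=8081
          - --predictor_host=localhost:8085      # calls the predictor over the local socket
          - --enable_predictor_health_check      # also gate readiness on the predictor's health
        ports:
          - containerPort: 8080
            protocol: TCP
        readinessProbe:                          # lives on the transformer container due to the Knative limitation
          httpGet:
            path: /v1/models/mnist               # probe path is an assumption
            port: 8080
```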

@@ -52,13 +55,20 @@ spec:
image: kserve/image-transformer:latest
args:
- --model_name=mnist
- --protocol=v1 # protocol of the predictor; used for converting the input to specific protocol supported by the predictor
- --predictor_protocol=v1
Contributor

May we keep the comment or use v2?
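
If the inline comment is kept, as the reviewer suggests, the argument could look roughly like this; the comment wording and the `v2` alternative are illustrative only:

```yaml
args:
  - --model_name=mnist
  # Protocol spoken by the predictor; the transformer converts incoming requests
  # into this protocol. Use v2 if the predictor serves the Open Inference Protocol.
  - --predictor_protocol=v1
```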

ports:
- containerPort: 8080
protocol: TCP
readinessProbe:
Contributor

What about adding liveness too?
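
A sketch of what adding a liveness probe next to the existing readiness probe might look like; the probe paths, delays, and thresholds are assumptions and are not part of this PR:

```yaml
ports:
  - containerPort: 8080
    protocol: TCP
readinessProbe:
  httpGet:
    path: /v1/models/mnist      # probe path is an assumption
    port: 8080
livenessProbe:                  # the reviewer's suggestion; not in the PR
  httpGet:
    path: /                     # assumes the server's root endpoint reports liveness
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```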

### Deploy the InferenceService

Since the predictor and the transformer are in the same pod, they need to listen on different ports to avoid conflicts. `Transformer` is configured to listen on ports 8080 (REST) and 8081 (gRPC),
while `Predictor` listens on port 8085 (REST). `Transformer` calls `Predictor` on port 8085 via the local socket.
Contributor

Maybe also mention which is the default port for gRPC for predictors.


Contributor

Suggested change
Deploy the `Inferenceservice` using the below command.
Deploy the `Inferenceservice` using the following command:
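
The deploy command itself is not visible in this view; a sketch, assuming the manifest is saved as `transformer-collocation.yaml` and the `InferenceService` is named `transformer-collocation` (both names are assumptions):

```bash
# Apply the collocated InferenceService manifest.
kubectl apply -f transformer-collocation.yaml

# Watch until READY is True; both containers must pass their probes first.
kubectl get inferenceservice transformer-collocation -w
```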


Note that, readiness probe is specified in the transformer container. This due to the limitation of Knative. You can provide `--enable_predictor_health_check` argument to allow the transformer container to check the predictor health as well. This will make sure that both the containers are healthy before the isvc is marked as ready.
Contributor

duplicated note?


!!! note
Do not specify ports for the predictor in the serving runtime for Serverless deployment. This is not supported by Knative.
For more info see, [knative discussion on multiple ports](https://github.com/knative/serving/issues/8471).
Contributor

Suggested change
For more info see, [knative discussion on multiple ports](https://github.com/knative/serving/issues/8471).
For more information, please take a look at [knative discussion on multiple ports](https://github.com/knative/serving/issues/8471).
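
To illustrate the note, a rough sketch of a serving runtime for the collocated predictor that deliberately declares no `ports`; the runtime name, image, and args are assumptions, and only the "no ports for Serverless" point comes from the note above:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: torchserve-collocation             # name is an assumption
spec:
  supportedModelFormats:
    - name: pytorch
      version: "1"
  containers:
    - name: kserve-container
      image: pytorch/torchserve-kfs:latest # image is an assumption
      args:
        - torchserve
        - --start
        - --model-store=/mnt/models/model-store
      # No `ports` are declared here: in Serverless mode Knative allows only one
      # container port per pod, and that one is used by the transformer (8080).
```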

- name: "TS_SERVICE_ENVELOPE"
value: "{% raw %}{{.Labels.serviceEnvelope}}{% endraw %}"
securityContext:
runAsUser: 1000 # User ID is not defined in the Dockerfile, so we need to set it here to run as non-root
Contributor

I think k8s adds it by default, ideally, we need to run a random uid, no?
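
One possible reading of this comment, shown as a sketch rather than the PR's actual change: require a non-root user without pinning a specific UID, so the platform can assign an arbitrary one.

```yaml
securityContext:
  runAsNonRoot: true   # require non-root without hard-coding 1000; the platform picks the UID
```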

Successfully merging this pull request may close these issues.

Preprocessing in runtime? Multiple ports issue
2 participants