Update README and versions for 2.48.0 / 24.07 #7425

Merged · 4 commits · Jul 8, 2024
2 changes: 1 addition & 1 deletion Dockerfile.sdk
@@ -29,7 +29,7 @@
 #
 
 # Base image on the minimum Triton container
-ARG BASE_IMAGE=nvcr.io/nvidia/tritonserver:24.06-py3-min
+ARG BASE_IMAGE=nvcr.io/nvidia/tritonserver:24.07-py3-min
 
 ARG TRITON_CLIENT_REPO_SUBDIR=clientrepo
 ARG TRITON_COMMON_REPO_TAG=main
20 changes: 10 additions & 10 deletions Dockerfile.win10.min
@@ -1,4 +1,4 @@
-# Copyright 2021-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -37,9 +37,9 @@ RUN choco install unzip -y
 #
 # Installing TensorRT
 #
-ARG TENSORRT_VERSION=10.0.1.6
-ARG TENSORRT_ZIP="TensorRT-${TENSORRT_VERSION}.Windows10.x86_64.cuda-12.4.zip"
-ARG TENSORRT_SOURCE=https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.0.1/zip/TensorRT-10.0.1.6.Windows10.win10.cuda-12.4.zip
+ARG TENSORRT_VERSION=10.2.0.19
+ARG TENSORRT_ZIP="TensorRT-${TENSORRT_VERSION}.Windows10.x86_64.cuda-12.5.zip"
+ARG TENSORRT_SOURCE=https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/zip/TensorRT-10.2.0.19.Windows10.x86_64.cuda-12.5.zip
 # COPY ${TENSORRT_ZIP} /tmp/${TENSORRT_ZIP}
 ADD ${TENSORRT_SOURCE} /tmp/${TENSORRT_ZIP}
 RUN unzip /tmp/%TENSORRT_ZIP%
@@ -51,9 +51,9 @@ LABEL TENSORRT_VERSION="${TENSORRT_VERSION}"
 #
 # Installing cuDNN
 #
-ARG CUDNN_VERSION=9.1.0.70
+ARG CUDNN_VERSION=9.2.1.18
 ARG CUDNN_ZIP=cudnn-windows-x86_64-${CUDNN_VERSION}_cuda12-archive.zip
-ARG CUDNN_SOURCE=https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/windows-x86_64/cudnn-windows-x86_64-9.1.0.70_cuda12-archive.zip
+ARG CUDNN_SOURCE=https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/windows-x86_64/cudnn-windows-x86_64-9.2.1.18_cuda12-archive.zip
 ADD ${CUDNN_SOURCE} /tmp/${CUDNN_ZIP}
 RUN unzip /tmp/%CUDNN_ZIP%
 RUN move cudnn-* cudnn
@@ -125,7 +125,7 @@ WORKDIR /
 #
 # Installing Vcpkg
 #
-ARG VCPGK_VERSION=2024.03.19
+ARG VCPGK_VERSION=2024.06.15
 RUN git clone --single-branch --depth=1 -b %VCPGK_VERSION% https://github.com/microsoft/vcpkg.git
 WORKDIR /vcpkg
 RUN bootstrap-vcpkg.bat
@@ -150,7 +150,7 @@ WORKDIR /
 #
 ARG CUDA_MAJOR=12
 ARG CUDA_MINOR=5
-ARG CUDA_PATCH=0
+ARG CUDA_PATCH=1
 ARG CUDA_VERSION=${CUDA_MAJOR}.${CUDA_MINOR}.${CUDA_PATCH}
 ARG CUDA_PACKAGES="nvcc_${CUDA_MAJOR}.${CUDA_MINOR} \
     cudart_${CUDA_MAJOR}.${CUDA_MINOR} \
@@ -175,15 +175,15 @@ RUN copy "%CUDA_INSTALL_ROOT_WP%\extras\visual_studio_integration\MSBuildExtensi
 
 RUN setx PATH "%CUDA_INSTALL_ROOT_WP%\bin;%PATH%"
 
-ARG CUDNN_VERSION=9.1.0.70
+ARG CUDNN_VERSION=9.2.1.18
 ENV CUDNN_VERSION ${CUDNN_VERSION}
 COPY --from=dependency_base /cudnn /cudnn
 RUN copy cudnn\bin\cudnn*.dll "%CUDA_INSTALL_ROOT_WP%\bin\."
 RUN copy cudnn\lib\x64\cudnn*.lib "%CUDA_INSTALL_ROOT_WP%\lib\x64\."
 RUN copy cudnn\include\cudnn*.h "%CUDA_INSTALL_ROOT_WP%\include\."
 LABEL CUDNN_VERSION="${CUDNN_VERSION}"
 
-ARG TENSORRT_VERSION=10.0.1.6
+ARG TENSORRT_VERSION=10.2.0.19
 ENV TRT_VERSION ${TENSORRT_VERSION}
 COPY --from=dependency_base /TensorRT /TensorRT
 RUN setx PATH "c:\TensorRT\lib;%PATH%"
9 changes: 2 additions & 7 deletions README.md
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2018-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -30,11 +30,6 @@
 
 [![License](https://img.shields.io/badge/License-BSD3-lightgrey.svg)](https://opensource.org/licenses/BSD-3-Clause)
 
-> [!WARNING]
-> ##### LATEST RELEASE
-> You are currently on the `main` branch which tracks under-development progress towards the next release.
-> The current release is version [2.47.0](https://github.com/triton-inference-server/server/releases/latest) and corresponds to the 24.06 container release on NVIDIA GPU Cloud (NGC).
-
 Triton Inference Server is an open source inference serving software that
 streamlines AI inferencing. Triton enables teams to deploy any AI model from
 multiple deep learning and machine learning frameworks, including TensorRT,
@@ -261,4 +256,4 @@ For questions, we recommend posting in our community
 ## For more information
 
 Please refer to the [NVIDIA Developer Triton page](https://developer.nvidia.com/nvidia-triton-inference-server)
-for more information.
\ No newline at end of file
+for more information.
2 changes: 1 addition & 1 deletion TRITON_VERSION
@@ -1 +1 @@
-2.48.0dev
+2.48.0
8 changes: 4 additions & 4 deletions build.py
@@ -69,14 +69,14 @@
 # incorrectly load the other version of the openvino libraries.
 #
 TRITON_VERSION_MAP = {
-    "2.48.0dev": (
-        "24.06dev",  # triton container
-        "24.06",  # upstream container
+    "2.48.0": (
+        "24.07",  # triton container
+        "24.07",  # upstream container
         "1.18.1",  # ORT
         "2024.0.0",  # ORT OpenVINO
         "2024.0.0",  # Standalone OpenVINO
         "3.2.6",  # DCGM version
-        "0.5.0.post1",  # vLLM version
+        "0.5.1",  # vLLM version
     )
 }
 
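Since every downstream pin in the build flows from this map, it can help to see how one release entry unpacks. Below is a minimal sketch of reading the 2.48.0 tuple by position; the `component_versions` helper and its return shape are assumptions for illustration, not build.py's actual API.

```python
# Hypothetical illustration of consuming a TRITON_VERSION_MAP-style release map.
TRITON_VERSION_MAP = {
    "2.48.0": (
        "24.07",      # triton container
        "24.07",      # upstream container
        "1.18.1",     # ORT
        "2024.0.0",   # ORT OpenVINO
        "2024.0.0",   # Standalone OpenVINO
        "3.2.6",      # DCGM version
        "0.5.1",      # vLLM version
    )
}

def component_versions(version: str) -> dict:
    """Unpack the pinned component versions for one Triton release."""
    container, upstream, ort, ort_openvino, openvino, dcgm, vllm = (
        TRITON_VERSION_MAP[version]
    )
    return {
        "triton_container": container,
        "upstream_container": upstream,
        "onnxruntime": ort,
        "ort_openvino": ort_openvino,
        "openvino": openvino,
        "dcgm": dcgm,
        "vllm": vllm,
    }

print(component_versions("2.48.0")["vllm"])  # -> 0.5.1
```
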
4 changes: 2 additions & 2 deletions deploy/aws/values.yaml
@@ -1,4 +1,4 @@
-# Copyright (c) 2019-2023, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2024, NVIDIA CORPORATION. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -27,7 +27,7 @@
 replicaCount: 1
 
 image:
-  imageName: nvcr.io/nvidia/tritonserver:24.06-py3
+  imageName: nvcr.io/nvidia/tritonserver:24.07-py3
   pullPolicy: IfNotPresent
   modelRepositoryPath: s3://triton-inference-server-repository/model_repository
   numGpus: 1
4 changes: 2 additions & 2 deletions deploy/fleetcommand/Chart.yaml
@@ -1,4 +1,4 @@
-# Copyright (c) 2019-2023, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2024, NVIDIA CORPORATION. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -26,7 +26,7 @@
 
 apiVersion: v1
 # appVersion is the Triton version; update when changing release
-appVersion: "2.47.0"
+appVersion: "2.48.0"
 description: Triton Inference Server (Fleet Command)
 name: triton-inference-server
 # version is the Chart version; update when changing anything in the chart
8 changes: 4 additions & 4 deletions deploy/fleetcommand/values.yaml
@@ -1,4 +1,4 @@
-# Copyright (c) 2019-2023, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2024, NVIDIA CORPORATION. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -27,7 +27,7 @@
 replicaCount: 1
 
 image:
-  imageName: nvcr.io/nvidia/tritonserver:24.06-py3
+  imageName: nvcr.io/nvidia/tritonserver:24.07-py3
   pullPolicy: IfNotPresent
   numGpus: 1
   serverCommand: tritonserver
@@ -47,13 +47,13 @@ image:
   #
   # To set model control mode, uncomment and configure below
   # TODO: Fix the following url, it is invalid
-  # See https://github.com/triton-inference-server/server/blob/r24.06/docs/model_management.md
+  # See https://github.com/triton-inference-server/server/blob/r24.07/docs/model_management.md
   # for more details
   #- --model-control-mode=explicit|poll|none
   #
   # Additional server args
   #
-  # see https://github.com/triton-inference-server/server/blob/r24.06/README.md
+  # see https://github.com/triton-inference-server/server/blob/r24.07/README.md
   # for more details
 
 service:
4 changes: 2 additions & 2 deletions deploy/gcp/values.yaml
@@ -1,4 +1,4 @@
-# Copyright (c) 2019-2023, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2024, NVIDIA CORPORATION. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -27,7 +27,7 @@
 replicaCount: 1
 
 image:
-  imageName: nvcr.io/nvidia/tritonserver:24.06-py3
+  imageName: nvcr.io/nvidia/tritonserver:24.07-py3
   pullPolicy: IfNotPresent
   modelRepositoryPath: gs://triton-inference-server-repository/model_repository
   numGpus: 1
@@ -1,4 +1,4 @@
-# Copyright 2021-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -33,7 +33,7 @@ metadata:
   namespace: default
 spec:
   containers:
-    - image: nvcr.io/nvidia/tritonserver:24.06-py3-sdk
+    - image: nvcr.io/nvidia/tritonserver:24.07-py3-sdk
       imagePullPolicy: Always
       name: nv-triton-client
       securityContext:
8 changes: 4 additions & 4 deletions deploy/gke-marketplace-app/server-deployer/build_and_push.sh
@@ -1,5 +1,5 @@
 #!/bin/bash
-# Copyright 2021-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -27,9 +27,9 @@
 
 export REGISTRY=gcr.io/$(gcloud config get-value project | tr ':' '/')
 export APP_NAME=tritonserver
-export MAJOR_VERSION=2.45
-export MINOR_VERSION=2.45.0
-export NGC_VERSION=24.06-py3
+export MAJOR_VERSION=2.48
+export MINOR_VERSION=2.48.0
+export NGC_VERSION=24.07-py3
 
 docker pull nvcr.io/nvidia/$APP_NAME:$NGC_VERSION
 
@@ -1,4 +1,4 @@
-# Copyright (c) 2021-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -25,7 +25,7 @@
 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
 apiVersion: v1
-appVersion: "2.47"
+appVersion: "2.48"
 description: Triton Inference Server
 name: triton-inference-server
-version: 2.47.0
+version: 2.48.0
@@ -1,4 +1,4 @@
-# Copyright (c) 2021-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -31,14 +31,14 @@ maxReplicaCount: 3
 tritonProtocol: HTTP
 # HPA GPU utilization autoscaling target
 HPATargetAverageValue: 85
-modelRepositoryPath: gs://triton_sample_models/24.06
-publishedVersion: '2.47.0'
+modelRepositoryPath: gs://triton_sample_models/24.07
+publishedVersion: '2.48.0'
 gcpMarketplace: true
 
 image:
   registry: gcr.io
   repository: nvidia-ngc-public/tritonserver
-  tag: 24.06-py3
+  tag: 24.07-py3
   pullPolicy: IfNotPresent
 # modify the model repository here to match your GCP storage bucket
 numGpus: 1
@@ -1,4 +1,4 @@
-# Copyright (c) 2021-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -27,7 +27,7 @@
 x-google-marketplace:
   schemaVersion: v2
   applicationApiVersion: v1beta1
-  publishedVersion: '2.47.0'
+  publishedVersion: '2.48.0'
   publishedVersionMetadata:
     releaseNote: >-
       Initial release.
6 changes: 3 additions & 3 deletions deploy/gke-marketplace-app/server-deployer/schema.yaml
@@ -1,4 +1,4 @@
-# Copyright (c) 2021-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -27,7 +27,7 @@
 x-google-marketplace:
   schemaVersion: v2
   applicationApiVersion: v1beta1
-  publishedVersion: '2.47.0'
+  publishedVersion: '2.48.0'
   publishedVersionMetadata:
     releaseNote: >-
       Initial release.
@@ -89,7 +89,7 @@ properties:
   modelRepositoryPath:
     type: string
     title: Bucket where models are stored. Please make sure the user/service account to create the GKE app has permission to this GCS bucket. Read Triton documentation on configs and formatting details, supporting TensorRT, TensorFlow, Pytorch, Onnx ... etc.
-    default: gs://triton_sample_models/24.06
+    default: gs://triton_sample_models/24.07
   image.ldPreloadPath:
     type: string
     title: Leave this empty by default. Triton allows users to create custom layers for backend such as TensorRT plugin or Tensorflow custom ops, the compiled shared library must be provided via LD_PRELOAD environment variable.
8 changes: 4 additions & 4 deletions deploy/gke-marketplace-app/trt-engine/README.md
@@ -1,5 +1,5 @@
 <!--
-# Copyright (c) 2021-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -33,7 +33,7 @@
 ```
 docker run --gpus all -it --network host \
   --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
-  -v ~:/scripts nvcr.io/nvidia/tensorrt:24.06-py3
+  -v ~:/scripts nvcr.io/nvidia/tensorrt:24.07-py3
 
 pip install onnx six torch tf2onnx tensorflow
 
@@ -57,7 +57,7 @@ mkdir -p engines
 
 python3 builder.py -m models/fine-tuned/bert_tf_ckpt_large_qa_squad2_amp_128_v19.03.1/model.ckpt -o engines/bert_large_int8_bs1_s128.engine -b 1 -s 128 -c models/fine-tuned/bert_tf_ckpt_large_qa_squad2_amp_128_v19.03.1/ -v models/fine-tuned/bert_tf_ckpt_large_qa_squad2_amp_128_v19.03.1/vocab.txt --int8 --fp16 --strict --calib-num 1 -iln -imh
 
-gsutil cp bert_large_int8_bs1_s128.engine gs://triton_sample_models/24.06/bert/1/model.plan
+gsutil cp bert_large_int8_bs1_s128.engine gs://triton_sample_models/24.07/bert/1/model.plan
 ```
 
-For each Triton upgrade, container version used to generate the model, and the model path in GCS `gs://triton_sample_models/24.06/` should be updated accordingly with the correct version.
+For each Triton upgrade, container version used to generate the model, and the model path in GCS `gs://triton_sample_models/24.07/` should be updated accordingly with the correct version.
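The note above ties two artifacts to one release number: the TensorRT container tag and the GCS model path. A small, hypothetical sketch of deriving both from a single release string (the helper name is illustrative, not an official tool; the bucket layout mirrors the gsutil command above):

```python
# Illustrative only: derive the container tag and model path that the README
# says must move together on each Triton upgrade.
RELEASE = "24.07"

def release_artifacts(release: str) -> dict:
    return {
        "tensorrt_container": f"nvcr.io/nvidia/tensorrt:{release}-py3",
        "model_plan_path": f"gs://triton_sample_models/{release}/bert/1/model.plan",
    }

for name, value in release_artifacts(RELEASE).items():
    print(f"{name}: {value}")
```
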
4 changes: 2 additions & 2 deletions deploy/k8s-onprem/values.yaml
@@ -1,4 +1,4 @@
-# Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2024, NVIDIA CORPORATION. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -29,7 +29,7 @@ tags:
   loadBalancing: true
 
 image:
-  imageName: nvcr.io/nvidia/tritonserver:24.06-py3
+  imageName: nvcr.io/nvidia/tritonserver:24.07-py3
   pullPolicy: IfNotPresent
   modelRepositoryServer: < Replace with the IP Address of your file server >
   modelRepositoryPath: /srv/models
4 changes: 2 additions & 2 deletions deploy/oci/values.yaml
@@ -1,4 +1,4 @@
-# Copyright (c) 2019-2023, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2019-2024, NVIDIA CORPORATION. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -27,7 +27,7 @@
 replicaCount: 1
 
 image:
-  imageName: nvcr.io/nvidia/tritonserver:24.06-py3
+  imageName: nvcr.io/nvidia/tritonserver:24.07-py3
   pullPolicy: IfNotPresent
   modelRepositoryPath: s3://https://<OCI_NAMESPACE>.compat.objectstorage.<OCI_REGION>.oraclecloud.com:443/triton-inference-server-repository
   numGpus: 1
8 changes: 4 additions & 4 deletions docs/customization_guide/build.md
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2018-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -173,7 +173,7 @@ $ ./build.py ... --repo-tag=common:<container tag> --repo-tag=core:<container ta
 
 If you are building on a release branch then `<container tag>` will
 default to the branch name. For example, if you are building on the
-r24.06 branch, `<container tag>` will default to r24.06. If you are
+r24.07 branch, `<container tag>` will default to r24.07. If you are
 building on any other branch (including the *main* branch) then
 `<container tag>` will default to "main". Therefore, you typically do
 not need to provide `<container tag>` at all (nor the preceding
@@ -334,8 +334,8 @@ python build.py --cmake-dir=<path/to/repo>/build --build-dir=/tmp/citritonbuild
 If you are building on *main* branch then '<container tag>' will
 default to "main". If you are building on a release branch then
 '<container tag>' will default to the branch name. For example, if you
-are building on the r24.06 branch, '<container tag>' will default to
-r24.06. Therefore, you typically do not need to provide '<container
+are building on the r24.07 branch, '<container tag>' will default to
+r24.07. Therefore, you typically do not need to provide '<container
 tag>' at all (nor the preceding colon). You can use a different
 '<container tag>' for a component to instead use the corresponding
 branch/tag in the build. For example, if you have a branch called
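The default-tag rule this file documents is easy to state in code. A minimal sketch, assuming the rule is exactly "release branches named like r24.07 map to themselves, everything else maps to main" (this is not build.py's actual implementation):

```python
import re
import subprocess

def default_container_tag() -> str:
    """Sketch of the documented default: release branches such as r24.07
    yield their own name as <container tag>; any other branch yields main."""
    branch = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    # Release branches follow the rYY.MM naming convention.
    return branch if re.fullmatch(r"r\d{2}\.\d{2}", branch) else "main"

print(default_container_tag())  # e.g. "r24.07" on the r24.07 branch
```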