<!--
# Copyright 2018-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
-->

[License: BSD-3-Clause](https://opensource.org/licenses/BSD-3-Clause)

> [!WARNING]
> You are currently on the `24.07` branch which tracks under-development and unreleased features.

Triton Inference Server is an open source inference serving software that
streamlines AI inferencing. Triton enables teams to deploy any AI model from
multiple deep learning and machine learning frameworks, including TensorRT,
TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. Triton
Inference Server supports inference across cloud, data center, edge, and embedded
devices on NVIDIA GPUs, x86 and ARM CPUs, or AWS Inferentia. Triton Inference
Server delivers optimized performance for many query types, including real-time,
batched, ensemble, and audio/video streaming. Triton Inference Server is part of
[NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/),
a software platform that accelerates the data science pipeline and streamlines
the development and deployment of production AI.

Major features include:

- [Supports multiple deep learning
  frameworks](https://github.com/triton-inference-server/backend#where-can-i-find-all-the-backends-that-are-available-for-triton)
- [Supports multiple machine learning
  frameworks](https://github.com/triton-inference-server/fil_backend)
- [Concurrent model
  execution](docs/user_guide/architecture.md#concurrent-model-execution)
- [Dynamic batching](docs/user_guide/model_configuration.md#dynamic-batcher)
- [Sequence batching](docs/user_guide/model_configuration.md#sequence-batcher) and
  [implicit state management](docs/user_guide/architecture.md#implicit-state-management)
  for stateful models
- Provides a [Backend API](https://github.com/triton-inference-server/backend) that
  allows adding custom backends and pre/post processing operations
- Supports writing custom backends in Python, a.k.a.
  [Python-based backends](https://github.com/triton-inference-server/backend/blob/main/docs/python_based_backends.md#python-based-backends)
- Model pipelines using
  [Ensembling](docs/user_guide/architecture.md#ensemble-models) or [Business
  Logic Scripting
  (BLS)](https://github.com/triton-inference-server/python_backend#business-logic-scripting)
- [HTTP/REST and gRPC inference
  protocols](docs/customization_guide/inference_protocols.md) based on the
  community-developed [KServe
  protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2)
- A [C API](docs/customization_guide/inference_protocols.md#in-process-triton-server-api) and
  [Java API](docs/customization_guide/inference_protocols.md#java-bindings-for-in-process-triton-server-api)
  allow Triton to link directly into your application for edge and other
  in-process use cases
- [Metrics](docs/user_guide/metrics.md) indicating GPU utilization, server
  throughput, server latency, and more (see the example after this list)

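As a quick illustration of the metrics item above, the sketch below scrapes the
Prometheus-format metrics endpoint of a locally running server. It assumes the
default metrics port of 8002 and a server started as in the quickstart example
later in this README.

```bash
# Scrape Prometheus-format metrics from a local Triton instance
# (metrics are served on port 8002 by default).
curl localhost:8002/metrics
```
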
**New to Triton Inference Server?** Make use of
[these tutorials](https://github.com/triton-inference-server/tutorials)
to begin your Triton journey!

Join the [Triton and TensorRT community](https://www.nvidia.com/en-us/deep-learning-ai/triton-tensorrt-newsletter/) and
stay current on the latest product updates, bug fixes, content, best practices,
and more. Need enterprise support? NVIDIA global support is available for Triton
Inference Server with the
[NVIDIA AI Enterprise software suite](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/).

## Serve a Model in 3 Easy Steps

```bash
# Step 1: Create the example model repository
git clone -b r24.06 https://github.com/triton-inference-server/server.git
cd server/docs/examples
./fetch_models.sh

# Step 2: Launch Triton from the NGC Triton container
docker run --gpus=1 --rm --net=host -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.06-py3 tritonserver --model-repository=/models

# Step 3: Send an inference request
# In a separate console, launch the image_client example from the NGC Triton SDK container
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:24.06-py3-sdk
/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg

# Inference should return the following
Image '/workspace/images/mug.jpg':
    15.346230 (504) = COFFEE MUG
    13.224326 (968) = CUP
    10.422965 (505) = COFFEEPOT
```

Please read the [QuickStart](docs/getting_started/quickstart.md) guide for additional information
regarding this example. The QuickStart guide also contains an example of how to launch Triton on
[CPU-only systems](docs/getting_started/quickstart.md#run-on-cpu-only-system). New to Triton and
wondering where to get started? Watch the [Getting Started video](https://youtu.be/NQDtfSi5QF4).

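Before sending inference traffic you can confirm the server came up correctly.
This is a minimal sketch assuming the default HTTP port of 8000 and the
`densenet_onnx` example model fetched in Step 1; it uses the standard KServe v2
health and metadata endpoints.

```bash
# Returns HTTP 200 once the server is ready to serve requests
curl -v localhost:8000/v2/health/ready

# Query server metadata and metadata for one model
curl localhost:8000/v2
curl localhost:8000/v2/models/densenet_onnx
```
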
## Examples and Tutorials

Check out [NVIDIA LaunchPad](https://www.nvidia.com/en-us/data-center/products/ai-enterprise-suite/trial/)
for free access to a set of hands-on labs with Triton Inference Server hosted on
NVIDIA infrastructure.

Specific end-to-end examples for popular models, such as ResNet, BERT, and DLRM,
are located on the
[NVIDIA Deep Learning Examples](https://github.com/NVIDIA/DeepLearningExamples)
page on GitHub. The
[NVIDIA Developer Zone](https://developer.nvidia.com/nvidia-triton-inference-server)
contains additional documentation, presentations, and examples.

## Documentation

### Build and Deploy

The recommended way to build and use Triton Inference Server is with Docker
images.

- [Install Triton Inference Server with Docker containers](docs/customization_guide/build.md#building-with-docker)
  (*Recommended*; see the example pull command after this list)
- [Install Triton Inference Server without Docker containers](docs/customization_guide/build.md#building-without-docker)
- [Build a custom Triton Inference Server Docker container](docs/customization_guide/compose.md)
- [Build Triton Inference Server from source](docs/customization_guide/build.md#building-on-unsupported-platforms)
- [Build Triton Inference Server for Windows 10](docs/customization_guide/build.md#building-for-windows-10)
- Examples for deploying Triton Inference Server with Kubernetes and Helm on [GCP](deploy/gcp/README.md),
  [AWS](deploy/aws/README.md), and [NVIDIA FleetCommand](deploy/fleetcommand/README.md)
- [Secure Deployment Considerations](docs/customization_guide/deploy.md)

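If you only need the prebuilt server rather than a custom build, pulling the NGC
container is usually the quickest route. A minimal sketch, assuming the 24.06
release used elsewhere in this README, a local `model_repository` directory, and
a host with the NVIDIA Container Toolkit installed:

```bash
# Pull the prebuilt Triton server image from NGC
docker pull nvcr.io/nvidia/tritonserver:24.06-py3

# Start it against a local model repository (omit --gpus=1 on a CPU-only machine)
docker run --gpus=1 --rm --net=host \
  -v ${PWD}/model_repository:/models \
  nvcr.io/nvidia/tritonserver:24.06-py3 \
  tritonserver --model-repository=/models
```
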
### Using Triton

#### Preparing Models for Triton Inference Server

The first step in using Triton to serve your models is to place one or
more models into a [model repository](docs/user_guide/model_repository.md). Depending on
the type of the model and on which Triton capabilities you want to enable for
the model, you may need to create a [model
configuration](docs/user_guide/model_configuration.md) for the model.

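As a concrete illustration, here is a minimal sketch of a repository for a
hypothetical ONNX model. The model name, tensor names, data types, and shapes
are placeholders and must be adjusted to match your actual model.

```bash
# Hypothetical layout: model_repository/<model-name>/<version>/<model-file>
mkdir -p model_repository/my_model/1
cp my_model.onnx model_repository/my_model/1/model.onnx

# Minimal config.pbtxt; for ONNX models many of these fields can often be
# auto-completed by Triton, but an explicit config documents the expected tensors.
cat > model_repository/my_model/config.pbtxt <<'EOF'
name: "my_model"
backend: "onnxruntime"
max_batch_size: 8
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
EOF
```

Scheduling, batching, and instance settings (for example `dynamic_batching` and
`instance_group`) are added to this same file, as described in the model
configuration guide and the list below.
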
- [Add custom operations to Triton if needed by your model](docs/user_guide/custom_operations.md)
- Enable model pipelining with [Model Ensemble](docs/user_guide/architecture.md#ensemble-models)
  and [Business Logic Scripting (BLS)](https://github.com/triton-inference-server/python_backend#business-logic-scripting)
- Optimize your models by setting [scheduling and batching](docs/user_guide/architecture.md#models-and-schedulers)
  parameters and [model instances](docs/user_guide/model_configuration.md#instance-groups)
- Use the [Model Analyzer tool](https://github.com/triton-inference-server/model_analyzer)
  to help optimize your model configuration with profiling
- Learn how to [explicitly manage what models are available by loading and
  unloading models](docs/user_guide/model_management.md) (see the sketch after this list)

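For the explicit loading and unloading workflow mentioned in the last item
above, here is a minimal sketch using the repository extension of the HTTP
protocol. It assumes the default HTTP port and a placeholder model named
`my_model`.

```bash
# Pass --model-control-mode=explicit to tritonserver (e.g. at the end of the
# docker run command shown earlier); no models are loaded at startup.
tritonserver --model-repository=/models --model-control-mode=explicit

# Then load, list, and unload models by name over HTTP
curl -X POST localhost:8000/v2/repository/models/my_model/load
curl -X POST localhost:8000/v2/repository/index
curl -X POST localhost:8000/v2/repository/models/my_model/unload
```
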
#### Configure and Use Triton Inference Server

- Read the [Quick Start Guide](docs/getting_started/quickstart.md) to run Triton Inference
  Server on both GPU and CPU
- Triton supports multiple execution engines, called
  [backends](https://github.com/triton-inference-server/backend#where-can-i-find-all-the-backends-that-are-available-for-triton), including
  [TensorRT](https://github.com/triton-inference-server/tensorrt_backend),
  [TensorFlow](https://github.com/triton-inference-server/tensorflow_backend),
  [PyTorch](https://github.com/triton-inference-server/pytorch_backend),
  [ONNX](https://github.com/triton-inference-server/onnxruntime_backend),
  [OpenVINO](https://github.com/triton-inference-server/openvino_backend),
  [Python](https://github.com/triton-inference-server/python_backend), and more
- Not all of the above backends are supported on every platform supported by Triton.
  Look at the
  [Backend-Platform Support Matrix](https://github.com/triton-inference-server/backend/blob/main/docs/backend_platform_support_matrix.md)
  to learn which backends are supported on your target platform.
- Learn how to [optimize performance](docs/user_guide/optimization.md) using the
  [Performance Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md)
  and
  [Model Analyzer](https://github.com/triton-inference-server/model_analyzer)
- Learn how to [manage loading and unloading models](docs/user_guide/model_management.md) in
  Triton
- Send requests directly to Triton with the [HTTP/REST JSON-based
  or gRPC protocols](docs/customization_guide/inference_protocols.md#httprest-and-grpc-protocols)
  (see the sketch after this list)

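As a sketch of the HTTP/REST path referenced in the last item above, the request
below uses the KServe v2 `infer` endpoint. The model name, tensor name, and
shape are placeholders and must match the served model's configuration.

```bash
# POST an inference request to a hypothetical model named "my_model".
# INPUT0 is FP32 with shape [1, 4]; "data" holds the values in row-major order.
curl -X POST localhost:8000/v2/models/my_model/infer \
  -H 'Content-Type: application/json' \
  -d '{
        "inputs": [
          {
            "name": "INPUT0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3, 0.4]
          }
        ]
      }'
```
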
#### Client Support and Examples

A Triton *client* application sends inference and other requests to Triton. The
[Python and C++ client libraries](https://github.com/triton-inference-server/client)
provide APIs to simplify this communication.

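A quick way to get the client libraries is from PyPI, or from the SDK container
that ships prebuilt client examples and `perf_analyzer`; this is a sketch
assuming the same 24.06 release as the quickstart above.

```bash
# Python client libraries (HTTP and gRPC) from PyPI
pip install 'tritonclient[all]'

# Or use the SDK container, which bundles prebuilt C++ client examples and perf_analyzer
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:24.06-py3-sdk
```
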
- Review client examples for [C++](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/examples),
  [Python](https://github.com/triton-inference-server/client/blob/main/src/python/examples),
  and [Java](https://github.com/triton-inference-server/client/blob/main/src/java/src/main/java/triton/client/examples)
- Configure [HTTP](https://github.com/triton-inference-server/client#http-options)
  and [gRPC](https://github.com/triton-inference-server/client#grpc-options)
  client options
- Send input data (e.g. a JPEG image) directly to Triton in the [body of an HTTP
  request without any additional metadata](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_binary_data.md#raw-binary-request)

### Extend Triton

[Triton Inference Server's architecture](docs/user_guide/architecture.md) is specifically
designed for modularity and flexibility.

- [Customize the Triton Inference Server container](docs/customization_guide/compose.md) for your
  use case (see the sketch after this list)
- [Create custom backends](https://github.com/triton-inference-server/backend)
  in either [C/C++](https://github.com/triton-inference-server/backend/blob/main/README.md#triton-backend-api)
  or [Python](https://github.com/triton-inference-server/python_backend)
- Create [decoupled backends and models](docs/user_guide/decoupled_models.md) that can send
  multiple responses for a request, or no response at all
- Use a [Triton repository agent](docs/customization_guide/repository_agents.md) to add functionality
  that operates when a model is loaded and unloaded, such as authentication,
  decryption, or conversion
- Deploy Triton on [Jetson and JetPack](docs/user_guide/jetson.md)
- [Use Triton on AWS
  Inferentia](https://github.com/triton-inference-server/python_backend/tree/main/inferentia)

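For the container-customization item at the top of this list, the repository's
`compose.py` utility can assemble an image containing only the backends you
need. The flags below are an assumption drawn from the compose guide linked
above, so treat this as a sketch and confirm the options in that document.

```bash
# Hypothetical invocation: build a slimmed-down Triton image that contains only
# the ONNX Runtime and Python backends (flags assumed; see compose.md for details).
python3 compose.py --backend onnxruntime --backend python
```
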
### Additional Documentation

- [FAQ](docs/user_guide/faq.md)
- [User Guide](docs/README.md#user-guide)
- [Customization Guide](docs/README.md#customization-guide)
- [Release Notes](https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/index.html)
- [GPU, Driver, and CUDA Support
  Matrix](https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html)

## Contributing

Contributions to Triton Inference Server are more than welcome. To
contribute, please review the [contribution
guidelines](CONTRIBUTING.md). If you have a backend, client,
example, or similar contribution that does not modify the core of
Triton, then you should file a PR in the [contrib
repo](https://github.com/triton-inference-server/contrib).

## Reporting problems, asking questions

We appreciate any feedback, questions, or bug reports regarding this project.
When posting [issues in GitHub](https://github.com/triton-inference-server/server/issues),
follow the process outlined in the [Stack Overflow document](https://stackoverflow.com/help/mcve).
Ensure posted examples are:

- minimal – use as little code as possible that still produces the
  same problem
- complete – provide all parts needed to reproduce the problem. Check
  if you can strip external dependencies and still show the problem. The
  less time we spend reproducing problems, the more time we have to
  fix them
- verifiable – test the code you're about to provide to make sure it
  reproduces the problem. Remove all other problems that are not
  related to your request/question.

For issues, please use the provided bug report and feature request templates.

For questions, we recommend posting in our community
[GitHub Discussions](https://github.com/triton-inference-server/server/discussions).

## For more information

Please refer to the [NVIDIA Developer Triton page](https://developer.nvidia.com/nvidia-triton-inference-server)
for more information.