
Commit 7b98b8b

Authored by suraj-vathsa, oandreeva-nv, GuanLuo, nnshah1, and tanmayv25
Suraj/update triton main (#1)
* Changed copyright (triton-inference-server#5705) * Modify timeout test in L0_sequence_batcher to use portable backend (triton-inference-server#5696) * Modify timeout test in L0_sequence_batcher to use portable backend * Use identity backend that is built by default on Windows * updated upstream container name (triton-inference-server#5713) * Fix triton container version (triton-inference-server#5714) * Update the L0_model_config test expected error message (triton-inference-server#5684) * Use better value in timeout test L0_sequence_batcher (triton-inference-server#5716) * Use better value in timeout test L0_sequence_batcher * Format * Update JAX install (triton-inference-server#5613) * Add notes about socket usage to L0_client_memory_growth test (triton-inference-server#5710) * Check TensorRT error message more granularly (triton-inference-server#5719) * Check TRT err msg more granularly * Clarify source of error messages * Consolidate tests for message parts * Pin Python Package Versions for HTML Document Generation (triton-inference-server#5727) * updating with pinned versions for python dependencies * updated with pinned sphinx and nbclient versions * Test full error returned when custom batcher init fails (triton-inference-server#5729) * Add testing for batcher init failure, add wait for status check * Formatting * Change search string * Add fastertransformer test (triton-inference-server#5500) Add fastertransformer test that uses 1GPU. * Fix L0_backend_python on Jetson (triton-inference-server#5728) * Don't use mem probe in Jetson * Clarify failure messages in L0_backend_python * Update copyright * Add JIRA ref, fix _test_jetson * Add testing for Python custom metrics API (triton-inference-server#5669) * Add testing for python custom metrics API * Add custom metrics example to the test * Fix for CodeQL report * Fix test name * Address comment * Add logger and change the enum usage * Add testing for Triton Client Plugin API (triton-inference-server#5706) * Add HTTP client plugin test * Add testing for HTTP asyncio * Add async plugin support * Fix qa container for L0_grpc * Add testing for grpc client plugin * Remove unused imports * Fix up * Fix L0_grpc models QA folder * Update the test based on review feedback * Remove unused import * Add testing for .plugin method * Install jemalloc (triton-inference-server#5738) * Add --metrics-address and testing (triton-inference-server#5737) * Add --metrics-address, add tests to L0_socket, re-order CLI options for consistency * Use non-localhost address * Add testing for basic auth plugin for HTTP/gRPC clients (triton-inference-server#5739) * Add HTTP basic auth test * Add testing for gRPC basic auth * Fix up * Remove unused imports * Add multi-gpu, multi-stream testing for dlpack tensors (triton-inference-server#5550) * Add multi-gpu, multi-stream testing for dlpack tensors * Update note on SageMaker MME support for ensemble (triton-inference-server#5723) * Run L0_backend_python subtests with virtual environment (triton-inference-server#5753) * Update 'main' to track development of 2.35.0 / r23.06 (triton-inference-server#5764) * Include jemalloc into the documentation (triton-inference-server#5760) * Enhance tests in L0_model_update (triton-inference-server#5724) * Add model instance name update test * Add gap for timestamp to update * Add some tests with dynamic batching * Extend supported test on rate limit off * Continue test if off mode failed * Fix L0_memory_growth (triton-inference-server#5795) (1) reduce MAX_ALLOWED_ALLOC to be more 
strict for bounded tests, and generous for unbounded tests. (2) allow unstable measurement from PA. (3) improve logging for future triage * Add note on --metrics-address (triton-inference-server#5800) * Add note on --metrics-address * Copyright * Minor fix for running "mlflow deployments create -t triton --flavor triton ..." (triton-inference-server#5658) UnboundLocalError: local variable 'meta_dict' referenced before assignment The above error shows in listing models in Triton model repository * Adding test for new sequence mode (triton-inference-server#5771) * Adding test for new sequence mode * Update option name * Clean up testing spacing and new lines * MLFlow Triton Plugin: Add support for s3 prefix and custom endpoint URL (triton-inference-server#5686) * MLFlow Triton Plugin: Add support for s3 prefix and custom endpoint URL Signed-off-by: Xiaodong Ye <[email protected]> * Update the function order of config.py and use os.path.join to replace filtering a list of strings then joining Signed-off-by: Xiaodong Ye <[email protected]> * Update onnx flavor to support s3 prefix and custom endpoint URL Signed-off-by: Xiaodong Ye <[email protected]> * Fix two typos in MLFlow Triton plugin README.md Signed-off-by: Xiaodong Ye <[email protected]> * Address review comments (replace => strip) Signed-off-by: Xiaodong Ye <[email protected]> * Address review comments (init regex only for s3) Signed-off-by: Xiaodong Ye <[email protected]> * Remove unused local variable: slash_locations Signed-off-by: Xiaodong Ye <[email protected]> --------- Signed-off-by: Xiaodong Ye <[email protected]> * Fix client script (triton-inference-server#5806) * Add MLFlow test for already loaded models. Update copyright year (triton-inference-server#5808) * Use the correct gtest filter (triton-inference-server#5824) * Add error message test on S3 access decline (triton-inference-server#5825) * Add test on access decline * Fix typo * Add MinIO S3 access decline test * Make sure bucket exists during access decline test * Restore AWS_SECRET_ACCESS_KEY on S3 local test (triton-inference-server#5832) * Restore AWS_SECRET_ACCESS_KEY * Add reason for restoring keys * nnshah1 stream infer segfault fix (triton-inference-server#5842) match logic from infer_handler.cc * Remove unused test (triton-inference-server#5851) * Add and document memory usage in statistic protocol (triton-inference-server#5642) * Add and document memory usage in statistic protocol * Fix doc * Fix up * [DO NOT MERGE Add test. 
FIXME: model generation * Fix up * Fix style * Address comment * Fix up * Set memory tracker backend option in build.py * Fix up * Add CUPTI library in Windows image build * Add note to build with memory tracker by default * use correct lib dir on CentOS (triton-inference-server#5836) * use correct lib dir on CentOS * use new location for opentelemetry-cpp * Document that gpu-base flag is optional for cpu-only builds (triton-inference-server#5861) * Update Jetson tests in Docker container (triton-inference-server#5734) * Add flags for ORT build * Separate list with commas * Remove unnecessary detection of nvcc compiler * Fixed Jetson path for perf_client, datadir * Create version directoryy for custom model * Remove probe check for shm, add shm exceed error for Jetson * Copyright updates, fix Jetson Probe * Fix be_python test num on Jetson * Remove extra comma, non-Dockerized Jetson comment * Remove comment about Jetson being non-dockerized * Remove no longer needed flag * Update `main` post-23.05 release (triton-inference-server#5880) * Update README and versions for 23.05 branch * Changes to support 23.05 (triton-inference-server#5782) * Update python and conda version * Update CMAKE installation * Update checksum version * Update ubuntu base image to 22.04 * Use ORT 1.15.0 * Set CMAKE to pull latest version * Update libre package version * Removing unused argument * Adding condition for ubuntu 22.04 * Removing installation of the package from the devel container * Nnshah1 u22.04 (triton-inference-server#5770) * Update CMAKE installation * Update python and conda version * Update CMAKE installation * Update checksum version * Update ubuntu base image to 22.04 * updating versions for ubuntu 22.04 * remove re2 --------- Co-authored-by: Neelay Shah <[email protected]> Co-authored-by: Neelay Shah <[email protected]> * Set ONNX version to 1.13.0 * Fix L0_custom_ops for ubuntu 22.04 (triton-inference-server#5775) * add back rapidjson-dev --------- Co-authored-by: Neelay Shah <[email protected]> Co-authored-by: Neelay Shah <[email protected]> Co-authored-by: nv-kmcgill53 <[email protected]> * Fix L0_mlflow (triton-inference-server#5805) * working thread * remove default install of blinker * merge issue fixed * Fix L0_backend_python/env test (triton-inference-server#5799) * Fix L0_backend_python/env test * Address comment * Update the copyright * Fix up * Fix L0_http_fuzz (triton-inference-server#5776) * installing python 3.8.16 for test * spelling Co-authored-by: Neelay Shah <[email protected]> * use util functions to install python3.8 in an easier way --------- Co-authored-by: Neelay Shah <[email protected]> * Update Windows versions for 23.05 release (triton-inference-server#5826) * Rename Ubuntu 20.04 mentions to 22.04 (triton-inference-server#5849) * Update DCGM version (triton-inference-server#5856) * Update DCGM version (triton-inference-server#5857) * downgrade DCGM version to 2.4.7 (triton-inference-server#5860) * Updating link for latest release notes to 23.05 --------- Co-authored-by: Neelay Shah <[email protected]> Co-authored-by: Neelay Shah <[email protected]> Co-authored-by: nv-kmcgill53 <[email protected]> Co-authored-by: Iman Tabrizian <[email protected]> * Disable memory tracker on Jetpack until the library is available (triton-inference-server#5882) * Fix datadir for x86 (triton-inference-server#5894) * Add more test on instance signature (triton-inference-server#5852) * Add testing for new error handling API (triton-inference-server#5892) * Test batch input for libtorch 
(triton-inference-server#5855) * Draft ragged TensorRT unit model gen * Draft libtorch special identity model * Autoformat * Update test, fix ragged model gen * Update suffix for io for libtorch * Remove unused variables * Fix io names for libtorch * Use INPUT0/OUTPUT0 for libtorch * Reorder to match test model configs * Remove unnecessary capitalization * Auto-format * Capitalization is necessary * Remove unnecessary export * Clean up Azure dependency in server build (triton-inference-server#5900) * [DO NOT MERGE] * Remove Azure dependency in server component build * Finalize * Fix dependency * Fixing up * Clean up * Add response parameters for streaming GRPC inference to enhance decoupled support (triton-inference-server#5878) * Update 'main' to track development of 2.36.0 / 23.07 (triton-inference-server#5917) * Add test for detecting S3 http2 upgrade request (triton-inference-server#5911) * Add test for detecting S3 http2 upgrade request * Enhance testing * Copyright year update * Add Redis cache build, tests, and docs (triton-inference-server#5916) * Updated handling for uint64 request priority * Ensure HPCX dependencies found in container (triton-inference-server#5922) * Add HPCX dependencies to search path * Copy hpcx to CPU-only container * Add ucc path to CPU-only image * Fixed if statement * Fix df variable * Combine hpcx LD_LIBRARY_PATH * Add test case where MetricFamily is deleted before deleting Metric (triton-inference-server#5915) * Add test case for metric lifetime error handling * Address comment * Use different MetricFamily name * Add testing for Pytorch instance group kind MODEL (triton-inference-server#5810) * Add testing for Pytorch instance group kind MODEL * Remove unused item * Update testing to verify the infer result * Add copyright * Remove unused import * Update pip install * Update the model to use the same add sub logic * Add torch multi-gpu and multi-device models to L0_io * Fix up model version * Add test for sending instance update config via load API (triton-inference-server#5937) * Add test for passing config via load api * Add more docs on instance update behavior * Update to suggested docs Co-authored-by: Ryan McCormick <[email protected]> * Use dictionary for json config * Modify the config fetched from Triton instead --------- Co-authored-by: Ryan McCormick <[email protected]> * Fix L0_batcher count check (triton-inference-server#5939) * Add testing for json tensor format (triton-inference-server#5914) * Add redis config and use local logfile for redis server (triton-inference-server#5945) * Add redis config and use local logfile for redis server * Move redis log config to CLI * Have separate redis logs for unit tests and CLI tests * Add test on rate limiter max resource decrease update (triton-inference-server#5885) * Add test on rate limiter max resource decrease update * Add test with explicit resource * Check server log for decreased resource limit * Add docs on decoupled final response feature (triton-inference-server#5936) * Allow changing ping behavior based on env variable in SageMaker and entrypoint updates (triton-inference-server#5910) * Allow changing ping behavior based on env variable in SageMaker * Add option for additional args * Make ping further configurable * Allow further configuration of grpc and http ports * Update docker/sagemaker/serve * Update docker/sagemaker/serve --------- Co-authored-by: GuanLuo <[email protected]> * Remove only MPI libraries in HPCX in L0_perf_analyzer (triton-inference-server#5967) * Be more specific 
with MPI removal * Delete all libmpi libs * Ensure L0_batch_input requests received in order (triton-inference-server#5963) * Add print statements for debugging * Add debugging print statements * Test using grpc client with stream to fix race * Use streaming client in all non-batch tests * Switch all clients to streaming GRPC * Remove unused imports, vars * Address comments * Remove random comment * Set inputs as separate function * Split set inputs based on test type * Add test for redis cache auth credentials via env vars (triton-inference-server#5966) * Auto-formatting (triton-inference-server#5979) * Auto-format * Change to clang-format-15 in CONTRIBTUING * Adding tests ensuring locale setting is passed to python backend interpreter * Refactor build.py CPU-only Linux libs for readability (triton-inference-server#5990) * Improve the error message when the number of GPUs is insufficient (triton-inference-server#5993) * Update README to include CPP-API Java Bindings (triton-inference-server#5883) * Update env variable to use for overriding /ping behavior (triton-inference-server#5994) * Add test that >1000 model files can be loaded in S3 (triton-inference-server#5976) * Add test for >1000 files * Capitalization for consistency * Add bucket cleaning at end * Move test pass/fail to end * Check number of files in model dir at load time * Add testing for GPU tensor error handling (triton-inference-server#5871) * Add testing for GPU tensor error handling * Fix up * Remove exit 0 * Fix jetson * Fix up * Add test for Python BLS model loading API (triton-inference-server#5980) * Add test for Python BLS model loading API * Fix up * Update README and versions for 23.06 branch * Fix LD_LIBRARY_PATH for PyTorch backend * Return updated df in add_cpu_libs * Remove unneeded df param * Update test failure messages to match Dataloader changes (triton-inference-server#6006) * Add dependency for L0_python_client_unit_tests (triton-inference-server#6010) * Improve performance tuning guide (triton-inference-server#6026) * Enabling nested spans for trace mode OpenTelemetry (triton-inference-server#5928) * Adding nested spans to OTel tracing + support of ensemble models * Move multi-GPU dlpack test to a separate L0 test (triton-inference-server#6001) * Move multi-GPU dlpack test to a separate L0 test * Fix copyright * Fix up * OpenVINO 2023.0.0 (triton-inference-server#6031) * Upgrade OV to 2023.0.0 * Upgrade OV model gen script to 2023.0.0 * Add test to check the output memory type for onnx models (triton-inference-server#6033) * Add test to check the output memory type for onnx models * Remove unused import * Address comment * Add testing for implicit state for PyTorch backend (triton-inference-server#6016) * Add testing for implicit state for PyTorch backend * Add testing for libtorch string implicit models * Fix CodeQL * Mention that libtorch backend supports implicit state * Fix CodeQL * Review edits * Fix output tests for PyTorch backend * Allow uncompressed conda execution enviroments (triton-inference-server#6005) Add test for uncompressed conda execution enviroments * Fix implicit state test (triton-inference-server#6039) * Adding target_compile_features cxx_std_17 to tracing lib (triton-inference-server#6040) * Update 'main' to track development of 2.37.0 / 23.08 * Fix intermittent failure in L0_model_namespacing (triton-inference-server#6052) * Fix PyTorch implicit model mounting in gen_qa_model_repository (triton-inference-server#6054) * Fix broken links pointing to the `grpc_server.cc` file 
(triton-inference-server#6068) * Fix L0_backend_python expected instance name (triton-inference-server#6073) * Fix expected instance name * Copyright year * Fix L0_sdk: update the search name for the client wheel (triton-inference-server#6074) * Fix name of client wheel to be looked for * Fix up * Add GitHub action to format and lint code (triton-inference-server#6022) * Add pre-commit * Fix typos, exec/shebang, formatting * Remove clang-format * Update contributing md to include pre-commit * Update spacing in CONTRIBUTING * Fix contributing pre-commit link * Link to pre-commit install directions * Wording * Restore clang-format * Fix yaml spacing * Exclude templates folder for check-yaml * Remove unused vars * Normalize spacing * Remove unused variable * Normalize config indentation * Update .clang-format to enforce max line length of 80 * Update copyrights * Update copyrights * Run workflows on every PR * Fix copyright year * Fix grammar * Entrypoint.d files are not executable * Run pre-commit hooks * Mark not executable * Run pre-commit hooks * Remove unused variable * Run pre-commit hooks after rebase * Update copyrights * Fix README.md typo (decoupled) Co-authored-by: Ryan McCormick <[email protected]> * Run pre-commit hooks * Grammar fix Co-authored-by: Ryan McCormick <[email protected]> * Redundant word Co-authored-by: Ryan McCormick <[email protected]> * Revert docker file changes * Executable shebang revert * Make model.py files non-executable * Passin is proper flag * Run pre-commit hooks on init_args/model.py * Fix typo in init_args/model.py * Make copyrights one line --------- Co-authored-by: Ryan McCormick <[email protected]> * Fix default instance name change when count is 1 (triton-inference-server#6088) * Add test for sequence model instance update (triton-inference-server#5831) * Add test for sequence model instance update * Add gap for file timestamp update * Update test for non-blocking sequence update * Update documentation * Remove mentioning increase instance count case * Add more documentaion for scheduler update test * Update test for non-blocking batcher removal * Add polling due to async scheduler destruction * Use _ as private * Fix typo * Add docs on instance count decrease * Fix typo * Separate direct and oldest to different test cases * Separate nested tests in a loop into multiple test cases * Refactor scheduler update test * Improve doc on handling future test failures * Address pre-commit * Add best effort to reset model state after a single test case failure * Remove reset model method to make harder for chaining multiple test cases as one * Remove description on model state clean up * Fix default instance name (triton-inference-server#6097) * Removing unused tests (triton-inference-server#6085) * Update post-23.07 release (triton-inference-server#6103) * Update README and versions for 2.36.0 / 23.07 * Update Dockerfile.win10.min * Fix formating issue * fix formating issue * Fix whitespaces * Fix whitespaces * Fix whitespaces * Improve asyncio testing (triton-inference-server#6122) * Reduce instance count to 1 for python bls model loading test (triton-inference-server#6130) * Reduce instance count to 1 for python bls model loading test * Add comment when calling unload * Fix queue test to expect exact number of failures (triton-inference-server#6133) * Fix queue test to expect exact number of failures * Increase the execution time to more accurately capture requests * Add CPU & GPU metrics in Grafana dashboard.json for K8s op prem deployment (fix 
triton-inference-server#6047) (triton-inference-server#6100) Signed-off-by: Xiaodong Ye <[email protected]> * Adding the support tracing of child models invoked from a BLS model (triton-inference-server#6063) * Adding tests for bls * Added fixme, cleaned previous commit * Removed unused imports * Fixing commit tree: Refactor code, so that OTel tracer provider is initialized only once Added resource cmd option, testig Added docs * Clean up * Update docs/user_guide/trace.md Co-authored-by: Ryan McCormick <[email protected]> * Revision * Update doc * Clean up * Added ostream exporter to OpenTelemetry for testing purposes; refactored trace tests * Added opentelemetry trace collector set up to tests; refactored otel exporter tests to use OTel collector instead of netcat * Revising according to comments * Added comment regarding 'parent_span_id' * Added permalink * Adjusted test --------- Co-authored-by: Ryan McCormick <[email protected]> * Test python environments 3.8-3.11 (triton-inference-server#6109) Add tests for python 3.8-3.11 for L0_python_backends * Improve L0_backend_python debugging (triton-inference-server#6157) * Improve L0_backend_python debugging * Use utils function for artifacts collection * Add unreachable output test for reporting source of disconnectivity (triton-inference-server#6149) * Update 'main' to track development of 2.38.0 / 23.09 (triton-inference-server#6163) * Fix the versions in the doc (triton-inference-server#6164) * Update docs with NVAIE messaging (triton-inference-server#6162) Update docs with NVAIE messaging * Add sanity tests for parallel instance loading (triton-inference-server#6126) * Remove extra whitespace (triton-inference-server#6174) * Remove a test case that sanity checks input value of --shape CLI flag (triton-inference-server#6140) * Remove test checking for --shape option * Remove the entire test * Add test when unload/load requests for same model is received at the same time (triton-inference-server#6150) * Add test when unload/load requests for same model received the same time * Add test_same_model_overlapping_load_unload * Use a load/unload stress test instead * Pre-merge test name update * Address pre-commit error * Revert "Address pre-commit error" This reverts commit 781cab1. 
* Record number of occurrence of each exception * Make assert failures clearer in L0_trt_plugin (triton-inference-server#6166) * Add end-to-end CI test for decoupled model support (triton-inference-server#6131) (triton-inference-server#6184) * Add end-to-end CI test for decoupled model support * Address feedback * Test preserve_ordering for oldest strategy sequence batcher (triton-inference-server#6185) * added debugging guide (triton-inference-server#5924) * added debugging guide * Run pre-commit --------- Co-authored-by: David Yastremsky <[email protected]> * Add deadlock gdb section to debug guide (triton-inference-server#6193) * Fix character escape in model repository documentation (triton-inference-server#6197) * Fix docs test (triton-inference-server#6192) * Add utility functions for array manipulation (triton-inference-server#6203) * Add utility functions for outlier removal * Fix functions * Add newline to end of file * Add gc collect to make sure gpu tensor is deallocated (triton-inference-server#6205) * Testing: add gc collect to make sure gpu tensor is deallocated * Address comment * Check for log error on failing to find explicit load model (triton-inference-server#6204) * Set default shm size to 1MB for Python backend (triton-inference-server#6209) * Trace Model Name Validation (triton-inference-server#6199) * Initial commit * Cleanup using new standard formatting * QA test restructuring * Add newline to the end of test.sh * HTTP/GRCP protocol changed to pivot on ready status & error status. Log file name changed in qa test. * Fixing unhandled error memory leak * Handle index function memory leak fix * Fix the check for error message (triton-inference-server#6226) * Fix copyright for debugging guide (triton-inference-server#6225) * Add watts units to GPU power metric descriptions (triton-inference-server#6242) * Update post-23.08 release (triton-inference-server#6234) * CUDA 12.1 > 12.2 * DLIS-5208: onnxruntime+windows - stop treat warnings on compile as errors * Revert "DLIS-5208: onnxruntime+windows - stop treat warnings on compile as errors" This reverts commit 0cecbb7. 
* Update Dockerfile.win10.min * Update Dockerfile.win10.min * Update README and versions for 23.08 branch * Update Dockerfile.win10 * Fix the versions in docs * Add the note about stabilization of the branch * Update docs with NVAIE messaging (triton-inference-server#6162) (triton-inference-server#6167) Update docs with NVAIE messaging Co-authored-by: David Zier <[email protected]> * Resolve merge conflict --------- Co-authored-by: tanmayv25 <[email protected]> Co-authored-by: David Zier <[email protected]> * Add tests/docs for queue size (pending request count) metric (triton-inference-server#6233) * Adding safe string to number conversions (triton-inference-server#6173) * Added catch for out of range error for trace setting update * Added wrapper to safe parse options * Added option names to errors * Adjustments * Quick fix * Fixing option name for Windows * Removed repetitive code * Adjust getopt_long for Windows to use longindex * Moved try catch into ParseOption * Removed unused input * Improved names * Refactoring and clean up * Fixed Windows * Refactored getopt_long for Windows * Refactored trace test, pinned otel's collector version to avoid problems with go requirements * Test Python execute() to return Triton error code (triton-inference-server#6228) * Add test for Python execute error code * Add all supported error codes into test * Move ErrorCode into TritonError * Expose ErrorCode internal in TritonError * Add docs on IPv6 (triton-inference-server#6262) * Add test for TensorRT version-compatible model support (triton-inference-server#6255) * Add tensorrt version-compatibility test * Generate one version-compatible model * Fix copyright year * Remove unnecessary variable * Remove unnecessary line * Generate TRT version-compatible model * Add sample inference to TRT version-compatible test * Clean up utils and model gen for new plan model * Fix startswith capitalization * Remove unused imports * Remove unused imports * Add log check * Upgrade protobuf version (triton-inference-server#6268) * Add testing for retrieving shape and datatype in backend API (triton-inference-server#6231) Add testing for retrieving output shape and datatype info from backend API * Update 'main' to track development of 2.39.0 / 23.10 (triton-inference-server#6277) * Apply UCX workaround (triton-inference-server#6254) * Add ensemble parameter forwarding test (triton-inference-server#6284) * Exclude extra TRT version-compatible models from tests (triton-inference-server#6294) * Exclude compatible models from tests. * Force model removal, in case it does not exist Co-authored-by: Ryan McCormick <[email protected]> --------- Co-authored-by: Ryan McCormick <[email protected]> * Adding installation of docker and docker-buildx (triton-inference-server#6299) * Adding installation of docker and docker-buildx * remove whitespace * Use targetmodel from header as model name in SageMaker (triton-inference-server#6147) * Use targetmodel from header as model name in SageMaker * Update naming for model hash * Add more error messages, return codes, and refactor HTTP server (triton-inference-server#6297) * Fix typo (triton-inference-server#6318) * Update the request re-use example (triton-inference-server#6283) * Update the request re-use example * Review edit * Review comment * Disable developer tools build for In-process API + JavaCPP tests (triton-inference-server#6296) * Add Python binding build. 
Add L0_python_api to test Python binding (triton-inference-server#6319) * Add L0_python_api to test Python binding * Install Python API in CI image * Fix QA build * Increase network timeout for valgrind (triton-inference-server#6324) * Tests and docs for ability to specify subdirectory to download for LocalizePath (triton-inference-server#6308) * Added custom localization tests for s3 and azure, added docs * Refactor HandleInfer into more readable chunks (triton-inference-server#6332) * Refactor model generation scripts (triton-inference-server#6336) * Refactor model generation scripts * Fix codeql * Fix relative path import * Fix package structure * Copy the gen_common file * Add missing uint8 * Remove duplicate import * Add testing for scalar I/O in ORT backend (triton-inference-server#6343) * Add testing for scalar I/O in ORT backend * Review edit * ci * Update post-23.09 release (triton-inference-server#6367) * Update README and versions for 23.09 branch (triton-inference-server#6280) * Update `Dockerfile` and `build.py` (triton-inference-server#6281) * Update configuration for Windows Dockerfile (triton-inference-server#6256) * Adding installation of docker and docker-buildx * Enable '--expt-relaxed-constexpr' flag for custom ops models * Upate Dockerfile version * Disable unit tests for Jetson * Update condition (triton-inference-server#6285) * removing Whitespaces (triton-inference-server#6293) * removing Whitespaces * removing whitespaces * Add security policy (triton-inference-server#6376) * Adding client-side request cancellation support and testing (triton-inference-server#6383) * Add L0_request_cancellation (triton-inference-server#6252) * Add L0_request_cancellation * Remove unittest test * Add cancellation to gRPC server error handling * Fix up * Use identity model * Add tests for gRPC client-side cancellation (triton-inference-server#6278) * Add tests for gRPC client-side cancellation * Fix CodeQL issues * Formatting * Update qa/L0_client_cancellation/client_cancellation_test.py Co-authored-by: Ryan McCormick <[email protected]> * Move to L0_request_cancellation * Address review comments * Removing request cancellation support from asyncio version * Format * Update copyright * Remove tests * Handle cancellation notification in gRPC server (triton-inference-server#6298) * Handle cancellation notification in gRPC server * Fix the request ptr initialization * Update src/grpc/infer_handler.h Co-authored-by: Ryan McCormick <[email protected]> * Address review comment * Fix logs * Fix request complete callback by removing reference to state * Improve documentation --------- Co-authored-by: Ryan McCormick <[email protected]> --------- Co-authored-by: Ryan McCormick <[email protected]> * Fixes on the gRPC frontend to handle AsyncNotifyWhenDone() API (triton-inference-server#6345) * Fix segmentation fault in gRPC frontend * Finalize all states upon completion * Fixes all state cleanups * Handle completed states when cancellation notification is received * Add more documentation steps * Retrieve dormant states to minimize the memory footprint for long streams * Update src/grpc/grpc_utils.h Co-authored-by: Ryan McCormick <[email protected]> * Use a boolean state instead of raw pointer --------- Co-authored-by: Ryan McCormick <[email protected]> * Add L0_grpc_state_cleanup test (triton-inference-server#6353) * Add L0_grpc_state_cleanup test * Add model file in QA container * Fix spelling * Add remaining subtests * Add failing subtests * Format fixes * Fix model repo * Fix QA docker file 
* Remove checks for the error message when shutting down server * Fix spelling * Address review comments * Add schedulers request cancellation tests (triton-inference-server#6309) * Add schedulers request cancellation tests * Merge gRPC client test * Reduce testing time and covers cancelling other requests as a consequence of request cancellation * Add streaming request cancellation test --------- Co-authored-by: Iman Tabrizian <[email protected]> Co-authored-by: Ryan McCormick <[email protected]> Co-authored-by: Jacky <[email protected]> * Add missing copyright (triton-inference-server#6388) * Add basic generate endpoints for LLM tasks (triton-inference-server#6366) * PoC of parsing request prompt and converting to Triton infer request * Remove extra trace * Add generate endpoint * Enable streaming version * Fix bug * Fix up * Add basic testing. Cherry pick from triton-inference-server#6369 * format * Address comment. Fix build * Minor cleanup * cleanup syntax * Wrap error in SSE format * Fix up * Restrict number of response on non-streaming generate * Address comment on implementation. * Re-enable trace on generate endpoint * Add more comprehensive llm endpoint tests (triton-inference-server#6377) * Add security policy (triton-inference-server#6376) * Start adding some more comprehensive tests * Fix test case * Add response error testing * Complete test placeholder * Address comment * Address comments * Fix code check --------- Co-authored-by: dyastremsky <[email protected]> Co-authored-by: GuanLuo <[email protected]> * Address comment * Address comment * Address comment * Fix typo --------- Co-authored-by: Ryan McCormick <[email protected]> Co-authored-by: dyastremsky <[email protected]> * Add Python backend request cancellation test (triton-inference-server#6364) * Add cancelled response status test * Add Python backend request cancellation test * Add Python backend decoupled request cancellation test * Simplified response if cancelled * Test response_sender.send() after closed * Rollback test response_sender.send() after closed * Rollback non-decoupled any response on cancel * Add TRT-LLM backend build to Triton (triton-inference-server#6365) (triton-inference-server#6392) * Add TRT-LLM backend build to Triton (triton-inference-server#6365) * Add trtllm backend to build * Temporarily adding version map for 23.07 * Fix build issue * Update comment * Comment out python binding changes * Add post build * Update trtllm backend naming * Update TRTLLM base image * Fix cmake arch * Revert temp changes for python binding PR * Address comment * Move import to the top (triton-inference-server#6395) * Move import to the top * pre commit format * Add Python backend when vLLM backend built (triton-inference-server#6397) * Update build.py to build vLLM backend (triton-inference-server#6394) * Support parameters object in generate route * Update 'main' to track development of 2.40.0 / 23.11 (triton-inference-server#6400) * Fix L0_sdk (triton-inference-server#6387) * Add documentation on request cancellation (triton-inference-server#6403) * Add documentation on request cancellation * Include python backend * Update docs/user_guide/request_cancellation.md Co-authored-by: Iman Tabrizian <[email protected]> * Update docs/user_guide/request_cancellation.md Co-authored-by: Neelay Shah <[email protected]> * Update docs/README.md Co-authored-by: Neelay Shah <[email protected]> * Update docs/user_guide/request_cancellation.md Co-authored-by: Ryan McCormick <[email protected]> * Remove inflight term from the 
main documentation * Address review comments * Fix * Update docs/user_guide/request_cancellation.md Co-authored-by: Jacky <[email protected]> * Fix --------- Co-authored-by: Iman Tabrizian <[email protected]> Co-authored-by: Neelay Shah <[email protected]> Co-authored-by: Ryan McCormick <[email protected]> Co-authored-by: Jacky <[email protected]> * Fixes in request cancellation doc (triton-inference-server#6409) * Document generate HTTP endpoint (triton-inference-server#6412) * Document generate HTTP endpoint * Address comment * Fix up * format * Address comment * Update SECURITY.md to not display commented copyright (triton-inference-server#6426) * Fix missing library in L0_data_compression (triton-inference-server#6424) * Fix missing library in L0_data_compression * Fix up * Add Javacpp-presets repo location as env variable in Java tests(triton-inference-server#6385) Simplify testing when upstream (javacpp-presets) build changes. Related to triton-inference-server/client#409 * TRT-LLM backend build changes (triton-inference-server#6406) * Update url * Debugging * Debugging * Update url * Fix build for TRT-LLM backend * Remove TRTLLM TRT and CUDA versions * Fix up unused var * Fix up dir name * FIx cmake patch * Remove previous TRT version * Install required packages for example models * Remove packages that are only needed for testing * Add gRPC AsyncIO request cancellation tests (triton-inference-server#6408) * Fix gRPC test failure and refactor * Add gRPC AsyncIO cancellation tests * Better check if a request is cancelled * Use f-string * Fix L0_implicit_state (triton-inference-server#6427) * Fixing vllm build (triton-inference-server#6433) * Fixing torch version for vllm * Switch Jetson model TensorRT models generation to container (triton-inference-server#6378) * Switch Jetson model TensorRT models generation to container * Adding missed file * Fix typo * Fix typos * Remove extra spaces * Fix typo * Bumped vllm version (triton-inference-server#6444) * Adjust test_concurrent_same_model_load_unload_stress (triton-inference-server#6436) * Adding emergency vllm latest release (triton-inference-server#6454) * Fix notify state destruction and inflight states tracking (triton-inference-server#6451) * Ensure notify_state_ gets properly destructed * Fix inflight state tracking to properly erase states * Prevent removing the notify_state from being erased * Wrap notify_state_ object within unique_ptr * Update TRT-LLM backend url (triton-inference-server#6455) * TRTLLM backend post release * TRTLLM backend post release * Update submodule url for permission issue * Update submodule url * Fix up * Not using postbuild function to workaround submodule url permission issue * Added docs on python based backends (triton-inference-server#6429) Co-authored-by: Neelay Shah <[email protected]> * L0_model_config Fix (triton-inference-server#6472) * Minor fix for L0_model_config * Add test for Python model parameters (triton-inference-server#6452) * Test Python BLS with different sizes of CUDA memory pool (triton-inference-server#6276) * Test with different sizes of CUDA memory pool * Check the server log for error message * Improve debugging * Fix syntax * Add documentation for K8s-onprem StartupProbe (triton-inference-server#5257) Co-authored-by: dyastremsky <[email protected]> Co-authored-by: Ryan McCormick <[email protected]> * Update `main` post-23.10 release (triton-inference-server#6484) * Update README and versions for 23.10 branch (triton-inference-server#6399) * Cherry-picking vLLM backend 
changes (triton-inference-server#6404) * Update build.py to build vLLM backend (triton-inference-server#6394) * Add Python backend when vLLM backend built (triton-inference-server#6397) --------- Co-authored-by: dyastremsky <[email protected]> * Add documentation on request cancellation (triton-inference-server#6403) (triton-inference-server#6407) * Add documentation on request cancellation * Include python backend * Update docs/user_guide/request_cancellation.md * Update docs/user_guide/request_cancellation.md * Update docs/README.md * Update docs/user_guide/request_cancellation.md * Remove inflight term from the main documentation * Address review comments * Fix * Update docs/user_guide/request_cancellation.md * Fix --------- Co-authored-by: Iman Tabrizian <[email protected]> Co-authored-by: Neelay Shah <[email protected]> Co-authored-by: Ryan McCormick <[email protected]> Co-authored-by: Jacky <[email protected]> * Fixes in request cancellation doc (triton-inference-server#6409) (triton-inference-server#6410) * TRT-LLM backend build changes (triton-inference-server#6406) (triton-inference-server#6430) * Update url * Debugging * Debugging * Update url * Fix build for TRT-LLM backend * Remove TRTLLM TRT and CUDA versions * Fix up unused var * Fix up dir name * FIx cmake patch * Remove previous TRT version * Install required packages for example models * Remove packages that are only needed for testing * Fixing vllm build (triton-inference-server#6433) (triton-inference-server#6437) * Fixing torch version for vllm Co-authored-by: Olga Andreeva <[email protected]> * Update TRT-LLM backend url (triton-inference-server#6455) (triton-inference-server#6460) * TRTLLM backend post release * TRTLLM backend post release * Update submodule url for permission issue * Update submodule url * Fix up * Not using postbuild function to workaround submodule url permission issue * remove redundant lines * Revert "remove redundant lines" This reverts commit 86be7ad. * restore missed lines * Update build.py Co-authored-by: Olga Andreeva <[email protected]> * Update build.py Co-authored-by: Olga Andreeva <[email protected]> --------- Co-authored-by: Tanmay Verma <[email protected]> Co-authored-by: dyastremsky <[email protected]> Co-authored-by: Iman Tabrizian <[email protected]> Co-authored-by: Neelay Shah <[email protected]> Co-authored-by: Ryan McCormick <[email protected]> Co-authored-by: Jacky <[email protected]> Co-authored-by: Kris Hung <[email protected]> Co-authored-by: Katherine Yang <[email protected]> Co-authored-by: Olga Andreeva <[email protected]> * Adding structure reference to the new document (triton-inference-server#6493) * Improve L0_backend_python test stability (ensemble / gpu_tensor_lifecycle) (triton-inference-server#6490) * Test torch allocator gpu memory usage directly rather than global gpu memory for more consistency * Add L0_generative_sequence test (triton-inference-server#6475) * Add testing backend and test * Add test to build / CI. Minor fix on L0_http * Format. Update backend documentation * Fix up * Address comment * Add negative testing * Fix up * Downgrade vcpkg version (triton-inference-server#6503) * Collecting sub dir artifacts in GitLab yaml. Removing collect function from test script. 
(triton-inference-server#6499) * Use post build function for TRT-LLM backend (triton-inference-server#6476) * Use postbuild function * Remove updating submodule url * Enhanced python_backend autocomplete (triton-inference-server#6504) * Added testing for python_backend autocomplete: optional input and model_transaction_policy * Parse reuse-grpc-port and reuse-http-port as booleans (triton-inference-server#6511) Co-authored-by: Francesco Petrini <[email protected]> * Fixing L0_io (triton-inference-server#6510) * Fixing L0_io * Add Python-based backends CI (triton-inference-server#6466) * Bumped vllm version * Add python-bsed backends testing * Add python-based backends CI * Fix errors * Add vllm backend * Fix pre-commit * Modify test.sh * Remove vllm_opt qa model * Remove vLLM ackend tests * Resolve review comments * Fix pre-commit errors * Update qa/L0_backend_python/python_based_backends/python_based_backends_test.py Co-authored-by: Tanmay Verma <[email protected]> * Remove collect_artifacts_from_subdir function call --------- Co-authored-by: oandreeva-nv <[email protected]> Co-authored-by: Tanmay Verma <[email protected]> * Enabling option to restrict access to HTTP APIs based on header value pairs (similar to gRPC) * Upgrade DCGM from 2.4.7 to 3.2.6 (triton-inference-server#6515) * Enhance GCS credentials documentations (triton-inference-server#6526) * Test file override outside of model directory (triton-inference-server#6516) * Add boost-filesystem * Update ORT version to 1.16.2 (triton-inference-server#6531) * Adjusting expected error msg (triton-inference-server#6517) * Update 'main' to track development of 2.41.0 / 23.12 (triton-inference-server#6543) * Enhance testing for pending request count (triton-inference-server#6532) * Enhance testing for pending request count * Improve the documentation * Add more documentation * Add testing for Python backend request rescheduling (triton-inference-server#6509) * Add testing * Fix up * Enhance testing * Fix up * Revert test changes * Add grpc endpoint test * Remove unused import * Remove unused import * Update qa/L0_backend_python/request_rescheduling/grpc_endpoint_test.py Co-authored-by: Iman Tabrizian <[email protected]> * Update qa/python_models/bls_request_rescheduling/model.py Co-authored-by: Iman Tabrizian <[email protected]> --------- Co-authored-by: Iman Tabrizian <[email protected]> * Check that the wget is installed (triton-inference-server#6556) * secure deployment considerations guide (triton-inference-server#6533) * draft document * updates * updates * updated * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * updates * update * updates * updates * Update docs/customization_guide/deploy.md Co-authored-by: Kyle McGill <[email protected]> * Update docs/customization_guide/deploy.md Co-authored-by: Kyle McGill <[email protected]> * fixing typos * updated with clearer warnings * updates to readme and toc --------- Co-authored-by: Kyle McGill <[email protected]> * Fix typo and change the command line order (triton-inference-server#6557) * Fix typo and change the command line order * Improve visual experience. 
Add 'clang' package * Add error during rescheduling test to L0_generative_sequence (triton-inference-server#6550) * changing references to concrete instances * Add testing for implicit state enhancements (triton-inference-server#6524) * Add testing for single buffer * Add testing for implicit state with buffer growth * Improve testing * Fix up * Add CUDA virtual address size flag * Add missing test files * Parameter rename * Test fixes * Only build implicit state backend for GPU=ON * Fix copyright (triton-inference-server#6584) * Mention TRT LLM backend supports request cancellation (triton-inference-server#6585) * update model repository generation for onnx models for protobuf (triton-inference-server#6575) * Fix L0_sagemaker (triton-inference-server#6587) * Add C++ server wrapper to the doc (triton-inference-server#6592) * Add timeout to client apis and tests (triton-inference-server#6546) Client PR: triton-inference-server/client#429 * Change name generative -> iterative (triton-inference-server#6601) * name changes * updated names * Add documentation on generative sequence (triton-inference-server#6595) * Add documentation on generative sequence * Address comment * Reflect the "iterative" change * Updated description of iterative sequences * Restricted HTTP API documentation Co-authored-by: Ryan McCormick <[email protected]> * Add request cancellation and debugging guide to generated docs (triton-inference-server#6617) * Support for http request cancellation. Includes fix for seg fault in generate_stream endpoint. * Bumped vLLM version to v0.2.2 (triton-inference-server#6623) * Upgrade ORT version (triton-inference-server#6618) * Use compliant preprocessor (triton-inference-server#6626) * Update README.md (triton-inference-server#6627) * Extend request objects lifetime and fixes possible segmentation fault (triton-inference-server#6620) * Extend request objects lifetime * Remove explicit TRITONSERVER_InferenceRequestDelete * Format fix * Include the inference_request_ initialization to cover RequestNew --------- Co-authored-by: Neelay Shah <[email protected]> * Update protobuf after python update for testing (triton-inference-server#6638) This fixes the issue where python client has `AttributeError: 'NoneType' object has no attribute 'enum_types_by_name' errors after python version is updated. * Update post-23.11 release (triton-inference-server#6653) * Update README and versions for 2.40.0 / 23.11 (triton-inference-server#6544) * Removing path construction to use SymLink alternatives * Update version for PyTorch * Update windows Dockerfile configuration * Update triton version to 23.11 * Update README and versions for 2.40.0 / 23.11 * Fix typo * Ading 'ldconfig' to configure dynamic linking in container (triton-inference-server#6602) * Point to tekit_backend (triton-inference-server#6616) * Point to tekit_backend * Update version * Revert tekit changes (triton-inference-server#6640) --------- Co-authored-by: Kris Hung <[email protected]> * PYBE Timeout Tests (triton-inference-server#6483) * New testing to confirm large request timeout values can be passed and retrieved within Python BLS models. 
* Add note on lack of ensemble support (triton-inference-server#6648) * Added request id to span attributes (triton-inference-server#6667) * Add test for optional internal tensor within an ensemble (triton-inference-server#6663) * Add test for optional internal tensor within an ensemble * Fix up * Set CMake version to 3.27.7 (triton-inference-server#6675) * Set CMake version to 3.27.7 * Set CMake version to 3.27.7 * Fix double slash typo * restore typo (triton-inference-server#6680) * Update 'main' to track development of 2.42.0 / 24.01 (triton-inference-server#6673) * iGPU build refactor (triton-inference-server#6684) (triton-inference-server#6691) * Mlflow Plugin Fix (triton-inference-server#6685) * Mlflow plugin fix * Fix extra content-type headers in HTTP server (triton-inference-server#6678) * Fix iGPU CMakeFile tags (triton-inference-server#6695) * Unify iGPU test build with x86 ARM * adding TRITON_IGPU_BUILD to core build definition; adding logic to skip caffe2plan test if TRITON_IGPU_BUILD=1 * re-organizing some copies in Dockerfile.QA to fix igpu devel build * Pre-commit fix --------- Co-authored-by: kyle <[email protected]> * adding default value for TRITON_IGPU_BUILD=OFF (triton-inference-server#6705) * adding default value for TRITON_IGPU_BUILD=OFF * fix newline --------- Co-authored-by: kyle <[email protected]> * Add test case for decoupled model raising exception (triton-inference-server#6686) * Add test case for decoupled model raising exception * Remove unused import * Address comment * Escape special characters in general docs (triton-inference-server#6697) * vLLM Benchmarking Test (triton-inference-server#6631) * vLLM Benchmarking Test * Allow configuring GRPC max connection age and max connection age grace (triton-inference-server#6639) * Add ability to configure GRPC max connection age and max connection age grace * Allow pass GRPC connection age args when they are set from command ---------- Co-authored-by: Katherine Yang <[email protected]> --------- Signed-off-by: Xiaodong Ye <[email protected]> Co-authored-by: Olga Andreeva <[email protected]> Co-authored-by: GuanLuo <[email protected]> Co-authored-by: Neelay Shah <[email protected]> Co-authored-by: Tanmay Verma <[email protected]> Co-authored-by: Kris Hung <[email protected]> Co-authored-by: Jacky <[email protected]> Co-authored-by: Ryan McCormick <[email protected]> Co-authored-by: dyastremsky <[email protected]> Co-authored-by: Katherine Yang <[email protected]> Co-authored-by: Iman Tabrizian <[email protected]> Co-authored-by: Gerard Casas Saez <[email protected]> Co-authored-by: Misha Chornyi <[email protected]> Co-authored-by: R0CKSTAR <[email protected]> Co-authored-by: Elias Bermudez <[email protected]> Co-authored-by: ax-vivien <[email protected]> Co-authored-by: Neelay Shah <[email protected]> Co-authored-by: nv-kmcgill53 <[email protected]> Co-authored-by: Matthew Kotila <[email protected]> Co-authored-by: Nikhil Kulkarni <[email protected]> Co-authored-by: Misha Chornyi <[email protected]> Co-authored-by: Iman Tabrizian <[email protected]> Co-authored-by: David Yastremsky <[email protected]> Co-authored-by: Timothy Gerdes <[email protected]> Co-authored-by: Mate Mijolović <[email protected]> Co-authored-by: David Zier <[email protected]> Co-authored-by: Hyunjae Woo <[email protected]> Co-authored-by: Tanay Varshney <[email protected]> Co-authored-by: Francesco Petrini <[email protected]> Co-authored-by: Dmitry Mironov <[email protected]> Co-authored-by: Ryan McCormick <[email protected]> Co-authored-by: 
Sai Kiran Polisetty <[email protected]> Co-authored-by: oandreeva-nv <[email protected]> Co-authored-by: kyle <[email protected]> Co-authored-by: Neal Vaidya <[email protected]> Co-authored-by: siweili11 <[email protected]>
1 parent ad9d754 · commit 7b98b8b

File tree

883 files changed, +76704 / -32486 lines changed


.clang-format

+3 -1

@@ -2,6 +2,7 @@
 BasedOnStyle: Google

 IndentWidth: 2
+ColumnLimit: 80
 ContinuationIndentWidth: 4
 UseTab: Never
 MaxEmptyLinesToKeep: 2
@@ -34,4 +35,5 @@ BinPackArguments: true
 BinPackParameters: true
 ConstructorInitializerAllOnOneLineOrOnePerLine: false

-IndentCaseLabels: true
+IndentCaseLabels: true
+
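
For reference, a minimal sketch of applying the updated .clang-format (including the new ColumnLimit of 80) locally. It assumes clang-format-15, the version referenced by the repository's pre-commit setup; the file path is purely illustrative:

# Run from the repository root so .clang-format is picked up automatically;
# -i rewrites the file in place using the repo's style settings.
clang-format-15 -i src/example.cc   # hypothetical path, format any C++ source you touched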

.github/workflows/codeql.yml

+84

@@ -0,0 +1,84 @@
+# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#  * Redistributions of source code must retain the above copyright
+#    notice, this list of conditions and the following disclaimer.
+#  * Redistributions in binary form must reproduce the above copyright
+#    notice, this list of conditions and the following disclaimer in the
+#    documentation and/or other materials provided with the distribution.
+#  * Neither the name of NVIDIA CORPORATION nor the names of its
+#    contributors may be used to endorse or promote products derived
+#    from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
+# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+name: "CodeQL"
+
+on:
+  pull_request:
+
+jobs:
+  analyze:
+    name: Analyze
+    runs-on: ubuntu-latest
+    permissions:
+      actions: read
+      contents: read
+      security-events: write
+
+    strategy:
+      fail-fast: false
+      matrix:
+        language: [ 'python' ]
+        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
+        # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support
+
+    steps:
+    - name: Checkout repository
+      uses: actions/checkout@v3
+
+    # Initializes the CodeQL tools for scanning.
+    - name: Initialize CodeQL
+      uses: github/codeql-action/init@v2
+      with:
+        languages: ${{ matrix.language }}
+        # If you wish to specify custom queries, you can do so here or in a config file.
+        # By default, queries listed here will override any specified in a config file.
+        # Prefix the list here with "+" to use these queries and those in the config file.
+
+        # Details on CodeQL's query packs refer to:
+        # https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
+        queries: +security-and-quality
+
+
+    # Autobuild attempts to build any compiled languages (C/C++, C#, Go, or Java).
+    # If this step fails, then you should remove it and run the build manually (see below)
+    - name: Autobuild
+      uses: github/codeql-action/autobuild@v2
+
+    # Command-line programs to run using the OS shell.
+    # See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
+
+    # If the Autobuild fails above, remove it and uncomment the following three lines.
+    # modify them (or add more) to build your code if your project, please refer to the EXAMPLE below for guidance.
+
+    # - run: |
+    #   echo "Run, Build Application using script"
+    #   ./location_of_script_within_repo/buildscript.sh
+
+    - name: Perform CodeQL Analysis
+      uses: github/codeql-action/analyze@v2
+      with:
+        category: "/language:${{matrix.language}}"

.github/workflows/pre-commit.yaml (+39)

@@ -0,0 +1,39 @@
# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

name: pre-commit

on:
  pull_request:

jobs:
  pre-commit:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v3
      - uses: pre-commit/[email protected]
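The job above runs the hooks under whatever Python the ubuntu-22.04 runner image provides. Purely as an illustration (not part of this commit), actions/setup-python accepts a python-version input, so the interpreter could be pinned for more reproducible hook runs; the version shown is an assumed placeholder:

# Hypothetical variation, not part of this commit: pin the hook interpreter.
steps:
  - uses: actions/checkout@v3
  - uses: actions/setup-python@v3
    with:
      python-version: "3.10"  # assumed value; align with the repo's tooling
  - uses: pre-commit/[email protected]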

.gitignore (+5)

@@ -1,3 +1,8 @@
+/build
 /builddir
 /.vscode
 *.so
+__pycache__
+tmp
+*.log
+test_results.txt

.pre-commit-config.yaml (+74)

@@ -0,0 +1,74 @@
# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

repos:
- repo: https://github.com/timothycrosley/isort
  rev: 5.12.0
  hooks:
  - id: isort
    additional_dependencies: [toml]
- repo: https://github.com/psf/black
  rev: 23.1.0
  hooks:
  - id: black
    types_or: [python, cython]
- repo: https://github.com/PyCQA/flake8
  rev: 5.0.4
  hooks:
  - id: flake8
    args: [--max-line-length=88, --select=C,E,F,W,B,B950, --extend-ignore = E203,E501]
    types_or: [python, cython]
- repo: https://github.com/pre-commit/mirrors-clang-format
  rev: v16.0.5
  hooks:
  - id: clang-format
    types_or: [c, c++, cuda, proto, textproto, java]
    args: ["-fallback-style=none", "-style=file", "-i"]
- repo: https://github.com/codespell-project/codespell
  rev: v2.2.4
  hooks:
  - id: codespell
    additional_dependencies: [tomli]
    args: ["--toml", "pyproject.toml"]
    exclude: (?x)^(.*stemmer.*|.*stop_words.*|^CHANGELOG.md$)
# More details about these pre-commit hooks here:
# https://pre-commit.com/hooks.html
- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v4.4.0
  hooks:
  - id: check-case-conflict
  - id: check-executables-have-shebangs
  - id: check-merge-conflict
  - id: check-json
  - id: check-toml
  - id: check-yaml
    exclude: ^deploy(\/[^\/]+)*\/templates\/.*$
  - id: check-shebang-scripts-are-executable
  - id: end-of-file-fixer
    types_or: [c, c++, cuda, proto, textproto, java, python]
  - id: mixed-line-ending
  - id: requirements-txt-fixer
  - id: trailing-whitespace
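Each hook repository above is pinned to a fixed rev, so local runs and the pre-commit workflow stay in sync. As a hedged illustration only (not in this commit), pre-commit also supports a top-level default_language_version key that would force Python-based hooks such as black and flake8 onto a specific interpreter:

# Hypothetical addition, not part of this commit; the interpreter name is an assumption.
default_language_version:
  python: python3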

CITATION.cff (+7)

@@ -0,0 +1,7 @@
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "Triton Inference Server: An Optimized Cloud and Edge Inferencing Solution."
url: https://github.com/triton-inference-server
repository-code: https://github.com/triton-inference-server/server
authors:
  - name: "NVIDIA Corporation"
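This CITATION.cff follows the Citation File Format 1.2.0 schema, which GitHub surfaces as a "Cite this repository" prompt. As an assumption-labelled sketch (not part of this commit), the format also allows optional fields such as version and date-released:

# Hypothetical optional fields, not part of this commit; both values are placeholders.
version: "2.35.0"
date-released: "2023-06-01"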

CMakeLists.txt (+49 -18)

@@ -1,4 +1,4 @@
-# Copyright 2020-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -38,6 +38,7 @@ option(TRITON_ENABLE_TRACING "Include tracing support in server" OFF)
 option(TRITON_ENABLE_NVTX "Include NVTX support in server" OFF)
 option(TRITON_ENABLE_GPU "Enable GPU support in server" ON)
 option(TRITON_ENABLE_MALI_GPU "Enable Arm Mali GPU support in server" OFF)
+option(TRITON_IGPU_BUILD "Enable options for iGPU compilation in sever" OFF)
 set(TRITON_MIN_COMPUTE_CAPABILITY "6.0" CACHE STRING
     "The minimum CUDA compute capability supported by Triton" )
 set(TRITON_EXTRA_LIB_PATHS "" CACHE PATH "Extra library paths for Triton Server build")
@@ -54,6 +55,7 @@ option(TRITON_ENABLE_VERTEX_AI "Include Vertex AI API in server" OFF)
 # Metrics
 option(TRITON_ENABLE_METRICS "Include metrics support in server" ON)
 option(TRITON_ENABLE_METRICS_GPU "Include GPU metrics support in server" ON)
+option(TRITON_ENABLE_METRICS_CPU "Include CPU metrics support in server" ON)
 
 # Cloud storage
 option(TRITON_ENABLE_GCS "Include GCS Filesystem support in server" OFF)
@@ -85,6 +87,10 @@ if(TRITON_ENABLE_TRACING AND NOT TRITON_ENABLE_STATS)
   message(FATAL_ERROR "TRITON_ENABLE_TRACING=ON requires TRITON_ENABLE_STATS=ON")
 endif()
 
+if (TRITON_ENABLE_METRICS_CPU AND NOT TRITON_ENABLE_METRICS)
+  message(FATAL_ERROR "TRITON_ENABLE_METRICS_CPU=ON requires TRITON_ENABLE_METRICS=ON")
+endif()
+
 if (TRITON_ENABLE_METRICS_GPU AND NOT TRITON_ENABLE_METRICS)
   message(FATAL_ERROR "TRITON_ENABLE_METRICS_GPU=ON requires TRITON_ENABLE_METRICS=ON")
 endif()
@@ -113,6 +119,19 @@ FetchContent_Declare(
   GIT_TAG ${TRITON_THIRD_PARTY_REPO_TAG}
 )
 
+# Some libs are installed to ${TRITON_THIRD_PARTY_INSTALL_PREFIX}/{LIB}/lib64 instead
+# of ${TRITON_THIRD_PARTY_INSTALL_PREFIX}/{LIB}/lib on Centos
+set (LIB_DIR "lib")
+# /etc/os-release does not exist on Windows
+if(EXISTS "/etc/os-release")
+  file(STRINGS /etc/os-release DISTRO REGEX "^NAME=")
+  string(REGEX REPLACE "NAME=\"(.*)\"" "\\1" DISTRO "${DISTRO}")
+  message(STATUS "Distro Name: ${DISTRO}")
+  if(DISTRO MATCHES "CentOS.*")
+    set (LIB_DIR "lib64")
+  endif()
+endif()
+
 set(TRITON_CORE_HEADERS_ONLY OFF)
 
 FetchContent_MakeAvailable(repo-third-party repo-core)
@@ -152,7 +171,16 @@ endif()
 if (WIN32)
   set(_FINDPACKAGE_PROTOBUF_CONFIG_DIR "${TRITON_THIRD_PARTY_INSTALL_PREFIX}/protobuf/cmake")
 else()
-  set(_FINDPACKAGE_PROTOBUF_CONFIG_DIR "${TRITON_THIRD_PARTY_INSTALL_PREFIX}/protobuf/lib/cmake/protobuf")
+  set(_FINDPACKAGE_PROTOBUF_CONFIG_DIR "${TRITON_THIRD_PARTY_INSTALL_PREFIX}/protobuf/${LIB_DIR}/cmake/protobuf")
+endif()
+
+# Triton with Opentelemetry is not supported on Windows
+# FIXME: add location for Windows, when support is added
+# JIRA DLIS-4786
+if (WIN32)
+  set(_FINDPACKAGE_OPENTELEMETRY_CONFIG_DIR "")
+else()
+  set(_FINDPACKAGE_OPENTELEMETRY_CONFIG_DIR "${TRITON_THIRD_PARTY_INSTALL_PREFIX}/opentelemetry-cpp/${LIB_DIR}/cmake/opentelemetry-cpp")
 endif()
 
 if (CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT)
@@ -168,15 +196,15 @@ endif() # TRITON_ENABLE_GCS
 if(${TRITON_ENABLE_S3})
   set(TRITON_DEPENDS ${TRITON_DEPENDS} aws-sdk-cpp)
 endif() # TRITON_ENABLE_S3
-if(${TRITON_ENABLE_AZURE_STORAGE})
-  set(TRITON_DEPENDS ${TRITON_DEPENDS} azure-storage-cpplite)
-endif() # TRITON_ENABLE_AZURE_STORAGE
 if(${TRITON_ENABLE_HTTP} OR ${TRITON_ENABLE_METRICS} OR ${TRITON_ENABLE_SAGEMAKER} OR ${TRITON_ENABLE_VERTEX_AI})
   set(TRITON_DEPENDS ${TRITON_DEPENDS} libevent libevhtp)
 endif() # TRITON_ENABLE_HTTP || TRITON_ENABLE_METRICS || TRITON_ENABLE_SAGEMAKER || TRITON_ENABLE_VERTEX_AI
 if(${TRITON_ENABLE_GRPC})
   set(TRITON_DEPENDS ${TRITON_DEPENDS} grpc)
 endif() # TRITON_ENABLE_GRPC
+if(NOT WIN32 AND ${TRITON_ENABLE_TRACING})
+  set(TRITON_DEPENDS ${TRITON_DEPENDS} opentelemetry-cpp)
+endif() # TRITON_ENABLE_TRACING
 
 ExternalProject_Add(triton-server
   PREFIX triton-server
@@ -189,21 +217,23 @@ ExternalProject_Add(triton-server
     ${_CMAKE_ARGS_VCPKG_TARGET_TRIPLET}
     -DGTEST_ROOT:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/googletest
     -DgRPC_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/grpc/lib/cmake/grpc
-    -Dc-ares_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/c-ares/lib/cmake/c-ares
-    -Dabsl_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/absl/lib/cmake/absl
-    -Dnlohmann_json_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/nlohmann_json/lib/cmake/nlohmann_json
+    -Dc-ares_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/c-ares/${LIB_DIR}/cmake/c-ares
+    -Dabsl_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/absl/${LIB_DIR}/cmake/absl
+    -DCURL_DIR:STRING=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/curl/${LIB_DIR}/cmake/CURL
+    -Dnlohmann_json_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/nlohmann_json/${LIB_DIR}/cmake/nlohmann_json
     -DLibevent_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/libevent/lib/cmake/libevent
     -Dlibevhtp_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/libevhtp/lib/cmake/libevhtp
-    -Dstorage_client_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/google-cloud-cpp/lib/cmake/storage_client
-    -Dazure-storage-cpplite_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/azure-storage-cpplite
-    -Dgoogle_cloud_cpp_common_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/google-cloud-cpp/lib/cmake/google_cloud_cpp_common
-    -DCrc32c_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/crc32c/lib/cmake/Crc32c
-    -DAWSSDK_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/lib/cmake/AWSSDK
-    -Daws-cpp-sdk-core_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/lib/cmake/aws-cpp-sdk-core
-    -Daws-cpp-sdk-s3_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/lib/cmake/aws-cpp-sdk-s3
-    -Daws-c-event-stream_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/lib/aws-c-event-stream/cmake
-    -Daws-c-common_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/lib/aws-c-common/cmake
-    -Daws-checksums_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/lib/aws-checksums/cmake
+    -Dstorage_client_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/google-cloud-cpp/${LIB_DIR}/cmake/storage_client
+    -Dgoogle_cloud_cpp_common_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/google-cloud-cpp/${LIB_DIR}/cmake/google_cloud_cpp_common
+    -DCrc32c_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/crc32c/${LIB_DIR}/cmake/Crc32c
+    -DAWSSDK_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/${LIB_DIR}/cmake/AWSSDK
+    -Daws-cpp-sdk-core_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/${LIB_DIR}/cmake/aws-cpp-sdk-core
+    -Daws-cpp-sdk-s3_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/${LIB_DIR}/cmake/aws-cpp-sdk-s3
+    -Daws-c-event-stream_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/${LIB_DIR}/aws-c-event-stream/cmake
+    -Daws-c-common_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/${LIB_DIR}/aws-c-common/cmake
+    -Daws-checksums_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/${LIB_DIR}/aws-checksums/cmake
+    -Dopentelemetry-cpp_DIR:PATH=${_FINDPACKAGE_OPENTELEMETRY_CONFIG_DIR}
+    -DTRITON_IGPU_BUILD:BOOL=${TRITON_IGPU_BUILD}
     -DTRITON_THIRD_PARTY_REPO_TAG:STRING=${TRITON_THIRD_PARTY_REPO_TAG}
     -DTRITON_COMMON_REPO_TAG:STRING=${TRITON_COMMON_REPO_TAG}
     -DTRITON_CORE_REPO_TAG:STRING=${TRITON_CORE_REPO_TAG}
@@ -223,6 +253,7 @@ ExternalProject_Add(triton-server
     -DTRITON_MIN_COMPUTE_CAPABILITY:STRING=${TRITON_MIN_COMPUTE_CAPABILITY}
     -DTRITON_ENABLE_METRICS:BOOL=${TRITON_ENABLE_METRICS}
     -DTRITON_ENABLE_METRICS_GPU:BOOL=${TRITON_ENABLE_METRICS_GPU}
+    -DTRITON_ENABLE_METRICS_CPU:BOOL=${TRITON_ENABLE_METRICS_CPU}
     -DTRITON_ENABLE_GCS:BOOL=${TRITON_ENABLE_GCS}
     -DTRITON_ENABLE_AZURE_STORAGE:BOOL=${TRITON_ENABLE_AZURE_STORAGE}
     -DTRITON_ENABLE_S3:BOOL=${TRITON_ENABLE_S3}
