Weekly release: 0.19.0.dev2025040100 #3204
kaiyux
announced in
Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we have published a new weekly release, 0.19.0.dev2025040100, and pushed an update to the Triton backend on April 1, 2025.

The 0.19.0.dev2025040100 dev release includes:

- Added EXAONE-Deep support; see examples/exaone/README.md. (feat: Add EXAONE-Deep #3054)
- Added bandwidth measurement to disaggServerBenchmark. (feat: Add BW measurement #3070)
- Enabled AutoDeploy as a backend in the trtllm-bench command. (perf: [AutoDeploy] Enable AutoDeploy as a backend in trtllm-bench #3041)
- Re-added iteration logging for trtllm-bench. (perf: Readd iteration logging for trtllm-bench. #3039)
- Moved BuildConfig arguments to LlmArgs. (chore: [TRTLLM-3694] Move functional args to llmargs #3036)
- Made CMake exit early when find_library() did not find any library. (fix: Early exit cmake if find_library() does not find any lib #3113)
- Fixed a hang in MGMN runs with the trtllm-llmapi-launch command. (fix: fix hang in mgmn with trtllm-llmapi-launch command #3119)
- Fixed a gpus_per_node issue in trtllm-bench when world_size is less than device_count. (fix: gpus_per_node in trtllm-bench when world_size < device_count #3007)
- Fixed an issue when cp_size is greater than kvHeadNum. (fix: fix for cp > kvHeadNum #3002)
- Set the correct draft_token_nums for dummy requests during torch compilation with MTP. (fix: Set correct draft_token_nums to dummy requests for torch compilation with MTP #3053)

The cut-off commit for this release is 7549573. The code changes can be seen here: c2ffce7...7549573.
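For reference, a dev build like this is normally installed via pip. A minimal sketch, assuming the wheel is published under the usual tensorrt_llm package name on NVIDIA's PyPI index (taken from NVIDIA's general install instructions, not stated in this announcement):

```shell
# Install the pinned weekly dev wheel.
# NOTE: the package name and the --extra-index-url below are assumptions
# based on NVIDIA's standard TensorRT-LLM install instructions.
pip3 install tensorrt_llm==0.19.0.dev2025040100 --extra-index-url https://pypi.nvidia.com
```

Pinning the exact dev version avoids silently picking up a later weekly build.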
Thanks,
The TensorRT-LLM Engineering Team