Releases · NVIDIA/NeMo

25 May 22:04

ko3n1g

v2.3.1

abddc85

NVIDIA Neural Modules 2.3.1 Latest

Highlights

Collections
- LLM
  - Llama 4: Fixed an accuracy issue caused by MoE probability normalization. Improved pre-train and fine-tune performance.
Export & Deploy
- Updated vLLMExporter to use vLLM V1 to address a security vulnerability.
AutoModel
- Improved chat-template handling.
Fault Tolerance
- Local checkpointing: Fixed handling of auto-inserted metric names when resuming from local checkpoints.

Detailed Changelogs:

Export

Changelog

Cherry-pick Update vLLMExporter to use vLLM V1 (#13498) into r2.3.0 by @chtruong814 :: PR: #13631

Uncategorized:

Changelog

Bump to 2.3.1 by @chtruong814 :: PR: #13507
Cherry pick Use explicitly cached canary-1b-flash in CI tests (13237) into r2.3.0 by @ko3n1g :: PR: #13508
Cherry pick [automodel] bump liger-kernel to 0.5.8 + fallback (13260) into r2.3.0 by @ko3n1g :: PR: #13308
Cherry-pick Add recipe and ci scripts for qwen2vl to r2.3.0 by @romanbrickie :: PR: #13336
Cherry pick Fix skipme handling (13244) into r2.3.0 by @ko3n1g :: PR: #13376
Cherry pick Allow fp8 param gather when using FSDP (13267) into r2.3.0 by @ko3n1g :: PR: #13383
Cherry pick Handle boolean args for performance scripts and log received config (13291) into r2.3.0 by @ko3n1g :: PR: #13416
Cherry pick new perf configs (13110) into r2.3.0 by @ko3n1g :: PR: #13431
Cherry pick Adding additional unit tests for the deploy module (13411) into r2.3.0 by @ko3n1g :: PR: #13449
Cherry pick Adding more export tests (13410) into r2.3.0 by @ko3n1g :: PR: #13450
Cherry pick [automodel] add FirstRankPerNode (13373) into r2.3.0 by @ko3n1g :: PR: #13559
Cherry pick [automodel] deprecate global_batch_size dataset argument (13137) into r2.3.0 by @ko3n1g :: PR: #13560
Cherry-pick [automodel] fallback FP8 + LCE -> FP8 + CE (#13349) into r2.3.0 by @chtruong814 :: PR: #13561
Cherry pick [automodel] add find_unused_parameters=True for DDP (13366) into r2.3.0 by @ko3n1g :: PR: #13601
Cherry pick Add CI test for local checkpointing (#13012) into r2.3.0 by @ananthsub :: PR: #13472
Cherry pick [automodel] fix --mbs/gbs dtype and chat-template (13598) into r2.3.0 by @akoumpa :: PR: #13613
Cherry-pick Update t5.py (#13082) to r2.3.0 and bump mcore to f98b1a0 by @chtruong814 :: PR: #13642
[Automodel] Fix CP device_mesh issue, use PTL distsampler (#13473) by @akoumpa :: PR: #13636
[Llama4] Fix the recipe bug - cherrypick #13649 by @gdengk :: PR: #13650
build: Pin transformers (#13675) by @ko3n1g :: PR: #13692
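
The changelog entries on this page appear to share one shape: `<title> by @<author> :: PR: #<number>`. The snippet below is a minimal sketch for splitting such a line into its parts, assuming that shape holds. The names `ChangelogEntry` and `parse_entry` are hypothetical helpers introduced for illustration; they are not part of NeMo or of any GitHub tooling.

```python
import re
from typing import NamedTuple, Optional


class ChangelogEntry(NamedTuple):
    """One changelog line split into its parts (hypothetical helper type)."""
    title: str
    author: str
    pr: int


# Assumed entry shape, inferred from the lines on this page:
#   "<title> by @<author> :: PR: #<number>"
# The greedy title group means the LAST " by @" in the line separates
# the title from the author.
_ENTRY_RE = re.compile(r"^(?P<title>.+) by @(?P<author>\S+) :: PR: #(?P<pr>\d+)$")


def parse_entry(line: str) -> Optional[ChangelogEntry]:
    """Return the parsed entry, or None if the line does not match the shape."""
    match = _ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return ChangelogEntry(match["title"], match["author"], int(match["pr"]))


if __name__ == "__main__":
    print(parse_entry("build: Pin transformers (#13675) by @ko3n1g :: PR: #13692"))
```

Lines that were truncated on the page (for example, ones ending in `...`) do not match the pattern, so `parse_entry` returns `None` for them rather than guessing the missing parts.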

Contributors

ananthsub, romanbrickie, and 4 other contributors

08 May 23:42

ko3n1g

v2.3.0

2b03b74

NVIDIA Neural Modules 2.3.0

Highlights

Export & Deploy
- NeMo 2.0 export path for NIM
- ONNX and TensorRT Export for NIM Embedding Container
- In-framework deployment for HF Models
- TRT-LLM deployment for HF Models in NeMo Framework
Evaluation
- Integrate nvidia-lm-eval into NeMo Framework for evaluations against OpenAI API-compatible in-framework deployments
AutoModel
- VLM AutoModelForImageTextToText
- FP8 for AutoModel
- Support CP with FSDP2
- Support TP with FSDP2
- Performance Optimization
  - Add support for cut cross-entropy and the Liger kernel
  - Gradient Checkpointing
Fault Tolerance
- Integrate NVRx v0.3 Local checkpointing
Collections
- LLM
  - Llama4
  - Llama Nemotron Ultra
  - Llama Nemotron Super
  - Llama Nemotron Nano
  - Nemotron-h/5
  - DeepSeek V3 Pretraining
  - Evo2
  - Qwen 2.5
  - LoRA for Qwen3-32B and Qwen3-30B-A3B
- MultiModal
  - FLUX
  - Gemma 3
  - Qwen2-VL
- ASR
  - NeMo Run support for ASR training
  - N-Gram LM on GPU for AED
  - N-Gram LM on GPU + Transducer greedy decoding (RNN-T, TDT)
  - Timestamp support for AED models that support timestamps
  - Migrate SpeechLM to NeMo 2.0
  - Canary-1.1
  - Replace ClassificationModels class with LabelModels
Performance
- Functional MXFP8 support for (G)B200
- Current scaling recipe with TP communication overlap and FP8 param gathers
- Custom FSDP support that fully utilizes GB200 NVL72

Detailed Changelogs:

ASR

Changelog

Added model config params for Canary-1B-Flash, Canary-180M-Flash models by @KunalDhawan :: PR: #12588
Canary tutorial by @ankitapasad :: PR: #12613
Canary tutorial fix timestamp by @ankitapasad :: PR: #12677
revert config by @nithinraok :: PR: #12689
canary longform inference script with timestamps option by @krishnacpuvvada :: PR: #12653
Fix default timestamps value for Hybrid ASR models by @artbataev :: PR: #12681
Fix k2 installation with PyTorch 2.6.0 by @artbataev :: PR: #12686
Improve time and RTFx report for ASR by @artbataev :: PR: #12680
Modify train args by @ankitapasad :: PR: #12700
Fix asr doc warnings by @nithinraok :: PR: #12720
Rename FastNGramLM -> NGramGPULanguageModel by @artbataev :: PR: #12755
transcribe fix for new hypotheses by @nune-tadevosyan :: PR: #12801
Fix timestamps when cuda graphs enabled by @monica-sekoyan :: PR: #12808
update streaming conformer by @stevehuang52 :: PR: #12846
AED Decoding with N-Gram LM by @artbataev :: PR: #12730
update notebook by @nithinraok :: PR: #13088
bugfix ASR_Context_Biasing.ipynb by @lilithgrigoryan :: PR: #13109
Change branch for installation from main to r2.3.0 by @ankitapasad :: PR: #13266

TTS

Changelog

Add Magpie-TTS and Updates NeMo Audio Codecs by @blisc :: PR: #12606
fix bug from prior commit (#13264) by @blisc :: PR: #13328

NLP / NMT

Changelog

Remove old peft docs by @cuichenx :: PR: #12675
Add code coverage for llm gpt models conversion tests by @suiyoubi :: PR: #12665
Make BERT TransformerBlockWithPostLNSupport accept more inputs from Mcore by @suiyoubi :: PR: #12685
remove gifs from documentation by @dimapihtar :: PR: #12732
Rename FastNGramLM -> NGramGPULanguageModel by @artbataev :: PR: #12755
fix NeMo documentation by @dimapihtar :: PR: #12754
GPT Model/Data/Recipe Unit Test by @suiyoubi :: PR: #12757
ci: Exclude nlp, mm, vision collections by @ko3n1g :: PR: #12816
Add vocab size as attr to GPT and T5 Configs, use file name based logger in llm.gpt.data by @hemildesai :: PR: #12862
Fix transformer layer api with megatron cbc89b3 by @yaoyu-33 :: PR: #12885

Text Normalization / Inverse Text Normalization

Changelog

Rename FastNGramLM -> NGramGPULanguageModel by @artbataev :: PR: #12755

Export

Changelog

GHA Conversion Test and Importer/Exporter Refactor by @suiyoubi :: PR: #12597
Fix Llama Embedding Model Exporting keys by @suiyoubi :: PR: #12691
build: Add trtllm by @ko3n1g :: PR: #12672
Fix trt-llm install by @chtruong814 :: PR: #12827
Update LLaVA's next HF exporter to load ViT checkpoint from YAML by @eagle705 :: PR: #12841
Support huggingface export to tensorrtllm by @pthombre :: PR: #12889
Adds a built stage for the trt-llm wheel to reduce the overall test image size by @chtruong814 :: PR: #12883

Uncategorized:

Changelog

Update changelog-build.yml by @ko3n1g :: PR: #12584
Update changelog for r2.2.0 by @github-actions[bot] :: PR: #12585
Add comments for requirements by @thomasdhc :: PR: #12603
[automodel] FSDP2Strategy: move to device if using a single-device by @akoumpa :: PR: #12593
build: Remove numba pin by @ko3n1g :: PR: #12604
docs: Update installation guides by @ko3n1g :: PR: #12596
Change Llama Scaling Factor type to Float by @suiyoubi :: PR: #12616
ci: Test multiple python versions by @ko3n1g :: PR: #12619
ci: Disable reformat by @ko3n1g :: PR: #12620
Updating ModelOpt to 0.25.0 by @janekl :: PR: #12633
[automodel] add additional hf_dataset tests by @akoumpa :: PR: #12646
[automodel] add jit_transform tests by @akoumpa :: PR: #12645
[automodel] init eos_token_id inside data module by @yuanzhedong :: PR: #12610
[automodel] grad ckpt by @akoumpa :: PR: #12644
bugfix(llm/LLaMa) - dropout_position can never be equal to extended string by @soluwalana :: PR: #12649
Fix inference pipeline quality issue by @Victor49152 :: PR: #12639
[automodel] switch to direct=True to propagate return codes in nemorun by @akoumpa :: PR: #12651
add Auto Conf support for bert, t5, qwen, starcoder models by @dimapihtar :: PR: #12601
ci: Upload coverage by @ko3n1g :: PR: #12668
ci: Re-enable changed-files action by @ko3n1g :: PR: #12683
build: Pin sox by @ko3n1g :: PR: #12701
add neva quantization by @linnanwang :: PR: #12698
Clip coverage by @abhinavg4 :: PR: #12696
GHA CI test: Remove unnecessary directive by @pablo-garay :: PR: #12714
minor perf fixes by @malay-nagda :: PR: #12656
Add DeepSeek V2 Lite into llm init.py by @suiyoubi :: PR: #12664
Add Llama-Nemotron Nano and 70B models by @suiyoubi :: PR: #12712
Save batch norm running stats in PEFT checkpoints by @cuichenx :: PR: #12666
Fix document Readme under nemo to add more information by @yaoyu-33 :: PR: #12699
Fix ub_overlap_ag by @cuichenx :: PR: #12721
Toggle fast tokenizer if error occurs by @cuichenx :: PR: #12722
Update README.md for blackwell and AutoModel by @snowmanwwg :: PR: #12612
Raise error on import_ckpt with overwrite=False plus README for checkpoint_converters by @janekl :: PR: #12693
[automodel] fix validation_step by @soluwalana :: PR: #12659
[automodel] vlm tests by @akoumpa :: PR: #12716
Auto Configurator code coverage by @dimapihtar :: PR: #12694
[automodel] fix automodel benchmark script by @yuanzhedong :: PR: #12605
Remove unnecessary directives by @pablo-garay :: PR: #12743
Add recipe tests for coverage by @cuichenx :: PR: #12737
Add Qwen2.5 in NeMo2 by @suiyoubi :: PR: #12731
add fallback_module to safe_import_from by @akoumpa :: PR: #12726
Update quantization scripts & relax modelopt requirement specifier by @janekl :: PR: #12709
Import guard fasttext by @thomasdhc :: PR: #12758
[automodel] chunked cross entropy by @akoumpa :: PR: #12752
Add fsdp automodel test by @BoxiangW :: PR: #12718
[automodel] if peft move only adapters to cpu by @akoumpa :: PR: #12735
[automodel] update hf mockdataset by @akoumpa :: PR: #12643
[automodel] remove unused cell in multinode notebook by @yuanzhedong :: PR: #12624
Yash/llava next coverage by @yashaswikarnati :: PR: #12745
Tidy code: remove unneeded statements/lines by @pablo-garay :: PR: #12771
Pass tensor instead of raw number in _mock_loss_function in PTQ by @janekl :: PR: #12769
ci: Run on nightly schedule by @ko3n1g :: PR: #12775
Add logs for checkpoint saving start and finalization by @lepan-google :: PR: #12697
Alit/test coverage by @JRD971000 :: PR: #12762
Fix loss mask with packed sequence by @ashors1 :: PR: #12642
Add pruning recipe by @kevalmorabia97 :: PR: #12602
Update qwen2-v1 to use NeMo quick_gelu by @thomasdhc :: PR: #12787
[doc] Fixes for audio doc warnings by @anteju :: PR: #12736
ci: Measure multiprocessing by @ko3n1g :: PR: #12778
ci: Fix flaky LLM tests by @ko3n1g :: PR: #12807
Add BERT/Qwen2.5 Unit test and Refactor all GHA Conversion Tests by @suiyoubi :: PR: #12785
Fix TransformerBlock cuda_graphs compatibility with MCore by @buptzyb :: PR: #12779
ci: Remove --branch by @ko3n1g :: PR: #12809
ci: Move scripts fully down to files by @ko3n1g :: PR: #12802
add init.py to make this a package by @akoumpa :: PR: #12814
Update changelog for r2.2.1 by @github-actions[bot] :: PR: #12818
add finetune support for Auto Configurator by @dimapihtar :: PR: #12770
[automodel] add cpu:gloo to backend by @akoumpa :: PR: #12832
add missing call to _apply_liger_kernel_to_instance by @akoumpa :: PR: #12806
Prune docker images in GHA older than 8hrs by @chtruong814 :: PR: #12838
[audio] Adding tests for predictive models by @anteju :: PR: #12823
Update resiliency example notebook readme and add links to the brev launchable by @ShriyaRishab :: PR: #12843
[automodel] qlora peft by @yzhang123 :: PR: #12817
ci: Increase prune time by @ko3n1g :: PR: #12860
Update base container in Dockerfile.speech by @artbataev :: PR: #12859
Fix qwen2.5 1.5b configuration inheritance bug by @Aprilistic :: PR: #12852
Update modelopt upperbound to 0.27 by @thomasdhc :: PR: #12788
Non-bloc...

Contributors

jstjohn, soluwalana, and 45 other contributors

21 Apr 23:24

ko3n1g

v2.3.0rc4

b9abb0a

NVIDIA Neural Modules 2.3.0rc4 Pre-release

Prerelease: NVIDIA Neural Modules 2.3.0rc4 (2025-04-21)

15 Apr 18:22

ko3n1g

v2.3.0rc3

3d04c86

NVIDIA Neural Modules 2.3.0rc3 Pre-release

Prerelease: NVIDIA Neural Modules 2.3.0rc3 (2025-04-15)

07 Apr 21:36

ko3n1g

v2.3.0rc2

9ff7e75

NVIDIA Neural Modules 2.3.0rc2 Pre-release

Prerelease: NVIDIA Neural Modules 2.3.0rc2 (2025-04-07)

31 Mar 21:31

ko3n1g

v2.2.1

132f217

NVIDIA Neural Modules 2.2.1

Highlights

Training
- Fix training instability in MoE-based models.
- Fix bug in Llama exporter for Llama 3.2 1B and 3B.
- Fix bug in the LoRA linear_fc1 adapter when different TP sizes are used for saving and loading the adapter checkpoint.

Detailed Changelogs:

Uncategorized:

Changelog

Re-add reverted commits after 2.2.0 and set next version to be 2.2.1 by @chtruong814 :: PR: #12587
Cherry pick Fix exporter for llama models with shared embed and output layers (12545) into r2.2.0 by @ko3n1g :: PR: #12608
Cherry pick Fix TP for LoRA adapter on linear_fc1 (12519) into r2.2.0 by @ko3n1g :: PR: #12607
Bump mcore to use 0.11.1 by @chtruong814 :: PR: #12634

Contributors

ko3n1g and chtruong814

12 Mar 20:30

ko3n1g

v2.2.0

7192a2c

NVIDIA Neural Modules 2.2.0

Highlights

Training
- Blackwell and Grace Blackwell support
- Pipeline parallel support for distillation
- Improved NeMo Framework installation
Export & Deploy
- vLLM export for NeMo 2.0
Evaluations
- Integrate lm-eval-harness
Collections
- LLM
  - DAPT example and best practices in NeMo 2.0
  - [NeMo 2.0] Enable Tool Learning and add a tutorial
  - Support GPT Embedding Model (Llama 3.2 1B/3B)
  - Qwen2.5, Phi4 (via AutoModel)
  - SFT for Llama 3.3 model (via AutoModel)
  - Support BERT Embedding Model with NeMo 2.0
  - DeepSeek SFT & PEFT Support
- MultiModal
  - CLIP
  - SP for NeVA
  - CP for NeVA
  - Intern-VIT
AutoModel
- Preview release.
- PEFT and SFT support for LLMs available via Hugging Face’s AutoModelForCausalLM.
- Support for Hugging Face-native checkpoints (full model and adapter only).
- Support for distributed training via DDP and FSDP2.
ASR/TTS
- Lhotse: TPS-free 2D bucket estimation and filtering
- Update model outputs so that all ASR outputs use a consistent format
- Sortformer Release Model

Detailed Changelogs:

ASR

Changelog

removed the line which caused a problem in nfa_tutorial by @Ssofja :: PR: #11710
TPS-free 2D bucket estimation and filtering by @pzelasko :: PR: #11738
Update transcribe_utils.py by @stevehuang52 :: PR: #11984
Sortformer Diarizer 4spk v1 model PR Part 4: Sortformer Documents and Notebook Tutorials by @tango4j :: PR: #11707
fix the issue during batched inference of Sortformer diarizer by @tango4j :: PR: #12047
changed asr models outputs to be consistent by @Ssofja :: PR: #11818
chore: Update notebooks by @ko3n1g :: PR: #12161
add ctc segmentation by @ko3n1g :: PR: #12312
clean up VAD tutorial by @stevehuang52 :: PR: #12410
copy from main by @nithinraok :: PR: #12423
ci: Disable ASR tests for now (#12443) by @ko3n1g :: PR: #12466
ASR_CTC_Language_Finetuning.ipynb bugfix by @lilithgrigoryan :: PR: #12538

TTS

Changelog

Add New Transformer Backbone for TTS Models by @blisc :: PR: #11911
changed asr models outputs to be consistent by @Ssofja :: PR: #11818
chore: Update notebooks by @ko3n1g :: PR: #12161

NLP / NMT

Changelog

Use explicit imports from megatronllm_deployable.py by @janekl :: PR: #11705
Bug fix minor bug in TRT-LLM deployment by @oyilmaz-nvidia :: PR: #11714
gpt moe perf scripts by @malay-nagda :: PR: #11760
Bump mcore by @ko3n1g :: PR: #11740
Enable packed seqs for validation by @jiemingz :: PR: #11748
Revert Mcore update since it caused regression by @pablo-garay :: PR: #11791
Fix Gemma2 Attention Init Args by @suiyoubi :: PR: #11792
Add null tokenizer by @erhoo82 :: PR: #11789
Fix DistCP inference issue by @suiyoubi :: PR: #11801
Add BERT Embedding Models E5 Recipe by @suiyoubi :: PR: #11787
Add rope scaling configs for NeMo 1 by @BoxiangW :: PR: #11807
Fix calculating num_available_samples by @huvunvidia :: PR: #11830
fix sentencepiece tokenizer special tokens by @akoumpa :: PR: #11811
add chat sft dataset to support agent tool calling by @chenrui17 :: PR: #11759
Revert "Revert Mcore update since it caused regression (#11791)" by @ko3n1g :: PR: #11799
fix checkpoint load issue by @dimapihtar :: PR: #11859
Fix nemo 1 packed sequence TE version error by @cuichenx :: PR: #11874
enable loading older TE checkpoints by @dimapihtar :: PR: #11930
ci: Use single runner machines for unit tests by @ko3n1g :: PR: #11937
llm performance scripts by @malay-nagda :: PR: #11736
[MoE] add expert tensor parallelism support for NeMo2.0 MoE by @gdengk :: PR: #11880
add exception when loading ckpt saved by TE < 1.13 by @dimapihtar :: PR: #11988
remove renormalize_blend_weights flag by @dimapihtar :: PR: #11975
Llama3.2 1B Embedding Model Support by @suiyoubi :: PR: #11909
Weekly bump by @ko3n1g :: PR: #11896
Debug Apex distributed optimizer to handle Transformer Engine 2.0 by @timmoon10 :: PR: #12004
throw MegatronOptimizerModule warning only with mcore models by @akoumpa :: PR: #12085
fix nmt dataclass issue by @dimapihtar :: PR: #12081
Propagate dp last changes from mcore by @ryantwolf :: PR: #12012
Add error message when downloading failed. by @yuanzhedong :: PR: #12139
interface for asymmetric pipeline schedule by @erhoo82 :: PR: #12039
chore: Update notebooks by @ko3n1g :: PR: #12161
Cherrypick #12382, #12415 and #12424 by @cuichenx :: PR: #12425
ASR_CTC_Language_Finetuning.ipynb bugfix by @lilithgrigoryan :: PR: #12538

Text Normalization / Inverse Text Normalization

Changelog

surface attn_implementation option by @akoumpa :: PR: #11873
attn_implementation eager fallback by @akoumpa :: PR: #12060

NeMo Tools

Changelog

build: Add sox to SDE by @ko3n1g :: PR: #11882
add ctc segmentation by @ko3n1g :: PR: #12312

Export

Changelog

Bug fix minor bug in TRT-LLM deployment by @oyilmaz-nvidia :: PR: #11714
In-framework deployment NeMo 2.0 nemo_export.py test by @janekl :: PR: #11749
Fix starcoder2 missing bias in nemo2 config for TRTLLM by @meatybobby :: PR: #11809
Autodetect dtype on exporting to TensorRT-LLM by @janekl :: PR: #11907
PTQ & TRT-LLM updates related to upcoming PyTorch 25.01 bump by @janekl :: PR: #11941
Run Flake8 for nemo.export module by @janekl :: PR: #11728
Skip initialization in hf export by @cuichenx :: PR: #12136
update export io call by @akoumpa :: PR: #12144
add default kwargs for trtllm model runner by @pablo-garay :: PR: #12248
cherry-pick: fix[export]: reshard model correctly handles extra_state when it's a tensor (#12132) by @terrykong :: PR: #12335

Bugfixes

Changelog

added required installation for sox to process mp3 file by @Ssofja :: PR: #11709
removed the line which caused a problem in nfa_tutorial by @Ssofja :: PR: #11710
Bug fix minor bug in TRT-LLM deployment by @oyilmaz-nvidia :: PR: #11714

Uncategorized:

Changelog

Allow using vocab size from config by @shanmugamr1992 :: PR: #11718
Fix baseline recipes by @erhoo82 :: PR: #11725
Update changelog for r2.1.0 by @github-actions[bot] :: PR: #11745
ci: Fix changelog generator by @ko3n1g :: PR: #11744
Fix 'http_port' parameter name in DeployPyTriton usages and update .qnemo compress=True path by @janekl :: PR: #11747
Conversion NeMo and HF checkpoint script for T5 by @huvunvidia :: PR: #11739
Add BERT Embedding Models by @suiyoubi :: PR: #11737
Add server ready check before starting evaluation by @athitten :: PR: #11731
only install bitsandbytes on x86 by @akoumpa :: PR: #11781
[Bugfix] Skip processing if extra_state loads as None by @janekl :: PR: #11778
chore(beep boop 🤖): Bump MCORE_TAG=4dc8977... (2025-01-07) by @ko3n1g :: PR: #11768
make progress printer compatible with PTL v2.5.0 by @ashors1 :: PR: #11779
Fix Mistral Conversion Issue by @suiyoubi :: PR: #11786
build: Fix build-arg by @ko3n1g :: PR: #11815
Lora ckpt in HF format for NeMo AutoModel by @oyilmaz-nvidia :: PR: #11712
8x22b seq len by @malay-nagda :: PR: #11788
Bugfix for output_generation_logits in tensorrtllm by @athitten :: PR: #11820
handle mistralai/Mistral-7B-Instruct-v0.3 tokenizer correctly by @akoumpa :: PR: #11839
remove tensorstore pin in requirements*.txt by @pstjohn :: PR: #11777
Do not load context for model transform in llm inference by @hemildesai :: PR: #11751
update nemo2sftpeft tutorial container version by @HuiyingLi :: PR: #11832
Latest News updated for Cosmos by @lbliii :: PR: #11806
Removes tensorstore 0.1.45 pin from requirements_deploy.txt by @pstjohn :: PR: #11858
ci: Prune dangling images by @ko3n1g :: PR: #11885
Disable tests that download datasets from web by @akoumpa :: PR: #11878
Add context_logits for eval accuracy calculation in case of multi token prediction tasks by @athitten :: PR: #11753
add dataset_root to SpecterDataModule by @suiyoubi :: PR: #11837
Support both Path and str for APIs by @maanug-nv :: PR: #11865
Run nsys callback on GBS not on MBS by @akoumpa :: PR: #11861
ci: Set bump-branch to weekly by @ko3n1g :: PR: #11889
chore: Update mcore-tag-bump-bot.yml by @ko3n1g :: PR: #11891
ci: Bump Mcore in weekly PR by @ko3n1g :: PR: #11897
check restore_config first by @akoumpa :: PR: #11890
LinearAdapter: propagate args to _init_adapter by @akoumpa :: PR: #11902
NeMo 2.0 fp8 conversion by @Laplasjan107 :: PR: #11845
nemo ux expert tensor parallel by @akoumpa :: PR: #11903
Add CP support to Neva in NeMo2 by @yaoyu-33 :: PR: #11850
build: Move dependencies by @ko3n1g :: PR: #11790
Add Flux and Flux Controlnet Support to Diffusion folder by @Victor49152 :: PR: #11794
ci: Adjust bump mcore workflow by @ko3n1g :: PR: #11918
ci: Small fix to bump workflow by @ko3n1g :: PR: #11919
Revert #11890 and add a test that would have caught the error by @cuichenx :: PR: #11914
ci: Adjust input argument by @ko3n1g :: PR: #11921
Create test_phi3.py by @mayani-nv :: PR: #11843
Enable NeMo importer and loading dist CKPT for training by @Victor49152 :: PR: #11927
build: Pin triton by @ko3n1g :: PR: #11938
Add sharding for speechlm and vlm by @BoxiangW :: PR: #11876
Update torch load for load from disk by @thomasdhc :: PR: #11963
Add options to add mp_policy and parallel_fn for NeMo automodel fsdp2 by @BoxiangW :: PR: #11956
ci: Add coverage reports by @ko3n1g :: PR: #11912
Add batching support for evaluation by @athitten :: PR: #11934
add use_fast option by @akoumpa :: PR: #11976
improve error and debug messages in model connector by @cuichenx :: PR: #11979
[checkpoint][docs] Fix typos in dist checkpointing docs by @ananthsub :: PR: #1...