Releases: NVIDIA/NeMo
NVIDIA Neural Modules 2.3.1
Highlights
- Collections
- LLM
- Llama 4: Fixed an accuracy issue caused by MoE probability normalization. Improved pre-train and fine-tune performance.
- Export & Deploy
- Updated vLLMExporter to use vLLM V1 to address a security vulnerability.
- AutoModel
- Improved chat-template handling.
- Fault Tolerance
- Local checkpointing: Fixed support for auto-inserted metric names for resuming from local checkpoints.
Detailed Changelogs:
Export
Changelog
- Cherry-pick Update vLLMExporter to use vLLM V1 (#13498) into r2.3.0 by @chtruong814 :: PR: #13631
Uncategorized:
Changelog
- Bump to 2.3.1 by @chtruong814 :: PR: #13507
- Cherry pick Use explicitly cached canary-1b-flash in CI tests (13237) into r2.3.0 by @ko3n1g :: PR: #13508
- Cherry pick [automodel] bump liger-kernel to 0.5.8 + fallback (13260) into r2.3.0 by @ko3n1g :: PR: #13308
- Cherry-pick Add recipe and ci scripts for qwen2vl to r2.3.0 by @romanbrickie :: PR: #13336
- Cherry pick Fix skipme handling (13244) into r2.3.0 by @ko3n1g :: PR: #13376
- Cherry pick Allow fp8 param gather when using FSDP (13267) into r2.3.0 by @ko3n1g :: PR: #13383
- Cherry pick Handle boolean args for performance scripts and log received config (13291) into r2.3.0 by @ko3n1g :: PR: #13416
- Cherry pick new perf configs (13110) into r2.3.0 by @ko3n1g :: PR: #13431
- Cherry pick Adding additional unit tests for the deploy module (13411) into r2.3.0 by @ko3n1g :: PR: #13449
- Cherry pick Adding more export tests (13410) into r2.3.0 by @ko3n1g :: PR: #13450
- Cherry pick [automodel] add FirstRankPerNode (13373) into r2.3.0 by @ko3n1g :: PR: #13559
- Cherry pick [automodel] deprecate global_batch_size dataset argument (13137) into r2.3.0 by @ko3n1g :: PR: #13560
- Cherry-pick [automodel] fallback FP8 + LCE -> FP8 + CE (#13349) into r2.3.0 by @chtruong814 :: PR: #13561
- Cherry pick [automodel] add find_unused_parameters=True for DDP (13366) into r2.3.0 by @ko3n1g :: PR: #13601
- Cherry pick Add CI test for local checkpointing (#13012) into r2.3.0 by @ananthsub :: PR: #13472
- Cherry pick [automodel] fix --mbs/gbs dtype and chat-template (13598) into r2.3.0 by @akoumpa :: PR: #13613
- Cherry-pick Update t5.py (#13082) to r2.3.0 and bump mcore to f98b1a0 by @chtruong814 :: PR: #13642
- [Automodel] Fix CP device_mesh issue, use PTL distsampler (#13473) by @akoumpa :: PR: #13636
- [Llama4] Fix the recipe bug - cherrypick #13649 by @gdengk :: PR: #13650
- build: Pin transformers (#13675) by @ko3n1g :: PR: #13692
NVIDIA Neural Modules 2.3.0
Highlights
- Export & Deploy
- NeMo 2.0 export path for NIM
- ONNX and TensorRT Export for NIM Embedding Container
- In-framework deployment for HF Models
- TRT-LLM deployment for HF Models in NeMo Framework
- Evaluation
- Integrate nvidia-lm-eval into NeMo Framework for evaluations with the OpenAI API-compatible in-framework deployment
- AutoModel
- VLM AutoModelForImageTextToText
- FP8 for AutoModel
- Support CP with FSDP2
- Support TP with FSDP2
- Performance Optimization
- Support for cut cross entropy and Liger kernel
- Gradient Checkpointing
- Fault Tolerance
- Integrate NVRx v0.3 Local checkpointing
- Collections
- LLM
- Llama4
- Llama Nemotron Ultra
- Llama Nemotron Super
- Llama Nemotron Nano
- Nemotron-h/5
- DeepSeek V3 Pretraining
- Evo2
- Qwen 2.5
- LoRA for Qwen3-32B and Qwen3-30B-A3B
- MultiModal
- FLUX
- Gemma 3
- Qwen2-VL
- ASR
- NeMo Run support for ASR training
- N-Gram LM on GPU for AED
- N-Gram LM on GPU + Transducer greedy decoding (RNN-T, TDT)
- Timestamp support for AED models that support timestamps
- Migrate SpeechLM to NeMo 2.0
- Canary-1.1
- Replace ClassificationModels class with LabelModels
- Performance
- Functional MXFP8 support for (G)B200
- Current scaling recipe with TP communication overlap and FP8 param gathers
- Custom FSDP support that fully utilizes GB200 NVL72
Detailed Changelogs:
ASR
Changelog
- Added model config params for Canary-1B-Flash, Canary-180M-Flash models by @KunalDhawan :: PR: #12588
- Canary tutorial by @ankitapasad :: PR: #12613
- Canary tutorial fix timestamp by @ankitapasad :: PR: #12677
- revert config by @nithinraok :: PR: #12689
- canary longform inference script with timestamps option by @krishnacpuvvada :: PR: #12653
- Fix default timestamps value for Hybrid ASR models by @artbataev :: PR: #12681
- Fix k2 installation with PyTorch 2.6.0 by @artbataev :: PR: #12686
- Improve time and RTFx report for ASR by @artbataev :: PR: #12680
- Modify train args by @ankitapasad :: PR: #12700
- Fix asr doc warnings by @nithinraok :: PR: #12720
- Rename FastNGramLM -> NGramGPULanguageModel by @artbataev :: PR: #12755
- transcribe fix for new hypotheses by @nune-tadevosyan :: PR: #12801
- Fix timestamps when cuda graphs enabled by @monica-sekoyan :: PR: #12808
- update streaming conformer by @stevehuang52 :: PR: #12846
- AED Decoding with N-Gram LM by @artbataev :: PR: #12730
- update notebook by @nithinraok :: PR: #13088
- bugfix ASR_Context_Biasing.ipynb by @lilithgrigoryan :: PR: #13109
- Change branch for installation from main to r2.3.0 by @ankitapasad :: PR: #13266
TTS
Changelog
NLP / NMT
Changelog
- Remove old peft docs by @cuichenx :: PR: #12675
- Add code coverage for llm gpt models conversion tests by @suiyoubi :: PR: #12665
- Make BERT TransformerBlockWithPostLNSupport accept more inputs from Mcore by @suiyoubi :: PR: #12685
- remove gifs from documentation by @dimapihtar :: PR: #12732
- Rename FastNGramLM -> NGramGPULanguageModel by @artbataev :: PR: #12755
- fix NeMo documentation by @dimapihtar :: PR: #12754
- GPT Model/Data/Recipe Unit Test by @suiyoubi :: PR: #12757
- ci: Exclude nlp, mm, vision collections by @ko3n1g :: PR: #12816
- Add vocab size as attr to GPT and T5 Configs, use file name based logger in llm.gpt.data by @hemildesai :: PR: #12862
- Fix transformer layer api with megatron cbc89b3 by @yaoyu-33 :: PR: #12885
Text Normalization / Inverse Text Normalization
Changelog
- Rename FastNGramLM -> NGramGPULanguageModel by @artbataev :: PR: #12755
Export
Changelog
- GHA Conversion Test and Importer/Exporter Refactor by @suiyoubi :: PR: #12597
- Fix Llama Embedding Model Exporting keys by @suiyoubi :: PR: #12691
- build: Add trtllm by @ko3n1g :: PR: #12672
- Fix trt-llm install by @chtruong814 :: PR: #12827
- Update LLaVA's next HF exporter to load ViT checkpoint from YAML by @eagle705 :: PR: #12841
- Support huggingface export to tensorrtllm by @pthombre :: PR: #12889
- Adds a built stage for the trt-llm wheel to reduce the overall test image size by @chtruong814 :: PR: #12883
Uncategorized:
Changelog
- Update changelog-build.yml by @ko3n1g :: PR: #12584
- Update changelog for r2.2.0 by @github-actions[bot] :: PR: #12585
- Add comments for requirements by @thomasdhc :: PR: #12603
- [automodel] FSDP2Strategy: move to device if using a single-device by @akoumpa :: PR: #12593
- build: Remove numba pin by @ko3n1g :: PR: #12604
- docs: Update installation guides by @ko3n1g :: PR: #12596
- Change Llama Scaling Factor type to Float by @suiyoubi :: PR: #12616
- ci: Test multiple python versions by @ko3n1g :: PR: #12619
- ci: Disable reformat by @ko3n1g :: PR: #12620
- Updating ModelOpt to 0.25.0 by @janekl :: PR: #12633
- [automodel] add additional hf_dataset tests by @akoumpa :: PR: #12646
- [automodel] add jit_transform tests by @akoumpa :: PR: #12645
- [automodel] init eos_token_id inside data module by @yuanzhedong :: PR: #12610
- [automodel] grad ckpt by @akoumpa :: PR: #12644
- bugfix(llm/LLaMa) - dropout_position can never be equal to extended string by @soluwalana :: PR: #12649
- Fix inference pipeline quality issue by @Victor49152 :: PR: #12639
- [automodel] switch to direct=True to propage return codes in nemorun by @akoumpa :: PR: #12651
- add Auto Conf support for bert, t5, qwen, starcoder models by @dimapihtar :: PR: #12601
- ci: Upload coverage by @ko3n1g :: PR: #12668
- ci: Re-enable changed-files action by @ko3n1g :: PR: #12683
- build: Pin sox by @ko3n1g :: PR: #12701
- add neva quantization by @linnanwang :: PR: #12698
- Clip coverage by @abhinavg4 :: PR: #12696
- GHA CI test: Remove unnecessary directive by @pablo-garay :: PR: #12714
- minor perf fixes by @malay-nagda :: PR: #12656
- Add DeepSeek V2 Lite into llm __init__.py by @suiyoubi :: PR: #12664
- Add Llama-Nemotron Nano and 70B models by @suiyoubi :: PR: #12712
- Save batch norm running stats in PEFT checkpoints by @cuichenx :: PR: #12666
- Fix document Readme under nemo to add more information by @yaoyu-33 :: PR: #12699
- Fix ub_overlap_ag by @cuichenx :: PR: #12721
- Toggle fast tokenizer if error occurs by @cuichenx :: PR: #12722
- Update README.md for blackwell and AutoModel by @snowmanwwg :: PR: #12612
- Raise error on import_ckpt with overwrite=False plus README for checkpoint_converters by @janekl :: PR: #12693
- [automodel] fix validation_step by @soluwalana :: PR: #12659
- [automodel] vlm tests by @akoumpa :: PR: #12716
- Auto Configurator code coverage by @dimapihtar :: PR: #12694
- [automodel] fix automodle benchmark script by @yuanzhedong :: PR: #12605
- Remove unnecessary directives by @pablo-garay :: PR: #12743
- Add recipe tests for coverage by @cuichenx :: PR: #12737
- Add Qwen2.5 in NeMo2 by @suiyoubi :: PR: #12731
- add fallback_module to safe_import_from by @akoumpa :: PR: #12726
- Update quantization scripts & relax modelopt requirement specifier by @janekl :: PR: #12709
- Import guard fasttext by @thomasdhc :: PR: #12758
- [automodel] chunked cross entropy by @akoumpa :: PR: #12752
- Add fsdp automodel test by @BoxiangW :: PR: #12718
- [automodel] if peft move only adapters to cpu by @akoumpa :: PR: #12735
- [automodel] update hf mockdataset by @akoumpa :: PR: #12643
- [automodel] remove unused cell in multinode notebook by @yuanzhedong :: PR: #12624
- Yash/llava next coverage by @yashaswikarnati :: PR: #12745
- Tidy code: remove unneeded statements/lines by @pablo-garay :: PR: #12771
- Pass tensor instead of raw number in _mock_loss_function in PTQ by @janekl :: PR: #12769
- ci: Run on nightly schedule by @ko3n1g :: PR: #12775
- Add logs for checkpoint saving start and finalization by @lepan-google :: PR: #12697
- Alit/test coverage by @JRD971000 :: PR: #12762
- Fix loss mask with packed sequence by @ashors1 :: PR: #12642
- Add pruning recipe by @kevalmorabia97 :: PR: #12602
- Update qwen2-v1 to use NeMo quick_gelu by @thomasdhc :: PR: #12787
- [doc] Fixes for audio doc warnings by @anteju :: PR: #12736
- ci: Measure multiprocessing by @ko3n1g :: PR: #12778
- ci: Fix flaky LLM tests by @ko3n1g :: PR: #12807
- Add BERT/Qwen2.5 Unit test and Refactor all GHA Conversion Tests by @suiyoubi :: PR: #12785
- Fix TransformerBlock cuda_graphs compatibility with MCore by @buptzyb :: PR: #12779
- ci: Remove --branch by @ko3n1g :: PR: #12809
- ci: Move scripts fully down to files by @ko3n1g :: PR: #12802
- add __init__.py to make this a package by @akoumpa :: PR: #12814
- Update changelog for r2.2.1 by @github-actions[bot] :: PR: #12818
- add finetune support for Auto Configurator by @dimapihtar :: PR: #12770
- [automodel] add cpu:gloo to backend by @akoumpa :: PR: #12832
- add missing call to _apply_liger_kernel_to_instance by @akoumpa :: PR: #12806
- Prune docker images in GHA older than 8hrs by @chtruong814 :: PR: #12838
- [audio] Adding tests for predictive models by @anteju :: PR: #12823
- Update resiliency example notebook readme and add links to the brev launchable by @ShriyaRishab :: PR: #12843
- [automodel] qlora peft by @yzhang123 :: PR: #12817
- ci: Increase prune time by @ko3n1g :: PR: #12860
- Update base container in Dockerfile.speech by @artbataev :: PR: #12859
- Fix qwen2.5 1.5b configuration inheritance bug by @Aprilistic :: PR: #12852
- Update modelopt upperbound to 0.27 by @thomasdhc :: PR: #12788
- Non-bloc...
NVIDIA Neural Modules 2.3.0rc4
Prerelease: NVIDIA Neural Modules 2.3.0rc4 (2025-04-21)
NVIDIA Neural Modules 2.3.0rc3
Prerelease: NVIDIA Neural Modules 2.3.0rc3 (2025-04-15)
NVIDIA Neural Modules 2.3.0rc2
Prerelease: NVIDIA Neural Modules 2.3.0rc2 (2025-04-07)
NVIDIA Neural Modules 2.2.1
Highlights
- Training
- Fix training instability in MoE-based models.
- Fix bug in Llama exporter for Llama 3.2 1B and 3B.
- Fix bug in LoRA linear_fc1 adapter when the adapter checkpoint is saved and loaded with different TP sizes.
Detailed Changelogs:
Uncategorized:
Changelog
- Re-add reverted commits after 2.2.0 and set next version to be 2.2.1 by @chtruong814 :: PR: #12587
- Cherry pick Fix exporter for llama models with shared embed and output layers (12545) into r2.2.0 by @ko3n1g :: PR: #12608
- Cherry pick Fix TP for LoRA adapter on linear_fc1 (12519) into r2.2.0 by @ko3n1g :: PR: #12607
- Bump mcore to use 0.11.1 by @chtruong814 :: PR: #12634
NVIDIA Neural Modules 2.2.0
Highlights
- Training
- Blackwell and Grace Blackwell support
- Pipeline parallel support for distillation
- Improved NeMo Framework installation
- Export & Deploy
- vLLM export for NeMo 2.0
- Evaluations
- Integrate lm-eval-harness
- Collections
- LLM
- DAPT example and best practices in NeMo 2.0
- [NeMo 2.0] Enable Tool Learning and add a tutorial
- Support GPT Embedding Model (Llama 3.2 1B/3B)
- Qwen2.5, Phi4 (via AutoModel)
- SFT for Llama 3.3 model (via AutoModel)
- Support BERT Embedding Model with NeMo 2.0
- DeepSeek SFT & PEFT Support
- MultiModal
- Clip
- SP for NeVA
- CP for NeVA
- Intern-VIT
- Automodel
- Preview release.
- PEFT and SFT support for LLMs available via Hugging Face’s AutoModelForCausalLM.
- Support for Hugging Face-native checkpoints (full model and adapter only).
- Support for distributed training via DDP and FSDP2.
- ASR/TTS
- Lhotse: TPS-free 2D bucket estimation and filtering
- Update model outputs so that all ASR outputs use a consistent format
- Sortformer Release Model
Detailed Changelogs:
ASR
Changelog
- removed the line which caused a problem in nfa_tutorial by @Ssofja :: PR: #11710
- TPS-free 2D bucket estimation and filtering by @pzelasko :: PR: #11738
- Update transcribe_utils.py by @stevehuang52 :: PR: #11984
- Sortformer Diarizer 4spk v1 model PR Part 4: Sortformer Documents and Notebook Tutorials by @tango4j :: PR: #11707
- fix the issue during batched inference of Sortformer diarizer by @tango4j :: PR: #12047
- changed asr models outputs to be consistent by @Ssofja :: PR: #11818
- chore: Update notebooks by @ko3n1g :: PR: #12161
- add ctc segmentation by @ko3n1g :: PR: #12312
- clean up VAD tutorial by @stevehuang52 :: PR: #12410
- copy from main by @nithinraok :: PR: #12423
- ci: Disable ASR tests for now (#12443) by @ko3n1g :: PR: #12466
- ASR_CTC_Language_Finetuning.ipynb bugfix by @lilithgrigoryan :: PR: #12538
TTS
Changelog
NLP / NMT
Changelog
- Use explicit imports from megatronllm_deployable.py by @janekl :: PR: #11705
- Bug fix minor bug in TRT-LLM deployment by @oyilmaz-nvidia :: PR: #11714
- gpt moe perf scripts by @malay-nagda :: PR: #11760
- Bump mcore by @ko3n1g :: PR: #11740
- Enable packed seqs for validation by @jiemingz :: PR: #11748
- Revert Mcore update since it caused regression by @pablo-garay :: PR: #11791
- Fix Gemma2 Attention Init Args by @suiyoubi :: PR: #11792
- Add null tokenizer by @erhoo82 :: PR: #11789
- Fix DistCP inference issue by @suiyoubi :: PR: #11801
- Add BERT Embedding Models E5 Recipe by @suiyoubi :: PR: #11787
- Add rope scaling configs for NeMo 1 by @BoxiangW :: PR: #11807
- Fix calculating num_available_samples by @huvunvidia :: PR: #11830
- fix sentencepiece tokenizer special tokens by @akoumpa :: PR: #11811
- add chat sft dataset to support agent tool calling by @chenrui17 :: PR: #11759
- Revert "Revert Mcore update since it caused regression (#11791)" by @ko3n1g :: PR: #11799
- fix checkpoint load issue by @dimapihtar :: PR: #11859
- Fix nemo 1 packed sequence TE version error by @cuichenx :: PR: #11874
- enable loading older TE checkpoints by @dimapihtar :: PR: #11930
- ci: Use single runner machines for unit tests by @ko3n1g :: PR: #11937
- llm performance scripts by @malay-nagda :: PR: #11736
- [MoE] add expert tensor parallelism support for NeMo2.0 MoE by @gdengk :: PR: #11880
- add exception when loading ckpt saved by TE < 1.13 by @dimapihtar :: PR: #11988
- remove renormalize_blend_weights flag by @dimapihtar :: PR: #11975
- Llama3.2 1B Embedding Model Support by @suiyoubi :: PR: #11909
- Weekly bump by @ko3n1g :: PR: #11896
- Debug Apex distributed optimizer to handle Transformer Engine 2.0 by @timmoon10 :: PR: #12004
- throw MegatronOptimizerModule warning only with mcore models by @akoumpa :: PR: #12085
- fix nmt dataclass issue by @dimapihtar :: PR: #12081
- Propogate dp last changes from mcore by @ryantwolf :: PR: #12012
- Add error message when downloading failed. by @yuanzhedong :: PR: #12139
- interface for asymmetric pipeline schedule by @erhoo82 :: PR: #12039
- chore: Update notebooks by @ko3n1g :: PR: #12161
- Cherrypick #12382, #12415 and #12424 by @cuichenx :: PR: #12425
- ASR_CTC_Language_Finetuning.ipynb bugfix by @lilithgrigoryan :: PR: #12538
Text Normalization / Inverse Text Normalization
Changelog
NeMo Tools
Changelog
Export
Changelog
- Bug fix minor bug in TRT-LLM deployment by @oyilmaz-nvidia :: PR: #11714
- In-framework deployment NeMo 2.0 nemo_export.py test by @janekl :: PR: #11749
- Fix starcoder2 missing bias in nemo2 config for TRTLLM by @meatybobby :: PR: #11809
- Autodetect dtype on exporting to TensorRT-LLM by @janekl :: PR: #11907
- PTQ & TRT-LLM updates related to upcoming PyTorch 25.01 bump by @janekl :: PR: #11941
- Run Flake8 for nemo.export module by @janekl :: PR: #11728
- Skip initialization in hf export by @cuichenx :: PR: #12136
- update export io call by @akoumpa :: PR: #12144
- add default kwargs for trtllm model runner by @pablo-garay :: PR: #12248
- cherry-pick: fix[export]: reshard model correctly handles extra_state when it's a tensor (#12132) by @terrykong :: PR: #12335
Bugfixes
Changelog
Uncategorized:
Changelog
- Allow using vocab size from config by @shanmugamr1992 :: PR: #11718
- Fix baseline recipes by @erhoo82 :: PR: #11725
- Update changelog for r2.1.0 by @github-actions[bot] :: PR: #11745
- ci: Fix changelog generator by @ko3n1g :: PR: #11744
- Fix 'http_port' parameter name in DeployPyTriton usages and update .qnemo compress=True path by @janekl :: PR: #11747
- Conversion NeMo and HF checkpoint script for T5 by @huvunvidia :: PR: #11739
- Add BERT Embedding Models by @suiyoubi :: PR: #11737
- Add server ready check before starting evaluation by @athitten :: PR: #11731
- only install bitsandbytes on x86 by @akoumpa :: PR: #11781
- [Bugfix] Skip processing if extra_state loads as None by @janekl :: PR: #11778
- chore(beep boop 🤖): Bump MCORE_TAG=4dc8977... (2025-01-07) by @ko3n1g :: PR: #11768
- make progress printer compatible with PTL v2.5.0 by @ashors1 :: PR: #11779
- Fix Mistral Conversion Issue by @suiyoubi :: PR: #11786
- build: Fix build-arg by @ko3n1g :: PR: #11815
- Lora ckpt in HF format for NeMo AutoModel by @oyilmaz-nvidia :: PR: #11712
- 8x22b seq len by @malay-nagda :: PR: #11788
- Bugfix for output_generation_logits in tensorrtllm by @athitten :: PR: #11820
- handle mistralai/Mistral-7B-Instruct-v0.3 tokenizer correctly by @akoumpa :: PR: #11839
- remove tensorstore pin in requirements*.txt by @pstjohn :: PR: #11777
- Do not load context for model transform in llm inference by @hemildesai :: PR: #11751
- update nemo2sftpeft tutorial container verison by @HuiyingLi :: PR: #11832
- Latest News updated for Cosmos by @lbliii :: PR: #11806
- Removes tensorstore 0.1.45 pin from requirements_deploy.txt by @pstjohn :: PR: #11858
- ci: Prune dangling images by @ko3n1g :: PR: #11885
- Disable tests that download datasets from web by @akoumpa :: PR: #11878
- Add context_logits for eval accuracy calculation in case of multi token prediction tasks by @athitten :: PR: #11753
- add dataset_root to SpecterDataModule by @suiyoubi :: PR: #11837
- Support both Path and str for APIs by @maanug-nv :: PR: #11865
- Run nsys callback on GBS not on MBS by @akoumpa :: PR: #11861
- ci: Set bump-branch to weekly by @ko3n1g :: PR: #11889
- chore: Update mcore-tag-bump-bot.yml by @ko3n1g :: PR: #11891
- ci: Bump Mcore in weekly PR by @ko3n1g :: PR: #11897
- check restore_config first by @akoumpa :: PR: #11890
- LinearAdapter: propagate args to _init_adapter by @akoumpa :: PR: #11902
- NeMo 2.0 fp8 conversion by @Laplasjan107 :: PR: #11845
- nemo ux expert tensor parallel by @akoumpa :: PR: #11903
- Add CP support to Neva in NeMo2 by @yaoyu-33 :: PR: #11850
- build: Move dependencies by @ko3n1g :: PR: #11790
- Add Flux and Flux Controlnet Support to Diffusion folder by @Victor49152 :: PR: #11794
- ci: Adjust bump mcore workflow by @ko3n1g :: PR: #11918
- ci: Small fix to bump workflow by @ko3n1g :: PR: #11919
- Revert #11890 and add a test that would have caught the error by @cuichenx :: PR: #11914
- ci: Adjust input argument by @ko3n1g :: PR: #11921
- Create test_phi3.py by @mayani-nv :: PR: #11843
- Enable NeMo importer and loading dist CKPT for training by @Victor49152 :: PR: #11927
- build: Pin triton by @ko3n1g :: PR: #11938
- Add sharding for speechlm and vlm by @BoxiangW :: PR: #11876
- Update torch load for load from disk by @thomasdhc :: PR: #11963
- Add options to add mp_policy and parallel_fn for NeMo automodel fsdp2 by @BoxiangW :: PR: #11956
- ci: Add coverage reports by @ko3n1g :: PR: #11912
- Add batching support for evaluation by @athitten :: PR: #11934
- add use_fast option by @akoumpa :: PR: #11976
- improve error and debug messages in model connector by @cuichenx :: PR: #11979
- [checkpoint][docs] Fix typos in dist checkpointing docs by @ananthsub :: PR: #1...
NVIDIA Neural Modules 2.2.0rc3
Prerelease: NVIDIA Neural Modules 2.2.0rc3 (2025-02-25)
NVIDIA Neural Modules 2.2.0rc2
Prerelease: NVIDIA Neural Modules 2.2.0rc2 (2025-02-17)
NVIDIA Neural Modules 2.2.0rc1
Prerelease: NVIDIA Neural Modules 2.2.0rc1 (2025-02-04)
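The changelog entries on this page share a common shape: a title, then "by @author", then ":: PR: #number". For readers who want to process these entries programmatically, here is a minimal sketch in Python that extracts the three fields. It assumes that shape holds for every complete entry. The names ChangelogEntry and parse_entry are hypothetical helpers for illustration and are not part of NeMo or of any GitHub tooling. Entries that are truncated on the page (ending in "...") are returned as None rather than guessed at.

```python
import re
from typing import NamedTuple, Optional


class ChangelogEntry(NamedTuple):
    """Hypothetical container for one changelog line."""
    title: str
    author: str
    pr_number: int


# Assumed entry shape: "- <title> by @<author> :: PR: #<number>"
_ENTRY_RE = re.compile(
    r"^-\s+(?P<title>.+)\s+by\s+@(?P<author>\S+)\s+::\s+PR:\s+#(?P<pr>\d+)\s*$"
)


def parse_entry(line: str) -> Optional[ChangelogEntry]:
    """Parse one changelog line; return None if it does not match the assumed shape."""
    match = _ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return ChangelogEntry(match["title"], match["author"], int(match["pr"]))


if __name__ == "__main__":
    print(parse_entry("- Bump to 2.3.1 by @chtruong814 :: PR: #13507"))
    print(parse_entry("- Non-bloc..."))  # truncated entry, does not match
```

The greedy title group means that a title which itself contains " by @" is split at the last occurrence, which matches how the entries above are written.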