
Releases: NVIDIA/Megatron-LM

NVIDIA Megatron Core 0.12.1

23 May 09:54
Merge branch 'gaod/llama4/te_fix' into 'core_r0.12.0'

Fix the TE assertion for release

See merge request ADLR/megatron-lm!3340

NVIDIA Megatron Core 0.12.0

06 May 21:10
core_v0.12.0
d580efc
  • Add FP8 recipe selection to arguments (--fp8-recipe, --first-last-layers-bf16, --num-layers-at-start-in-bf16, --num-layers-at-end-in-bf16); see the launch sketch after this list
  • Context parallel: fix loss scaling when calculate_per_token_loss=True
  • Make the number of data parallel communication buckets configurable (--ddp-num-buckets, --ddp-pad-buckets-for-high-nccl-busbw)
  • Inference
    • Support in-flight batching and chunked KV cache
    • Reduce memory usage by:
      • not materializing the full attention mask
      • materializing logits only for the last token during decode
      • removing an obsolete tensor reference
  • Hybrid Model
    • Inference
      • Add CUDA graph support
      • Change tools/run_mamba_text_generation_server.py to use megatron.core.inference
      • Fix a shape issue when materializing logits for Mamba model
    • Improve initialization of Mamba layers
    • Add configuration switches (--mamba-state-dim, --mamba-head-dim, --mamba-num-groups, --is-hybrid-model)
    • Make num_floating_point_operations work with hybrid model
    • Make hybrid_conversion.py work with mixer that uses TE linear
    • Add FP8 support
    • Fix Mamba dt_bias tensor parallelism
    • Support multimodal tokenizer
    • Improve data parallelism scaling
  • MoE
    • Features:
      • DeepEP support, compatible with all parallelisms and both token-drop and dropless modes
      • Important precision improvement: enable FP32/FP64 routing and unpermutation via --moe-router-dtype; FP32 is recommended for all fine-grained MoE training
      • CUDA Graph support for MoE
      • Multi-Token Prediction (MTP) Support
      • Fused indices_to_multihot kernel for DeepEP dispatcher
    • Bug fixes:
      • Fix hang issue with MoE+Dense hybrid models
      • Update theoretical memory and TFLOPS estimation for MoE and MLA
      • Fix MoE aux-loss scaling for per-token loss
      • Fix group-limited routing and expert bias; verified through DeepSeek-V3 end-to-end runs
    • Known issues:
      • Checkpoints trained with Custom FSDP for MoE may not be compatible with 3D-parallel training.
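
A minimal launch sketch tying the new 0.12.0 arguments together, assuming they compose with an ordinary pretrain_gpt.py launch. The flag names come from the notes above; the model size, parallelism, data paths, and the "delayed" recipe name passed to --fp8-recipe are illustrative assumptions, not recommendations from this release.

```bash
#!/bin/bash
# Sketch only: single-node pretrain_gpt.py launch exercising the new
# 0.12.0 flags listed above. Model/parallelism/data values are placeholders.
DATA_PREFIX=/path/to/dataset_text_document

# FP8 recipe selection, keeping the boundary transformer layers in BF16.
# The recipe name "delayed" is an assumption; check the argument help for choices.
FP8_ARGS=(
  --fp8-format hybrid
  --fp8-recipe delayed
  --first-last-layers-bf16
  --num-layers-at-start-in-bf16 1
  --num-layers-at-end-in-bf16 1
)

# Configurable number of data-parallel communication buckets.
DDP_ARGS=(
  --ddp-num-buckets 8
  --ddp-pad-buckets-for-high-nccl-busbw
)

# FP32 routing and unpermutation, as recommended above for fine-grained MoE.
MOE_ARGS=(
  --num-experts 64
  --moe-router-topk 8
  --moe-router-dtype fp32
)

torchrun --nproc_per_node 8 pretrain_gpt.py \
  --num-layers 32 --hidden-size 4096 --num-attention-heads 32 \
  --seq-length 4096 --max-position-embeddings 4096 \
  --micro-batch-size 1 --global-batch-size 256 \
  --tensor-model-parallel-size 2 --pipeline-model-parallel-size 2 \
  --sequence-parallel \
  --train-iters 1000 --lr 1e-4 --bf16 \
  "${FP8_ARGS[@]}" "${DDP_ARGS[@]}" "${MOE_ARGS[@]}" \
  --data-path "${DATA_PREFIX}" \
  --tokenizer-type GPT2BPETokenizer \
  --vocab-file /path/to/gpt2-vocab.json \
  --merge-file /path/to/gpt2-merges.txt
```

The --moe-router-dtype fp32 setting follows the precision recommendation above; drop the MOE_ARGS group for a dense model.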

NVIDIA Megatron Core 0.12.0rc3

15 Apr 19:50

Prerelease: NVIDIA Megatron Core 0.12.0rc3 (2025-04-15)

NVIDIA Megatron Core 0.12.0rc2

09 Apr 10:27

Prerelease: NVIDIA Megatron Core 0.12.0rc2 (2025-04-09)

NVIDIA Megatron Core 0.11.0

14 Mar 22:59
aa6207e
  • Add multi-datacenter training support through N/S connections
  • MoE
    • Features
      • Support DeepSeek-V3 fine-tuning (see the fine-tuning sketch after this list)
        • Aux-loss-free load balancing strategy
        • Node-limited routing and device-limited routing support.
        • Tensor Parallelism support for MLA and Sequence Auxiliary Loss
        • MTP (with TP and PP support) is coming soon.
      • Permutation/unpermutation fusion kernel from Transformer Engine.
      • Uneven virtual pipeline parallel split support in the first and last PP stages.
    • Bug fixes:
      • Fix the grad scale when TP != expert-TP and average_in_collective is enabled in DDP.
      • Fix TEGroupedMLP distributed-checkpoint compatibility issue with FP8 padding/unpadding.
    • Known Issues:
      • When training the Dense+MoE hybrid model, the process will hang if any PP rank does not have expert params.
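
In the same spirit, a hedged sketch of resuming fine-tuning on an MoE checkpoint with expert parallelism. The notes do not name the arguments behind aux-loss-free balancing or node-/device-limited routing, so those are omitted; --expert-tensor-parallel-size is assumed to be the flag that sets expert TP independently of TP (the grad-scale fix above applies when the two differ and average_in_collective is enabled in DDP).

```bash
#!/bin/bash
# Sketch only: fine-tuning an MoE checkpoint with expert parallelism.
# --expert-tensor-parallel-size is an assumption about how expert TP is
# decoupled from TP; model/data values are placeholders.
CHECKPOINT_DIR=/path/to/pretrained_moe_checkpoint
SAVE_DIR=/path/to/finetuned_checkpoint

torchrun --nproc_per_node 8 pretrain_gpt.py \
  --load "${CHECKPOINT_DIR}" --save "${SAVE_DIR}" --finetune \
  --num-layers 32 --hidden-size 4096 --num-attention-heads 32 \
  --seq-length 4096 --max-position-embeddings 4096 \
  --micro-batch-size 1 --global-batch-size 128 \
  --train-iters 2000 --lr 1e-5 --bf16 \
  --tensor-model-parallel-size 2 \
  --expert-tensor-parallel-size 1 \
  --expert-model-parallel-size 4 \
  --sequence-parallel \
  --num-experts 64 --moe-router-topk 8 \
  --moe-grouped-gemm \
  --data-path /path/to/dataset_text_document \
  --tokenizer-type GPT2BPETokenizer \
  --vocab-file /path/to/gpt2-vocab.json \
  --merge-file /path/to/gpt2-merges.txt
```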

NVIDIA Megatron Core 0.11.0rc0

20 Feb 10:43
7c00175

Prerelease: NVIDIA Megatron Core 0.11.0rc0 (2025-02-20)

NVIDIA Megatron Core 0.10.0

17 Feb 17:31
7ee599a
  • Add MLA to MCore
  • Enable FP8 for GroupedMLP
  • MoE Parallel Folding
  • Enhance MoE architecture: support MoE layer frequency patterns and configurable MoE FFN hidden size (see the sketch after this list)
  • Multimodal: NVLM training and evaluation support in MCore
  • Mamba Hybrid
    • Increase performance and reduce memory footprint of Triton language/compiler distributed caching
    • Add more unit testing and fix bugs
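
A short sketch of the configurable MoE layer pattern and expert FFN width mentioned above. The --moe-layer-freq and --moe-ffn-hidden-size argument names are assumptions on my part; the notes name the features but not the flags, so verify them against the argument help.

```bash
#!/bin/bash
# Sketch only: place an MoE layer every second transformer layer and give
# the experts a narrower FFN than the dense layers. The --moe-layer-freq and
# --moe-ffn-hidden-size flag names are assumptions, not confirmed by the notes.
torchrun --nproc_per_node 8 pretrain_gpt.py \
  --num-layers 24 --hidden-size 2048 --num-attention-heads 16 \
  --ffn-hidden-size 8192 \
  --seq-length 4096 --max-position-embeddings 4096 \
  --micro-batch-size 1 --global-batch-size 256 \
  --train-iters 1000 --lr 1e-4 --bf16 \
  --num-experts 8 --moe-router-topk 2 \
  --moe-layer-freq 2 \
  --moe-ffn-hidden-size 1024 \
  --data-path /path/to/dataset_text_document \
  --tokenizer-type GPT2BPETokenizer \
  --vocab-file /path/to/gpt2-vocab.json \
  --merge-file /path/to/gpt2-merges.txt
```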

NVIDIA Megatron Core 0.9.0

24 Oct 10:30
  • Uneven pipeline parallelism
    • Enable pipeline parallelism where first and last ranks have fewer transformer layers than the intermediate ranks
  • Per-layer CUDA graph support for GPT training with Transformer Engine modules
  • Enable different TP sizes for the vision encoder
  • Enable pipeline parallelism for T5 & Llava models
  • Support multi-tile multi-image input in Llava models
  • MoE
    • FP8 support
    • Runtime upcycling support
    • Dispatcher implementation optimizations
    • Shared expert support with overlapping optimizations
      • Qwen Model support
  • Mamba Hybrid
    • Main branch is no longer compatible with released checkpoints (use ssm branch)
    • Add distributed checkpointing
    • Fix bugs related to inference
    • Add unit tests
  • Known Issues
    • When using sequence parallelism, dropout in the transformer block forward pass does not use the appropriate RNG context.

NVIDIA Megatron Core 0.8.0

13 Aug 12:12
  • Multimodal
    • Added initial support for training vision language models using the LLaVA architecture
    • Added initial support for inference with multimodal inputs
    • End-to-end multimodal example from data collection to training to evaluation is provided in examples/multimodal
  • MoE
    • Context Parallel support.
    • Distributed checkpoint support for grouped GEMM.
  • Mamba
    • Added initial support for training and inference of Mamba-2 models
    • Support for hybrid models consisting of Mamba-2, attention, and MLP layers
    • Examples provided in examples/mamba

NVIDIA Megatron Core 0.7.0

05 Jun 23:12
  • MoE
    • Token drop support
    • Several efficiency optimizations
    • Improved model parallelism
    • Memory optimizations
  • Distributed checkpointing
    • Enabled for Retro
    • Asynchronous checkpoint saving
  • Several minor bug fixes, speed improvements, and memory optimizations