NVIDIA Megatron-LM Discussions
Discussions
[QUESTION] Question about resume with distributed optimizer
Label: stale (no activity in 60 days on issue or PR)
[QUESTION] Why not use tensor parallel APIs of pytorch
Label: stale (no activity in 60 days on issue or PR)
Question with forward_backward_pipelining_without_interleaving in Megatron-LM Pipeline
Label: stale (no activity in 60 days on issue or PR)
[QUESTION] how to profile bubble time in pipeline parallelism?
Label: stale (no activity in 60 days on issue or PR)
[QUESTION] How does tensor_parallel coop with q/k_layernorm
Label: stale (no activity in 60 days on issue or PR)