Pull requests: aws-samples/awsome-distributed-training

Pull requests list

feat: Add Hyperpod Optimum-neuron LoRA example
#631 opened Apr 4, 2025 by Captainia
Improvements/fsdp restructure (labels: enhancement, refactoring)
#630 opened Apr 3, 2025 by mhuguesaws Draft
Change prometheus version for SMHP
#628 opened Apr 3, 2025 by mhuguesaws
Fix cloudwatch access from Grafana
#627 opened Apr 2, 2025 by mhuguesaws
Openzfs smhp (labels: enhancement)
#622 opened Mar 31, 2025 by amanshanbhag
Add automated Grafana dashboard deployment (labels: enhancement)
#607 opened Mar 25, 2025 by mhuguesaws
nvshmem
#599 opened Mar 24, 2025 by pbelevich Draft
Terraform Modules for HyperPod EKS (labels: enhancement)
#586 opened Mar 14, 2025 by bluecrayon52
Update megatron lm testcase (labels: enhancement)
#537 opened Jan 30, 2025 by KeitaW
add tips to force NCCL comm to go through EFA
#531 opened Jan 23, 2025 by KeitaW
ec2 get metadata replacement
#515 opened Dec 10, 2024 by gmgtamz
easy smhp slurm and eks
#514 opened Dec 10, 2024 by gmgtamz
Update pcluster architecture guidance (labels: enhancement)
#464 opened Oct 23, 2024 by KeitaW Draft
add GPU accounting for SMHP
#462 opened Oct 21, 2024 by KeitaW
Update bionemo test case + propose to subdirectories per orchastrator (labels: documentation)
#396 opened Aug 5, 2024 by KeitaW Draft
Esm2 on Sagemaker Hyperpod
#387 opened Jul 25, 2024 by awsankur
update dependencies of PyTorch base image
#375 opened Jul 15, 2024 by KeitaW
Neuron distributed
#359 opened Jun 13, 2024 by KeitaW
End-to-End LLM Model Development with Torchtitan and Torchtune (labels: enhancement)
#341 opened May 20, 2024 by KeitaW
Llama training with FP8
#331 opened May 15, 2024 by pbelevich Draft
Add draft gpu troubles
#290 opened Apr 30, 2024 by mhuguesaws Draft
[WIP] torchtune usecase
#260 opened Apr 12, 2024 by pbelevich Draft