Issues: pytorch/torchtitan
Issues list
How to pretrain from scratch a Qwen 2.5 7B-base model using Torchtitan?
#1223
opened May 25, 2025 by
tjoymeed
expert_bias is updated during training but saved checkpoint contains only zero values
#1222
opened May 24, 2025 by
trestad
float8 rowwise vanilla TP low throughput
bug
Something isn't working
module: float8
#1207
opened May 20, 2025 by
danielvegamyhre
[MXFP8] unable to run titan llama3 debug model with mxfp8. Assertion: n_rows % max_row_tile_size == 0
bug
Something isn't working
#1200
opened May 16, 2025 by
lessw2020
Save RNG states during checkpointing for deterministic debugging
enhancement
New feature or request
#1194
opened May 14, 2025 by
wwwjn
document the usage of environment variables
better_engineering
Repo code quality improvements
documentation
Improvements or additions to documentation
high priority
triage review
#1192
opened May 14, 2025 by
tianyu-l
PP Zero Bubble CI tests failure
ci test failure
high priority
module: pipelining
triage review
#1188
opened May 13, 2025 by
tianyu-l
issues on llama3 compile + (async) TP + AC
ci test failure
high priority
module: torch.compile
triage review
#1185
opened May 13, 2025 by
tianyu-l
Can we support outputting checkpoints directly in .pt format?
enhancement
New feature or request
module: checkpoint
#1177
opened May 9, 2025 by
andrewor14
[Question] FSDP+TP CUDA_DEVICE_MAX_CONNECTIONS
documentation
Improvements or additions to documentation
module: fsdp
question
Further information is requested
#1147
opened Apr 27, 2025 by
ChenchaoZhao
fully_shard() for huggingface model: pytorch caches too much GPU memory
module: fsdp
question
Further information is requested
#1126
opened Apr 21, 2025 by
mingdianliu
[DeepSeek MoE] current workstream planning
enhancement
New feature or request
#1125
opened Apr 21, 2025 by
lessw2020
Llama 4 issue tracking
high priority
triage review
#1118
opened Apr 17, 2025 by
tianyu-l
3 of 13 tasks
FSDP2 root level parameter management
module: fsdp
question
Further information is requested
#1091
opened Apr 11, 2025 by
dingqingy
Torch.compile and TP during multiresolution Training
module: torch.compile
question
Further information is requested
#1081
opened Apr 9, 2025 by
nighting0le01
Is the current configuration system over-engineered?
question
Further information is requested
#1055
opened Apr 3, 2025 by
wangkuiyi
Clarify PP split point documentation.
question
Further information is requested
#1054
opened Apr 3, 2025 by
githubsgi
Overflow in F.scaled_dot_product_attention when using profiling with deterministic training
#1049
opened Apr 3, 2025 by
JungHoyoun