[QUESTION] How to control GPU memory layout for a 70B LLM model? #1074
Unanswered
wangdaw2023 asked this question in Q&A
Replies: 1 comment
- Maybe try setting a smaller TP and a larger PP (e.g., TP=4, PP=4 or TP=4, PP=8) for the 70B case.
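In Megatron-LM these sizes are set with `--tensor-model-parallel-size` and `--pipeline-model-parallel-size` (and the micro batch size with `--micro-batch-size`). A very rough way to compare candidate layouts is to estimate the static (weights + optimizer state) memory each one leaves per GPU. The sketch below is a back-of-the-envelope estimate only, assuming a 70B parameter count and a bf16 + Adam memory breakdown; gradients and activation memory (which depend on micro batch size, sequence length, and the pipeline schedule) are ignored, and none of the constants are measurements from this cluster.

```python
# Back-of-the-envelope comparison of (TP, PP) layouts for a ~70B dense model.
# All constants are assumptions for illustration, not measurements:
#   - bf16 weights (2 bytes/param) plus fp32 master weights and Adam moments
#     (4 + 4 + 4 bytes/param) when the distributed optimizer is NOT used.
#   - Gradients and activation memory are ignored; activations depend on micro
#     batch size, sequence length, and the pipeline schedule, so measure them.

PARAMS = 70e9  # assumed total parameter count of the 70B model

def static_mem_per_gpu_gb(tp: int, pp: int, optimizer_shards: int = 1) -> float:
    """Approximate static (weights + optimizer state) memory per GPU in GB.

    Weights are split across tp * pp GPUs; with a distributed optimizer the
    12 bytes/param of optimizer state would additionally be sharded across
    data-parallel ranks (optimizer_shards > 1).
    """
    weight_bytes = 2 * PARAMS / (tp * pp)
    optim_bytes = 12 * PARAMS / (tp * pp * optimizer_shards)
    return (weight_bytes + optim_bytes) / 1e9

for tp, pp in [(8, 2), (4, 4), (4, 8)]:
    gb = static_mem_per_gpu_gb(tp, pp)
    print(f"TP={tp} PP={pp}: ~{gb:.1f} GB static per GPU, "
          f"{tp * pp} GPUs per model replica")
```

Note that TP=8, PP=2 and TP=4, PP=4 split the weights across the same number of GPUs, so any MFU gain from the smaller TP would come mainly from reduced tensor-parallel communication, while TP=4, PP=8 additionally halves the static footprint per GPU at the cost of a longer pipeline.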
- I am training a 70B Megatron LLM on a 32-node A800 cluster. Each node has 8 * A800 GPUs and 4 * RoCE 200 Gb/s NICs. I find that the 70B model's MFU of 20% is much lower than the 32B model's MFU of 47%. I also see that GPU memory usage is about 70 GB on some nodes but only about 50 GB on others. I would like to even out memory usage across nodes so that I can use a larger micro batch size and improve MFU. This means controlling which LLM layers are placed on which rank. Is there any documentation on this topic?
  32B LLM: TP=8, PP=1, MFU=47%
  70B LLM: TP=8, PP=2, MFU=20%
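Uneven usage across nodes is expected with pipeline parallelism: the first stage also holds the embedding table, the last stage holds the output layer and loss, and under the 1F1B schedule stage i keeps roughly PP - i micro-batches of activations in flight, so earlier stages need more activation memory. The usual rebalancing lever in Megatron-LM is to give the heavier stages fewer transformer layers; depending on the version you are running this may be exposed through arguments such as `--decoder-first-pipeline-num-layers` / `--decoder-last-pipeline-num-layers` (verify against the `arguments.py` in your checkout, as the exact flags have changed over time). The sketch below is an illustrative memory model only; every constant in it is an assumption, not a measurement from this cluster.

```python
# Illustrative per-stage memory model for the 1F1B pipeline schedule, to show
# why one pipeline stage can sit near 70 GB while another sits near 50 GB.
# Every constant below is an assumption for illustration, not a measurement.

PP = 2                            # pipeline stages, as in the 70B run above
NUM_LAYERS = 80                   # assumed transformer layer count for a ~70B model
STATIC_GB_PER_LAYER = 1.1         # assumed weights + optimizer GB per layer (after TP split)
ACT_GB_PER_LAYER_PER_MICROBATCH = 0.15  # assumed activation GB per layer per micro-batch
EMBED_OR_LOSS_GB = 4.0            # assumed extra cost of embedding (first) / output + loss (last)

def stage_peak_gb(stage: int, layers_per_stage: list[int]) -> float:
    """Rough peak memory of one pipeline stage under 1F1B.

    Under 1F1B, stage i keeps up to (PP - i) micro-batches of activations
    alive at once, so earlier stages carry more activation memory.
    """
    in_flight = PP - stage
    static = layers_per_stage[stage] * STATIC_GB_PER_LAYER
    activations = layers_per_stage[stage] * ACT_GB_PER_LAYER_PER_MICROBATCH * in_flight
    extra = EMBED_OR_LOSS_GB if stage in (0, PP - 1) else 0.0
    return static + activations + extra

even = [NUM_LAYERS // PP] * PP                              # 40 / 40
rebalanced = [NUM_LAYERS // PP - 2, NUM_LAYERS // PP + 2]   # move 2 layers off stage 0

for name, split in [("even split      ", even), ("rebalanced split", rebalanced)]:
    per_stage = ", ".join(f"stage {s}: {stage_peak_gb(s, split):.1f} GB"
                          for s in range(PP))
    print(f"{name} -> {per_stage}")
```

Once the stages are balanced (compare the real numbers from `nvidia-smi` or `torch.cuda.memory_summary()` on each stage), the headroom left on the heaviest stage is what bounds the micro batch size.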