Is there any data or reference available for the baseline All-Reduce performance of TPU (v3 or later), similar to nccl-tests? #27770
Unanswered
ckjung1987 asked this question in Q&A
Replies: 0 comments
Hello all,
As far as I can tell, All-Reduce on TPU is exposed through jax.lax.psum, documented here:
https://docs.jax.dev/en/latest/_autosummary/jax.lax.psum.html
Is there any reference performance data (for example, latency or bandwidth across a range of message sizes) for a correctly configured run, comparable to what nccl-tests reports for GPUs?
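For context, this is roughly the kind of measurement I have in mind. It is only a minimal sketch, not an official benchmark: it assumes float32 messages, uses jax.pmap over the local devices, and the helper name bench_psum and the swept sizes are my own choices. The "bus bandwidth" scaling factor 2*(n-1)/n is borrowed from the nccl-tests convention for All-Reduce, and the timings include host dispatch overhead.

```python
import time

import jax
import jax.numpy as jnp


def bench_psum(num_bytes, iters=10, warmup=2):
    """Time jax.lax.psum across all local devices for one message size.

    bench_psum is a hypothetical helper, not part of JAX.
    num_bytes is the per-device message size (float32 elements assumed).
    """
    n_dev = jax.local_device_count()
    n_elem = max(1, num_bytes // 4)  # 4 bytes per float32
    x = jnp.ones((n_dev, n_elem), dtype=jnp.float32)

    # One shard per device; psum reduces over the mapped axis "i".
    allreduce = jax.pmap(lambda v: jax.lax.psum(v, axis_name="i"), axis_name="i")

    # Warm-up runs cover compilation and first-use costs.
    for _ in range(warmup):
        out = allreduce(x)
        out.block_until_ready()

    # Sanity check: summing ones over n_dev devices gives n_dev.
    assert float(out[0, 0]) == float(n_dev)

    start = time.perf_counter()
    for _ in range(iters):
        allreduce(x).block_until_ready()
    seconds = (time.perf_counter() - start) / iters

    actual_bytes = n_elem * 4
    algbw = actual_bytes / seconds / 1e9        # GB/s, "algorithm bandwidth"
    busbw = algbw * 2 * (n_dev - 1) / n_dev     # nccl-tests convention (0 if n_dev == 1)
    return {
        "n_devices": n_dev,
        "bytes": actual_bytes,
        "seconds": seconds,
        "algbw_GBps": algbw,
        "busbw_GBps": busbw,
    }


if __name__ == "__main__":
    # Sweep per-device message sizes from 1 KiB to 64 MiB (arbitrary choice).
    for k in range(10, 27, 4):
        r = bench_psum(2 ** k)
        print(f'{r["bytes"]:>10d} B  {r["seconds"] * 1e6:10.1f} us  '
              f'algbw {r["algbw_GBps"]:8.3f} GB/s  busbw {r["busbw_GBps"]:8.3f} GB/s')
```

Without a published baseline I cannot tell whether the numbers from a script like this are close to what the hardware should deliver, which is why I am asking.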
Thanks in advance!