Replies: 1 comment
Marking as stale. No activity in 60 days.
Your question

Is there an explanation for this? As far as I can tell, the only difference between LinearWithFrozenWeight and LinearWithGradAccumulationAndAsyncCommunication is that LinearWithFrozenWeight does not compute the weight gradient. That looks like a pure performance optimization, but it turns out that it can produce different results.
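To make the difference I mean concrete, here is a minimal pure-Python sketch of the two backward passes as I understand them. It is not the library's code: the function names `backward_with_weight_grad` and `backward_frozen_weight`, the helpers, and the toy values are all hypothetical, and it leaves out gradient accumulation and async communication entirely. It only illustrates the "skip the weight gradient" idea and does not reproduce the discrepancy.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def backward_with_weight_grad(x, weight, grad_output):
    # Hypothetical stand-in for the trainable-weight path.
    # x: (batch, in), weight: (out, in), grad_output: (batch, out)
    grad_input = matmul(grad_output, weight)          # (batch, in)
    grad_weight = matmul(transpose(grad_output), x)   # (out, in)
    return grad_input, grad_weight


def backward_frozen_weight(x, weight, grad_output):
    # Hypothetical stand-in for the frozen-weight path:
    # same input gradient, but the weight gradient is never computed.
    grad_input = matmul(grad_output, weight)
    return grad_input, None


# Toy values, chosen only for illustration.
x = [[1.0, 2.0], [3.0, 4.0]]
weight = [[0.5, -1.0], [2.0, 0.0], [1.0, 1.0]]
grad_output = [[1.0, 0.0, -1.0], [0.5, 2.0, 0.0]]

gi_full, gw_full = backward_with_weight_grad(x, weight, grad_output)
gi_frozen, gw_frozen = backward_frozen_weight(x, weight, grad_output)
```

In this simplified model the input gradient is the same on both paths and only the weight gradient is dropped, which is why I expected identical results.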