Enable FP16 ACL kernels that accumulate into FP32 #3332
base: main
Conversation
It's worth saying that we expect a performance regression here, right, because the existing logic uses f16 accumulation? Equally, is there some logic in the benchdnn testing that allows bigger errors for ACL for f16 matmul? If there is, then it should be reverted in this commit too.
Force-pushed from 08b5057 to d008fd2.
I don't know of any logic in benchdnn, including the test files here, that contains any calibration to allow bigger errors for ACL f16 kernels.
Thanks. So just to check: because benchdnn uses integers to test the matmul primitives, there were no round-off errors caused by the f16 accumulation. This may be something worth looking into changing at some point (definitely not in the scope of this PR, though).
That is my assumption as well.
Agree.
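To make the round-off point in the exchange above concrete, here is a minimal, self-contained sketch (not part of the PR) of why integer-valued test data hides f16 accumulation error while fractional data exposes it. It assumes a compiler with `_Float16` support, e.g. GCC or Clang targeting AArch64.

```cpp
// Illustration: why integer test data hides f16 accumulation error.
// Assumes a compiler with _Float16 support (e.g. GCC/Clang on AArch64).
#include <cstdio>

int main() {
    const int K = 512;

    // Case 1: small-integer products. Every partial sum is an integer
    // below 2048, so it is exactly representable in f16 and the f16
    // accumulator matches the f32 one exactly.
    _Float16 acc16_int = 0;
    float acc32_int = 0;
    for (int k = 0; k < K; ++k) {
        float prod = 1.0f * 2.0f; // e.g. src = 1, wei = 2
        acc16_int = (_Float16)(acc16_int + (_Float16)prod);
        acc32_int += prod;
    }

    // Case 2: fractional products. Each f16 addition rounds to a 10-bit
    // mantissa, so the running f16 sum drifts away from the f32 result.
    _Float16 acc16_frac = 0;
    float acc32_frac = 0;
    for (int k = 0; k < K; ++k) {
        float prod = 0.1f * 0.3f;
        acc16_frac = (_Float16)(acc16_frac + (_Float16)prod);
        acc32_frac += prod;
    }

    std::printf("integer inputs:    f16 acc = %g, f32 acc = %g\n",
            (float)acc16_int, acc32_int);
    std::printf("fractional inputs: f16 acc = %g, f32 acc = %g\n",
            (float)acc16_frac, acc32_frac);
    return 0;
}
```

With integer inputs the two accumulators agree; with fractional inputs they diverge, which is exactly the error that f16 accumulation introduces and that integer-filled benchdnn inputs never trigger.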
Unfortunately there is nothing in the current test set that hits this code. Could you add some tests that do? And ideally post some
Waiting for test cases to be added.
Description
The ACL default accumulation for the `fp16` kernels is to accumulate into `fp16` as well, which causes some tests to fail in PyTorch. This PR enables `fp16` ACL kernels that accumulate into `fp32` when appropriate and consequently creates a path to fix the failing tests.
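For illustration only, here is a rough sketch of the behaviour the description refers to, assuming the request for `fp32` accumulation is expressed through oneDNN's accumulation-mode primitive attribute (an assumption for this example; the PR's actual dispatch logic into the ACL kernels is not shown). It also assumes an AArch64 build of oneDNN with ACL enabled so that an `fp16` matmul is dispatched to ACL.

```cpp
// Sketch only: an f16 matmul that requests f32 accumulation via the
// accumulation-mode primitive attribute. Whether this attribute is the
// exact knob that selects the new ACL kernels is an assumption here.
#include <oneapi/dnnl/dnnl.hpp>

int main() {
    using namespace dnnl;

    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    const memory::dim M = 64, K = 128, N = 32;
    memory::desc src_md({M, K}, memory::data_type::f16, memory::format_tag::ab);
    memory::desc wei_md({K, N}, memory::data_type::f16, memory::format_tag::ab);
    memory::desc dst_md({M, N}, memory::data_type::f16, memory::format_tag::ab);

    // Ask for f32 accumulation instead of the f16 accumulation that the
    // existing ACL fp16 kernels use by default.
    primitive_attr attr;
    attr.set_accumulation_mode(accumulation_mode::f32);

    // Throws dnnl::error if the CPU/build has no f16 matmul support.
    matmul::primitive_desc pd(eng, src_md, wei_md, dst_md, attr);
    matmul prim(pd);

    memory src_mem(src_md, eng), wei_mem(wei_md, eng), dst_mem(dst_md, eng);
    // Data initialization omitted for brevity.

    prim.execute(strm, {{DNNL_ARG_SRC, src_mem},
                        {DNNL_ARG_WEIGHTS, wei_mem},
                        {DNNL_ARG_DST, dst_mem}});
    strm.wait();
    return 0;
}
```

The trade-off is the one discussed in the conversation above: accumulating in `fp32` avoids the round-off that made the PyTorch tests fail, at the cost of some `fp16` throughput.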
General
Do all unit and benchdnn tests (`make test` and `make test_benchdnn_*`) pass locally for each commit?