Enable FP16 ACL kernels that accumulate into FP32 #3332

Open
wants to merge 1 commit into main
Conversation

renato-arantes
Copy link
Contributor

Description

By default, ACL fp16 kernels also accumulate into fp16, which causes some tests to fail in PyTorch. This PR enables fp16 ACL kernels that accumulate into fp32 when appropriate, and consequently creates a path to fix the failing tests.
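To illustrate why the accumulator width matters (an illustrative NumPy sketch, not ACL or oneDNN code): sequentially accumulating fp16 values into an fp16 register silently drops updates once the running sum grows large, while an fp32 accumulator tracks the true sum closely.

```python
import numpy as np

# 10,000 fp16 values of ~0.1; the nearest fp16 value is 0.0999755859375.
vals = np.full(10000, 0.1, dtype=np.float16)

# Sequential fp16 accumulation (mimics a kernel with an fp16 accumulator).
acc16 = np.float16(0)
for v in vals:
    acc16 = np.float16(acc16 + v)

# The same fp16 inputs accumulated into fp32.
acc32 = np.float32(0)
for v in vals:
    acc32 += np.float32(v)

# acc16 stalls far below the true sum: once the accumulator reaches 256,
# the fp16 spacing there (0.25) means adding ~0.1 rounds away to nothing.
print(acc16)
# acc32 lands near 999.76, close to the exact sum of the fp16 inputs.
print(acc32)
```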

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?
  • Have you submitted performance data that demonstrates performance improvements?

@renato-arantes renato-arantes requested a review from a team as a code owner May 27, 2025 09:31
@github-actions github-actions bot added the platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 label May 27, 2025
@jondea
Copy link
Contributor

jondea commented May 27, 2025

It's worth saying that we expect a regression here, right, because the existing logic uses f16 accumulation? Equally, is there some logic in the benchdnn testing that allows bigger errors for ACL f16 matmul? If there is, then it should be reverted in this commit too.

@renato-arantes
Copy link
Contributor Author

It's worth saying that we expect a regression here, right, because the existing logic uses f16 accumulation? Equally, is there some logic in the benchdnn testing that allows bigger errors for ACL f16 matmul? If there is, then it should be reverted in this commit too.

I'm not aware of any logic in benchdnn, including the test files here, that contains any calibration to support bigger errors for ACL f16 kernels.

@jondea
Copy link
Contributor

jondea commented May 28, 2025

Thanks. So just to check, because benchdnn uses integers to test the matmul primitives, there were no roundoff errors caused by the f16 accumulation. This may be something worth looking into changing at some point (definitely not in the scope of this PR though).

@renato-arantes
Copy link
Contributor Author

Thanks. So just to check, because benchdnn uses integers to test the matmul primitives, there were no roundoff errors caused by the f16 accumulation.

That is my assumption as well.

This may be something worth looking into changing at some point (definitely not in the scope of this PR though).

Agree.
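The point about integer test data can be made concrete (an illustrative NumPy sketch, not benchdnn internals): fp16 represents every integer up to 2048 exactly, so integer-valued inputs accumulate without any roundoff as long as the partial sums stay within that range; past it, updates start to be lost.

```python
import numpy as np

def f16_sum(values):
    """Sequentially accumulate values into an fp16 register."""
    acc = np.float16(0)
    for v in values:
        acc = np.float16(acc + np.float16(v))
    return float(acc)

# 1,000 ones: every partial sum is an integer <= 2048, all exactly
# representable in fp16, so integer inputs show no roundoff at all.
print(f16_sum([1] * 1000))   # 1000.0, exact

# 3,000 ones: once the accumulator hits 2048 the fp16 spacing becomes 2,
# and 2048 + 1 rounds back to 2048 under round-to-nearest-even.
print(f16_sum([1] * 3000))   # 2048.0, stuck
```

This is consistent with benchdnn's integer fill passing under f16 accumulation while real-valued PyTorch workloads fail.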

@Sqvid
Copy link
Contributor

Sqvid commented May 29, 2025

Unfortunately, there is nothing in the current test set that hits this code. Could you add some tests that do? Ideally, also post some benchdnn lines that activate this path.
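For reference, a benchdnn line of roughly the shape below should exercise an f16 matmul with an fp32 accumulator; this assumes the build includes benchdnn and that the `--attr-acc-mode` knob is available in this oneDNN version (worth double-checking against `doc/knobs_attr.md` before relying on it):

```shell
# Hypothetical reproducer: f16 matmul, fp32 accumulation, M=64 K=128 N=64
./benchdnn --matmul --engine=cpu --dt=f16 --attr-acc-mode=f32 64x128:128x64
```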

Copy link
Contributor

@Sqvid Sqvid left a comment


Waiting for testcases to be added.

Labels
platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64
3 participants