Enable FP16 ACL kernels that accumulate into FP32 #3332

Open
wants to merge 1 commit into main
Conversation

renato-arantes
Copy link
Contributor

Description

By default, ACL fp16 kernels also accumulate into fp16, which causes some tests to fail in PyTorch. This PR enables fp16 ACL kernels that accumulate into fp32 when appropriate, and consequently creates a path to fix the failing tests.
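To illustrate why the accumulator width matters (an illustrative NumPy sketch, not ACL or oneDNN code): sequentially accumulating fp16 values into an fp16 register silently drops updates once the running sum grows large, while an fp32 accumulator tracks the true sum closely.

```python
import numpy as np

# 10,000 fp16 values of ~0.1; the nearest fp16 value is 0.0999755859375.
vals = np.full(10000, 0.1, dtype=np.float16)

# Sequential fp16 accumulation (mimics a kernel with an fp16 accumulator).
acc16 = np.float16(0)
for v in vals:
    acc16 = np.float16(acc16 + v)

# The same fp16 inputs accumulated into fp32.
acc32 = np.float32(0)
for v in vals:
    acc32 += np.float32(v)

# acc16 stalls far below the true sum: once the accumulator reaches 256,
# the fp16 spacing there (0.25) means adding ~0.1 rounds away to nothing.
print(acc16)
# acc32 lands near 999.76, close to the exact sum of the fp16 inputs.
print(acc32)
```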

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?
  • Have you submitted performance data that demonstrates performance improvements?

@renato-arantes renato-arantes requested a review from a team as a code owner May 27, 2025 09:31
@github-actions github-actions bot added the platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 label May 27, 2025
@jondea
Copy link
Contributor

jondea commented May 27, 2025

It's worth saying that we expect a regression here, right, because the existing logic uses f16 accumulation? Equally, is there some logic in the benchdnn testing that allows bigger errors for ACL f16 matmul? If there is, then it should be reverted in this commit too.

@renato-arantes
Copy link
Contributor Author

It's worth saying that we expect a regression here, right, because the existing logic uses f16 accumulation? Equally, is there some logic in the benchdnn testing that allows bigger errors for ACL f16 matmul? If there is, then it should be reverted in this commit too.

I'm not aware of any logic in benchdnn, including the test files here, that contains any calibration to support bigger errors for ACL f16 kernels.

@jondea
Copy link
Contributor

jondea commented May 28, 2025

Thanks. So just to check, because benchdnn uses integers to test the matmul primitives, there were no roundoff errors caused by the f16 accumulation. This may be something worth looking into changing at some point (definitely not in the scope of this PR though).

@renato-arantes
Copy link
Contributor Author

Thanks. So just to check, because benchdnn uses integers to test the matmul primitives, there were no roundoff errors caused by the f16 accumulation.

That is my assumption as well.

This may be something worth looking into changing at some point (definitely not in the scope of this PR though).

Agree.
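The point about integer test data can be made concrete (an illustrative NumPy sketch, not benchdnn internals): fp16 represents every integer up to 2048 exactly, so integer-valued inputs accumulate without any roundoff as long as the partial sums stay within that range; past it, updates start to be lost.

```python
import numpy as np

def f16_sum(values):
    """Sequentially accumulate values into an fp16 register."""
    acc = np.float16(0)
    for v in values:
        acc = np.float16(acc + np.float16(v))
    return float(acc)

# 1,000 ones: every partial sum is an integer <= 2048, all exactly
# representable in fp16, so integer inputs show no roundoff at all.
print(f16_sum([1] * 1000))   # 1000.0, exact

# 3,000 ones: once the accumulator hits 2048 the fp16 spacing becomes 2,
# and 2048 + 1 rounds back to 2048 under round-to-nearest-even.
print(f16_sum([1] * 3000))   # 2048.0, stuck
```

This is consistent with benchdnn's integer fill passing under f16 accumulation while real-valued PyTorch workloads fail.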

@Sqvid
Copy link
Contributor

Sqvid commented May 29, 2025

Unfortunately, there is nothing in the current test set that hits this code. Could you add some tests that do? Ideally, also post some benchdnn lines that activate this path.
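For reference, a benchdnn line of roughly the shape below should exercise an f16 matmul with an fp32 accumulator; this assumes the build includes benchdnn and that the `--attr-acc-mode` knob is available in this oneDNN version (worth double-checking against `doc/knobs_attr.md` before relying on it):

```shell
# Hypothetical reproducer: f16 matmul, fp32 accumulation, M=64 K=128 N=64
./benchdnn --matmul --engine=cpu --dt=f16 --attr-acc-mode=f32 64x128:128x64
```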

Copy link
Contributor

@Sqvid Sqvid left a comment


Waiting for testcases to be added.

Labels
platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64
3 participants