
Simdized quantized operations #2904


Merged
merged 50 commits into onnx:main on Aug 13, 2024

Conversation

AlexandreEichenberger
Collaborator

@AlexandreEichenberger AlexandreEichenberger commented Aug 8, 2024

Simdized quantized operations: DynamicQuantizeLinear, QuantizedLinear, and DequantizeLinear.

Added support for reduction to a scalar (the current scheme for our tensor-only quantization), the fused reduction of min and max needed for dynamic quantization, and generic support in KrnlBuilder to generate SIMD loops.

Also added MathBuilder support for clip and round so that we don't need to rely on onnx operators to do so when lowering to Krnl.

Signed-off-by: Alexandre Eichenberger <[email protected]>
@AlexandreEichenberger
Collaborator Author

@chentong319 there is currently an error; I am working to fix it. It will only be a small change.

@AlexandreEichenberger
Collaborator Author

@chentong319 ran independent tests; the fix works. The latest commit should have a green build.

@AlexandreEichenberger
Collaborator Author

AlexandreEichenberger commented Aug 9, 2024

Summary of changes:

Elementwise:

RoundOp was previously expanded manually in the elementwise lowering, as a full loop over all elements. But I needed it as an operation that applies to a scalar or a SIMD vector. So I pulled the implementation into MathBuilder, so that I can call it anywhere I need to compute Round (which is an elaborate operation that rounds to the nearest whole number, breaking ties toward even numbers).
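The round-to-nearest-even semantics can be sketched in plain C++ as follows. This is an illustrative model, not the actual MathBuilder lowering, and the same arithmetic applies unchanged to each SIMD lane:

```cpp
#include <cmath>

// Round half to even ("banker's rounding"), the semantics ONNX Round
// requires. The function name is illustrative, not the MathBuilder API.
static double roundHalfToEven(double x) {
  double rounded = std::floor(x + 0.5); // naive round-half-up
  double diff = rounded - x;
  // On an exact .5 tie, pick the even neighbor instead.
  if (diff == 0.5 && std::fmod(rounded, 2.0) != 0.0)
    rounded -= 1.0;
  return rounded;
}
```

So 2.5 rounds to 2 and 3.5 rounds to 4. (On hardware, `std::nearbyint` in the default FE_TONEAREST mode gives the same result in a single instruction; the expansion above is what one emits when no such instruction is available for the target.)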

Enabled SIMD for dequantize. The issue that prevented it was the lack of SIMD support in MathBuilder.cast. I had to add this for the quantize operations (which are now vectorized), so it now works here too.

Delayed splatting in getPartiallyFlattenedSimdCode. Since MathBuilder does the splatting when operations have a mixture of scalar and SIMD operands, there is no need to do it here anymore.

Reduction:

Migrated some list support into a separate file.

Created a new operation, emitFullSIMDReductionFor, that performs a reduction to a single scalar (previous support only reduced to an array of reductions, not a single scalar). While at it, I also enabled the fused reduction of 2 distinct reductions, as DynamicQuantizeLinear needs both the min and the max at the same time.
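The fused min/max reduction can be modeled as below. This is a scalar sketch (not the actual emitFullSIMDReductionFor code); a SIMD version keeps one vector accumulator per reduction and finalizes each with a horizontal reduction at the end:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Fuse two reductions -- min and max -- into a single traversal, as
// DynamicQuantizeLinear needs both at once. Name is illustrative.
static std::pair<float, float> fusedMinMax(const std::vector<float> &v) {
  float mn = v[0], mx = v[0];
  for (float x : v) { // one pass updates both accumulators
    mn = std::min(mn, x);
    mx = std::max(mx, x);
  }
  return {mn, mx};
}
```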

Changed the interface to indicate, via a templated approach, when ops need a division.

ONNXToKrnlCommon

Elementary, simple operations (such as Add/Sub, ...) don't have a custom emitScalarOpFor template that uses the MathBuilder, and thus they lacked the scalar/vector expansion scheme. Added it there directly.

[Dynamic] Quantize Linear

Added 2 functions to perform the dynamic part (compute the min/max to get the scale/zero point) and to perform the conversion. Simply moved the code into new, independently callable functions (as they will also be needed elsewhere in the future). Removed the onnx.xxx operations and replaced them with math.xxx builder calls, as we now generate fused loops.
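The "dynamic part" follows the ONNX DynamicQuantizeLinear spec for uint8: extend the range to include 0 (so 0 is exactly representable), then derive scale and zero point. A minimal sketch, with illustrative names and assuming a nondegenerate input range:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct QuantParams { float scale; uint8_t zeroPoint; };

// Compute uint8 quantization parameters from the data's min/max,
// per the ONNX DynamicQuantizeLinear formulas (qmin = 0, qmax = 255).
static QuantParams computeQuantParams(float rawMin, float rawMax) {
  float mn = std::min(rawMin, 0.0f); // range must contain 0
  float mx = std::max(rawMax, 0.0f);
  float scale = (mx - mn) / 255.0f;  // (max - min) / (qmax - qmin)
  float zp = std::nearbyint(-mn / scale); // qmin - min/scale
  zp = std::clamp(zp, 0.0f, 255.0f);      // saturate to uint8 range
  return {scale, static_cast<uint8_t>(zp)};
}
```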

Krnl DialectBuilder

Generate a SIMD loop for the given kernel. See the .hpp for explanation of the scheme.

MLIR DialectBuilder

Added handling of scalar/vector for math.select and cast (that one is tricky; I left an explanation in the code, which is mostly about systematically using the proper type: elementType or the original, possibly vector, type).

I added a new computeSuitableUnrollFactor to guide SIMD. It basically looks at whether SIMD is possible for the data type, then looks at the average usage of SIMD operations and decides on an additional unroll factor given the register pressure (low pressure: more unrolling; high pressure because of many operations: less unrolling).
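The heuristic can be sketched as below. This is a hypothetical simplification (the real computeSuitableUnrollFactor is more involved): with few SIMD ops in the kernel, unroll more to expose ILP; with many, unroll less so the unrolled body does not exceed the vector register file.

```cpp
#include <algorithm>

// Pick an ILP unroll factor from approximate register pressure.
// Names and constants are illustrative, not the onnx-mlir code.
static int suitableUnrollFactor(int numSimdOpsInKernel,
                                int numVectorRegs = 32) {
  if (numSimdOpsInKernel <= 0)
    return 1; // nothing to unroll for
  // Budget: the unrolled body should roughly fit in the register file.
  int unroll = numVectorRegs / numSimdOpsInKernel;
  return std::clamp(unroll, 1, 8); // cap ILP unrolling at 8x
}
```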

Code was added for math.round and math.clip.

@chentong319
Collaborator

  1. Do the terms, vector and simd, have different meanings in the code?
  2. What's the constraint on the vector length (VL)? I am not clear about the relationship between the vector dialect and the final simd code for a particular machine.
  3. For reduction, I wonder whether loop fusion in a later pass can save us the trouble of handling multiple reductions.
  4. For divide by mean, we could represent that semantics with the tensor dialect and make our code cleaner. But that is not easy in our onnx to krnl framework.
  5. Overall, we are trying to generate the best-performing code for common patterns, at the cost of complicated lowering code.

@AlexandreEichenberger
Collaborator Author

AlexandreEichenberger commented Aug 13, 2024

All very good questions

Do the terms, vector and simd, have different meaning in the code?

I use them interchangeably. If you feel strongly about one or the other, I can do a cleanup in a subsequent PR. Technically, vectorization does not require the use of SIMD. For example, ESSL has a vector mode where, instead of calling one "math" function at a time, it calls a long vector of them of arbitrary length, and uses a mixture of SIMD and scalar operations to execute them as fast as possible. SIMD implies the use of SIMD instructions.

What's the constraint for the vector length (VL)? I am not clear about the relationship of vector dialect and the final simd code for particular machine.

There are two components to VL. One is the hardware constraint of the machine; for z: 4 floats, 8 dlfloat16, ... The LLVM backend efficiently supports arbitrary vector lengths that are multiples of the hardware length. Essentially, if we create an 8-wide float vector, it generates 2 SIMD instructions for it. That is a very good way to exploit ILP. I call this second factor the "unroll" factor, as it effectively unrolls the loop further. When presented to the loops (for blocking), the VL is the product of the hardware length and the "unroll" factor.
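In numbers, assuming z's 128-bit (16-byte) vector registers as described above:

```cpp
// Hardware VL for z-style 128-bit vector registers: 4 floats (4 bytes
// each), 8 dlfloat16 (2 bytes each). Illustrative helper names.
constexpr int hwVL(int elemBytes) { return 16 / elemBytes; }

// The VL presented to the loops is hardware VL times the "unroll" factor;
// e.g. floats with unroll 2 give an 8-wide vector, which LLVM lowers to
// 2 hardware SIMD instructions per operation.
constexpr int totalVL(int elemBytes, int unroll) {
  return hwVL(elemBytes) * unroll;
}
```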

In practice, I also look at the register pressure: if a kernel has very few SIMD operations, then I want a larger unroll factor, and if there are lots of SIMD operations, then I want a smaller one, as otherwise we may exceed the number of registers. Pressure is approximated by the number of SIMD operations. Ideally I would use a better metric, but it works well in practice so far.

For reduction, I wonder whether the loop fusion in later pass can save us the trouble of handling multiple reductions.

Ideally yes, I would love for you to integrate multiple reductions into a single kernel. Maybe with this new infrastructure for SIMD, it will be easier to do.

Note that this reduction pattern reduces a whole loop to a single scalar (not an array of reductions). That pattern is currently not supported for any of the unary/binary elementwise reductions (1) because it does not occur except for this very specific quantization of a whole tensor, and (2) because I would have to significantly rewrite all of the reductions to handle this quite specific pattern.

For divide by mean, we could represent that semantics with tensor dialect and make our code cleaner. But not easy in our onnx to krnl framework.

Before, the divide by mean was a flag on the pattern. That does not work for supporting multiple reductions, so I moved it to a template that can be individually turned on for each specific op. I am interested in learning more about the tensor representation; my goal here was to introduce as few changes as possible.

Overall, we are trying to generate the best code in performance for common patterns with complicated code.

Agreed. I am trying to simplify the generated code a bit, but it is not easy. On x86/arm, there are efficient horizontal/across-lane reduction instructions, for example. Z supports some of them for integers but not for floats. Thus I need a custom scheme that handles VL reductions at once, so that I may use an efficient VL-by-VL permute pattern to fully exploit the SIMD operations (which still requires 4 additional permute operations that are not needed on machines with horizontal reductions).
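For a machine without a horizontal-add instruction, the basic building block is a log-step shuffle reduction. Below is a scalar model of it for a single 4-wide vector (an assumed simplification; the PR's scheme interleaves VL of these and shares the permutes across them):

```cpp
#include <array>

// Log-step shuffle reduction: repeatedly add a lane-permuted copy of the
// vector to itself, halving the active width each step. VL = 4 floats
// here, so log2(4) = 2 permute+add steps. Name is illustrative.
static float shuffleReduceAdd(std::array<float, 4> v) {
  for (int stride = 2; stride >= 1; stride /= 2)
    for (int i = 0; i < stride; ++i)
      v[i] += v[i + stride]; // add the shifted copy lane-wise
  return v[0];               // the full sum lands in lane 0
}
```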

I am looking into the possibility of doing a krnl - simd - reduce, but it's a bit involved, so I needed to first generate the easier code manually and then look into abstracting it into a support function.

Collaborator

@chentong319 chentong319 left a comment


LGTM!

@AlexandreEichenberger
Collaborator Author

Thanks, will implement your suggestions in the next PR, namely:

  • distinguish more precisely between the Vector Length (VL) dictated by hardware vs the additional unrolling for performance
  • give an example of how to use the new SIMD krnl interface.

@AlexandreEichenberger AlexandreEichenberger merged commit 2164245 into onnx:main Aug 13, 2024
7 checks passed
@jenkins-droid
Collaborator

Jenkins Linux s390x Build #15330 [push] Simdized quantized opera... started at 15:52

@jenkins-droid
Collaborator

Jenkins Linux ppc64le Build #14355 [push] Simdized quantized opera... started at 15:53

@jenkins-droid
Collaborator

Jenkins Linux amd64 Build #15325 [push] Simdized quantized opera... started at 14:52

@jenkins-droid
Collaborator

Jenkins Linux amd64 Build #15325 [push] Simdized quantized opera... passed after 1 hr 13 min

@jenkins-droid
Collaborator

Jenkins Linux s390x Build #15330 [push] Simdized quantized opera... passed after 1 hr 46 min

@jenkins-droid
Collaborator

Jenkins Linux ppc64le Build #14355 [push] Simdized quantized opera... passed after 2 hr 5 min

3 participants