WIP - Speed improvements to resize convolution (no vpermps w/ FMA) #2793
base: main
@@ -110,6 +110,39 @@ public static Vector512<int> ConvertToInt32RoundToEven(Vector512<float> vector)
        return Vector512.ConvertToInt32(val_2p23_f32 | sign);
    }

    /// <summary>
    /// Performs a multiply-add operation on three vectors, where each element of the resulting vector is the
    /// product of corresponding elements in <paramref name="a"/> and <paramref name="b"/> added to the
    /// corresponding element in <paramref name="c"/>.
    /// If the CPU supports FMA (Fused Multiply-Add) instructions, the operation is performed as a single
    /// fused operation for better performance and precision.
    /// </summary>
    /// <param name="a">The first vector of single-precision floating-point numbers to be multiplied.</param>
    /// <param name="b">The second vector of single-precision floating-point numbers to be multiplied.</param>
    /// <param name="c">The vector of single-precision floating-point numbers to be added to the product of
    /// <paramref name="a"/> and <paramref name="b"/>.</param>
    /// <returns>
    /// A <see cref="Vector512{Single}"/> where each element is the result of multiplying the corresponding elements
    /// of <paramref name="a"/> and <paramref name="b"/>, and then adding the corresponding element from <paramref name="c"/>.
    /// </returns>
    /// <remarks>
    /// If the FMA (Fused Multiply-Add) instruction set is supported by the CPU, the operation is performed using
    /// <see cref="Fma.MultiplyAdd(Vector256{float}, Vector256{float}, Vector256{float})"/> against the upper and lower
    /// parts. This can produce slightly different results than performing the multiplication and addition
    /// separately, because the fused form rounds only once, after the addition.
    /// <para>
    /// If FMA is not supported, the operation is performed as a separate multiplication and addition. This might lead
    /// to a minor difference in precision compared to the fused operation, particularly in cases where numerical accuracy
    /// is critical.
    /// </para>
    /// </remarks>
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static Vector512<float> MultiplyAddEstimate(Vector512<float> a, Vector512<float> b, Vector512<float> c)
        // Don't actually use FMA, as it requires many more instructions to extract the
        // upper and lower parts of the vector and then recombine them.
Review comment: What's this about? When inlined, a helper wrapping
Reply: Urgh... I didn't even think of
        => (a * b) + c;

    [DoesNotReturn]
    private static void ThrowUnreachableException() => throw new UnreachableException();
}
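The remarks above note that a fused multiply-add can round differently from a separate multiply and add. The scalar sketch below illustrates that with `MathF.FusedMultiplyAdd`; the `FmaRoundingDemo` class and its inputs are illustrative and not part of the PR, and it assumes a typical x64/Arm64 .NET runtime where `float` arithmetic is performed in single precision.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class FmaRoundingDemo
{
    // Two roundings: the product is rounded to float, then the sum is rounded again.
    // NoInlining keeps the JIT from constant-folding the demo inputs.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static float Unfused(float a, float b, float c) => (a * b) + c;

    // One rounding: (a * b) + c is computed exactly and rounded once at the end.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static float Fused(float a, float b, float c) => MathF.FusedMultiplyAdd(a, b, c);

    public static void Main()
    {
        float a = 1f + (1f / 4096f);    // 1 + 2^-12, exactly representable as float
        float c = -(1f + (1f / 2048f)); // -(1 + 2^-11)

        // Exact a * a = 1 + 2^-11 + 2^-24. The 2^-24 term is exactly half an ulp at 1,
        // so rounding the product to float (ties-to-even) discards it.
        Console.WriteLine(Unfused(a, a, c)); // 0
        Console.WriteLine(Fused(a, a, c));   // 2^-24, roughly 5.96E-08
    }
}
```

The unfused form loses the low-order term of the product before the cancellation, while the fused form keeps it; in image resampling such differences are normally confined to the last bit.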
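The code comment in `MultiplyAddEstimate` says the 512-bit overload skips FMA because using 256-bit `Fma.MultiplyAdd` means extracting both halves of each operand and recombining the result. The sketch below contrasts that split shape with the plain multiply-then-add the diff keeps. It is a hypothetical illustration (the class and method names are not from the PR) and assumes .NET 8 or later for `Vector512`.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class Vector512MultiplyAddSketch
{
    // Hypothetical fused path built from two 256-bit FMA operations: split each operand
    // into halves, compute a * b + c on each half, then recombine into one Vector512.
    public static Vector512<float> MultiplyAddSplitFma(Vector512<float> a, Vector512<float> b, Vector512<float> c)
    {
        if (Fma.IsSupported)
        {
            Vector256<float> lower = Fma.MultiplyAdd(a.GetLower(), b.GetLower(), c.GetLower());
            Vector256<float> upper = Fma.MultiplyAdd(a.GetUpper(), b.GetUpper(), c.GetUpper());
            return Vector512.Create(lower, upper);
        }

        // No FMA: fall back to a separate multiply and add.
        return (a * b) + c;
    }

    // The shape the diff keeps: one multiply and one add on the full 512-bit vector.
    public static Vector512<float> MultiplyAddUnfused(Vector512<float> a, Vector512<float> b, Vector512<float> c)
        => (a * b) + c;

    public static void Main()
    {
        float[] values = new float[Vector512<float>.Count];
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = i;
        }

        Vector512<float> a = Vector512.Create(values);
        Vector512<float> b = Vector512.Create(2f);
        Vector512<float> c = Vector512.Create(1f);

        // Small integers are exact under both forms, so the two results are equal.
        Console.WriteLine($"Fma.IsSupported: {Fma.IsSupported}");
        Console.WriteLine(MultiplyAddSplitFma(a, b, c) == MultiplyAddUnfused(a, b, c));
    }
}
```

With inputs whose products are not exactly representable, the split FMA path can differ from the unfused one in the last bit, which is the trade-off the doc comment's remarks describe.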