simdgroup_load(data::MtlDeviceArray{T}, matrix_origin=(1, 1))

Loads data from device or threadgroup memory into an 8x8 SIMD-group matrix
- and returns it. `T` must be either `Float16` or `Float32`.
+ and returns it. `T` must be either `Float16`, `Float32`, or `BFloat16`.

# Arguments
- `matrix_origin::NTuple{2, Int64}=(1, 1)`: origin in the source memory to load from.
@@ -65,7 +65,7 @@ and returns it. `T` must be either `Float16` or `Float32`.
simdgroup_store(src, dest::MtlDeviceArray{T}, matrix_origin=(1, 1))

Stores data from an 8x8 SIMD-group matrix into device or threadgroup memory.
- `T` must be either `Float16` or `Float32`.
+ `T` must be either `Float16`, `Float32`, or `BFloat16`.

# Arguments
- `matrix_origin::NTuple{2, Int64}=(1, 1)`: origin in the destination memory to store to.
@@ -119,7 +119,7 @@ The value for delta must be the same for all threads in the SIMD-group. This fun
doesn’t modify the upper delta lanes of data because it doesn’t wrap values around
the SIMD-group.

- T must be one of the following: Float32, Float16, Int32, UInt32, Int16, UInt16, Int8, or UInt8
+ T must be one of the following: Float32, Float16, BFloat16, Int32, UInt32, Int16, UInt16, Int8, or UInt8
"""
simd_shuffle_down
@@ -132,6 +132,6 @@ lane ID minus delta.
The value of delta must be the same for all threads in a SIMD-group. This function doesn’t
modify the lower delta lanes of data because it doesn’t wrap values around the SIMD-group.

- T must be one of the following: Float32, Float16, Int32, UInt32, Int16, UInt16, Int8, or UInt8
+ T must be one of the following: Float32, Float16, BFloat16, Int32, UInt32, Int16, UInt16, Int8, or UInt8
"""
simd_shuffle_up
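
The "doesn't wrap" behavior described in the two shuffle docstrings can be illustrated with a small CPU-only model. This is a sketch of the documented lane semantics, not the Metal.jl implementation: `shuffle_down_model` and `shuffle_up_model` are hypothetical helper names, the vector `lanes` stands in for the per-thread values of one SIMD-group, and 8 lanes are used only for readability.

```julia
# CPU-only model of the documented lane semantics; no GPU or Metal.jl required.
# Both function names are hypothetical and exist only for this illustration.

# Each lane reads from the lane `delta` positions above it.
# The upper `delta` lanes have no source lane and keep their own value (no wrap-around).
function shuffle_down_model(lanes::AbstractVector{T}, delta::Integer) where {T}
    n = length(lanes)
    out = copy(lanes)
    for lane in 1:(n - delta)
        out[lane] = lanes[lane + delta]
    end
    return out
end

# Each lane reads from the lane `delta` positions below it.
# The lower `delta` lanes have no source lane and keep their own value (no wrap-around).
function shuffle_up_model(lanes::AbstractVector{T}, delta::Integer) where {T}
    n = length(lanes)
    out = copy(lanes)
    for lane in (delta + 1):n
        out[lane] = lanes[lane - delta]
    end
    return out
end

lanes = Float16.(1:8)                  # one value per lane
down = shuffle_down_model(lanes, 2)    # lanes 7 and 8 are left unchanged
up = shuffle_up_model(lanes, 2)        # lanes 1 and 2 are left unchanged
```

With `delta = 2`, `down` holds the values 3 through 8 followed by the untouched 7 and 8, and `up` holds the untouched 1 and 2 followed by 1 through 6. The model is generic in `T`, so it behaves the same for any of the element types listed in the docstrings.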