Exciting News
We introduce 3 awesome new features to the OCCA library. They are still in the experimental stage, mainly due to performance reasons. We found an initial approach to enabling inlined lambdas and wanted to see how far we could go with them.
Future work includes profiling and optimizing build + launch of the inlined lambdas. How we cache kernel builds and fetch from the cache is still up in the air, but we are looking forward to tackling this fun problem.
occa::forLoop and inlined kernels
Basic Example
Here we generate a for-loop that goes through [0, N) and is tiled by tileSize.
We can do it manually by calling .outer and .inner.
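To make the tiling concrete, here is a minimal host-side sketch in plain C++ of what "goes through [0, N) tiled by tileSize" means. It does not use the OCCA API; `tiledFor` and `visitedIndices` are hypothetical names chosen for this illustration, and the bounds check for a partial last tile is an assumption about how such a loop is usually guarded.

```cpp
#include <cassert>
#include <vector>

// Plain C++ analogy of a loop over [0, N) tiled by tileSize.
// The outer loop walks tile starts; the inner loop walks offsets inside a tile.
template <class Body>
void tiledFor(int N, int tileSize, Body body) {
  assert(tileSize > 0);
  for (int outer = 0; outer < N; outer += tileSize) {
    for (int inner = 0; inner < tileSize; ++inner) {
      const int index = outer + inner;
      // The last tile may be partial, so guard against running past N
      if (index < N) {
        body(index);
      }
    }
  }
}

// Collects the visited indices so the iteration order can be inspected
inline std::vector<int> visitedIndices(int N, int tileSize) {
  std::vector<int> visited;
  tiledFor(N, tileSize, [&](int index) { visited.push_back(index); });
  return visited;
}
```

The outer loop plays the role of the `.outer` part and the inner loop the role of the `.inner` part; on a device the two levels would run in parallel rather than sequentially.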
Indices + Multiple Dimensions
We give an example where an index array is passed rather than a simple occa::range.
Additionally, this @inner loop has 2 dimensions, so the OCCA_FUNCTION is expected to take an int2 for the inner indices.
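The shape of such a loop can be sketched in plain C++ as follows. This is only an analogy and not OCCA code: `Int2` is a hypothetical stand-in for a two-component index type, and `indexedFor2D` and `countCalls` are names invented for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a 2D index type
struct Int2 {
  int x;
  int y;
};

// The outer loop visits the entries of an explicit index array,
// while the inner loop covers a 2D block of size innerDims.x by innerDims.y.
template <class Body>
void indexedFor2D(const std::vector<int>& outerIndices, Int2 innerDims, Body body) {
  assert(innerDims.x >= 0 && innerDims.y >= 0);
  for (int outerIndex : outerIndices) {
    for (int y = 0; y < innerDims.y; ++y) {
      for (int x = 0; x < innerDims.x; ++x) {
        // The body receives one outer index and one 2D inner index
        body(outerIndex, Int2{x, y});
      }
    }
  }
}

// Counts how many times the body runs for a given configuration
inline int countCalls(const std::vector<int>& outerIndices, Int2 innerDims) {
  int calls = 0;
  indexedFor2D(outerIndices, innerDims, [&](int, Int2) { ++calls; });
  return calls;
}
```

The body is called once per combination of outer index and inner (x, y) pair, which is why its signature needs a 2D inner argument.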
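occa::array and Functional Programming
We introduce a simple wrapper on occa::memory which is typed and contains some of the core map and reduce functional methods.
We also include a helper method, occa::range, which implements most of the occa::array methods but can be used without allocating data before iteration. It's useful when there is no specific input/output but you still need to call a generic map or reduce function.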
// Iterates through [0, 1, 2]
occa::range(3).map(...);
// Iterates through [3, 4, 5]
occa::range(3, 6).map(...);
// Iterates through [6, 5, 4]
occa::range(6, 3).map(...);
// Iterates through [0, 2, 4]
occa::range(0, 6, 2).map(...);
// Iterates through [6, 4, 2]
occa::range(6, 0, -2).map(...);
// No-op since there isn't anything to iterate through
occa::range(6, 0, 1).map(...);
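The range semantics listed above can be mirrored by a small plain C++ helper. This is a sketch written only from those six examples, not OCCA's implementation: `expandRange` is a hypothetical name, and the rule that the two-argument form picks a step of +1 or -1 from the direction is inferred from the `range(3, 6)` and `range(6, 3)` cases.

```cpp
#include <cassert>
#include <vector>

// Expands the half-open range from start to end, walked by step, into values.
inline std::vector<int> expandRange(int start, int end, int step) {
  assert(step != 0);
  std::vector<int> values;
  if (step > 0) {
    for (int i = start; i < end; i += step) values.push_back(i);
  } else {
    for (int i = start; i > end; i += step) values.push_back(i);
  }
  return values;
}

// Two-argument form: the step is +1 or -1 depending on the direction
inline std::vector<int> expandRange(int start, int end) {
  return expandRange(start, end, start <= end ? 1 : -1);
}

// One-argument form: starts at 0
inline std::vector<int> expandRange(int end) {
  return expandRange(0, end);
}
```

A step that points away from the end value, as in the last example, produces an empty sequence because the loop condition fails immediately.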
Core methods
forEach
mapTo
map
reduce
Reduction
every
max
min
some
Re-indexing
reverse
shiftLeft
shiftRight
Utility methods
cast
clamp
clampMax
clampMin
concat
dot
fill
slice
Search
findIndex
find
includes
indexOf
lastIndexOf
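For readers unfamiliar with these functional names, the following host-side analogy shows what a few of them compute, using only the C++ standard library on a std::vector. It is not the occa::array API: the free functions below are hypothetical and their signatures are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// map: apply fn to each entry and return the results as a new vector
template <class Fn>
std::vector<int> mapValues(const std::vector<int>& in, Fn fn) {
  std::vector<int> out(in.size());
  std::transform(in.begin(), in.end(), out.begin(), fn);
  return out;
}

// reduce: fold all entries into a single value, starting from init
template <class Fn>
int reduceValues(const std::vector<int>& in, int init, Fn fn) {
  return std::accumulate(in.begin(), in.end(), init, fn);
}

// every: true when the predicate holds for all entries
template <class Pred>
bool every(const std::vector<int>& in, Pred pred) {
  return std::all_of(in.begin(), in.end(), pred);
}

// some: true when the predicate holds for at least one entry
template <class Pred>
bool some(const std::vector<int>& in, Pred pred) {
  return std::any_of(in.begin(), in.end(), pred);
}

// dot: sum of element-wise products of two equally sized vectors
inline int dot(const std::vector<int>& a, const std::vector<int>& b) {
  assert(a.size() == b.size());
  return std::inner_product(a.begin(), a.end(), b.begin(), 0);
}
```

The OCCA versions run on device memory through generated kernels, so the analogy covers only what is computed, not how or where.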
Atomics
It's still in its experimental stage, but OKL now allows for basic atomic operations!
ℹ️ @atomic should be fully available for Serial and OpenMP modes. There is probably still room for improvement in the OpenMP implementation!
⚠️ GPU modes (HIP, CUDA, OpenCL) don't have general atomics implemented and only support the following basic updates:
@atomic value += update;
@atomic value -= update;
@atomic value &= update;
@atomic value |= update;
@atomic value ^= update;
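As a point of comparison, the five basic updates above correspond closely to the read-modify-write operations of std::atomic in standard C++. The sketch below is a host-side analogy only and says nothing about how OCCA lowers @atomic on each backend; the function names are hypothetical.

```cpp
#include <atomic>

// Each function mirrors one of the basic compound updates with std::atomic.
// Every call is a single indivisible read-modify-write on `value`.
inline void atomicAdd(std::atomic<int>& value, int update) { value.fetch_add(update); }
inline void atomicSub(std::atomic<int>& value, int update) { value.fetch_sub(update); }
inline void atomicAnd(std::atomic<int>& value, int update) { value.fetch_and(update); }
inline void atomicOr(std::atomic<int>& value, int update) { value.fetch_or(update); }
inline void atomicXor(std::atomic<int>& value, int update) { value.fetch_xor(update); }
```

Because each update is indivisible, several threads can call these functions on the same variable without losing updates.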
Inlined @atomic
@atomic *ptr += value;
Block @atomic
If you prefer, you can use blocks, which are equivalent to the inlined @atomic form when possible
@atomic {
*ptr += value;
}
However, generic @atomic blocks are also possible
@atomic {
*ptr += value;
*ptr2 += value2;
}
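A block with several statements cannot map to a single hardware atomic instruction, so on the host the closest familiar pattern is a critical section. The sketch below illustrates that idea in plain C++ with a mutex. It is an analogy under that assumption, not a description of the code OCCA generates, and `Totals` and `addSample` are hypothetical names.

```cpp
#include <mutex>

// Hypothetical pair of values that must always be updated together
struct Totals {
  int sum = 0;
  int count = 0;
  std::mutex lock;
};

// Both updates happen while holding one lock, so no other thread
// can observe `sum` updated without `count`
inline void addSample(Totals& totals, int value) {
  std::lock_guard<std::mutex> guard(totals.lock);
  totals.sum += value;
  totals.count += 1;
}
```

This also hints at why general blocks are harder to support on GPU modes than the single-statement updates.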
DPC++ Backend
The DPC++ backend was added thanks to great work completed jointly by ALCF and Intel, with contributions from:
Anoop Madhusoodhanan Prabha (Intel)
Cedric Andreolli (Intel)
Kris Rowe (ALCF)
Phillipe Thierry (Intel)
Saumil Patel (ALCF)
Notes
Currently only building with CMake is supported.
Code Transformation Rewrite
The way statement and expression code transformations are done has been fully rewritten!
A functional occa::lang::array class was introduced to help with statement (statement_t) and expression (exprNode) iteration and transformation. More information in PR #404.
Additionally, the occa::lang::expr class helps create expressions easily without having to worry about pointers or underlying node objects. More information in PR #407.
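To illustrate the kind of functional iteration this enables, here is a toy sketch in plain C++ contrasting a shallow traversal with a recursive one over a small tree. The `Node` type and all function names are hypothetical and far simpler than statement_t or exprNode; the sketch shows only the general pattern.

```cpp
#include <functional>
#include <vector>

// Hypothetical toy tree node, not OCCA's statement or expression types
struct Node {
  int value;
  std::vector<Node> children;
};

// Shallow iteration: visits only the direct children of `parent`
inline void forEachChild(const Node& parent,
                         const std::function<void(const Node&)>& fn) {
  for (const Node& child : parent.children) fn(child);
}

// Recursive iteration: visits every descendant, depth-first
inline void nestedForEach(const Node& parent,
                          const std::function<void(const Node&)>& fn) {
  for (const Node& child : parent.children) {
    fn(child);
    nestedForEach(child, fn);
  }
}

// Recursive filter: collects the values of matching descendants in one flat list
inline std::vector<int> flatFilterValues(const Node& root,
                                         const std::function<bool(const Node&)>& pred) {
  std::vector<int> matches;
  nestedForEach(root, [&](const Node& node) {
    if (pred(node)) matches.push_back(node.value);
  });
  return matches;
}

// Sample tree: 1 has children 2 and 3, and 3 has child 4
inline Node sampleTree() {
  return Node{1, {Node{2, {}}, Node{3, {Node{4, {}}}}}};
}
```

Writing transformations as small callbacks over such traversals avoids hand-written loops over raw node pointers.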
⚠️ Breaking Changes
This is more of a potential breaking change, but in a series of commits we finally split up the public/private API!
occa::properties is now deprecated and replaced with occa::json
occa::properties wasn't adding much on top of occa::json and instead made auto-casting harder, since we had to handle both json and prop objects. We still keep the properties and props naming convention throughout the library, since that's what they are, but have transitioned the types to occa::json.
We still have a
typedef json properties;
so there shouldn't be any type-breaking changes for C++. The big difference is how std::string is being cast to json/properties:
std::string -> occa::properties: The std::string value is parsed into its JSON value. For example, we can pass {key: 1} or key: 1
std::string -> occa::json: The occa::json value is a literal string value. For example, if we pass {key: 1} then the occa::json value will be a string whose value is "{key: 1}".
Details about the refactor:
- [C++] The only breaking change is property strings now need to have the surrounding braces ({}) to make it valid JSON
- [C] All property methods have been removed and should be replaced with the Json methods
- [Fortran] All property methods have been removed and should be replaced with the Json methods
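For C++ code that still builds property strings without braces, a small helper can add them before the string is handed to the library. The following is a hypothetical migration helper, not part of OCCA; `withBraces` is an invented name and the check is deliberately naive (it only looks at the first and last non-whitespace characters).

```cpp
#include <string>

// Hypothetical migration helper: wraps a property string in braces
// unless it already starts with '{' and ends with '}'.
inline std::string withBraces(const std::string& props) {
  const std::string whitespace = " \t\n\r";
  const std::size_t first = props.find_first_not_of(whitespace);
  if (first == std::string::npos) {
    // Empty or whitespace-only input becomes an empty object
    return "{}";
  }
  const std::size_t last = props.find_last_not_of(whitespace);
  const std::string trimmed = props.substr(first, last - first + 1);
  if (trimmed.front() == '{' && trimmed.back() == '}') {
    return trimmed;
  }
  return "{" + trimmed + "}";
}
```

For example, `withBraces("key: 1")` yields `{key: 1}`, while a string that already has braces is returned trimmed but otherwise unchanged.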
[Removing umalloc on v1.2.0 (Feedback Wanted) #475] We're removing umalloc + UVA since it only adds extra overhead and introduces a third way to manage memory alongside occa::memory and occa::array.
Features
- host: true option to malloc for better host-allocation strategies (Thanks @noelchalmers!)
- statementArray and exprNodeArray, which make it easy to:
  - Iterate (forEach or nestedForEach (recursive))
  - Filter (filter or flatFilter (recursive))
  - Transform expression nodes (exprNode) in place through exprNodeArray::inplaceMap
- occa::lang::expr helper class to build expressions without having to know the underlying exprNode types or worry about pointers!
- okl/strict_headers kernel property to avoid erroring on headers OCCA can't find. Useful for mode-specific system headers.
- sourceCodeStatement to inject non-standard source code when needed.
- @atomic support (TODO: Finish most base implementations)
- occa::setOccaCacheDir to programmatically set the OCCA_CACHE_DIR at runtime
- occa::getDeviceCount (Thanks @noelchalmers!)
- device.wrapMemory to wrap native pointers into occa::memory objects
- occaKernelRunWithArgs which takes an occaType pointer
- occa::memory tracks objects to properly handle slicing (Thanks @noelchalmers)
Bugs Fixed
- 0
- sysctl since it was deprecated and later removed from the C standard
- if statement's condition (Thanks @noelchalmers!)
- strncmp (Thanks @MalachiTimothyPhillips!)
(Thanks @MalachiTimothyPhillips!)π Contributors
The text was updated successfully, but these errors were encountered: