Checklist
- I've read the contribution guidelines.
- I've searched other issues and no duplicate issues were found.
- I'm convinced that this is not my fault but a bug.
Description
While developing the ptv3 node, I realized that the CUDA code is not being compiled with explicit target architectures.
As a result, the code is compiled only for sm_52, and on modern GPUs it has to be just-in-time compiled before it can run (potentially slower).
In a rudimentary experiment with the target architecture set, the preprocessing from centerpoint got a whole millisecond faster, whereas the sub sample filter from @manato did not see any reduction.
The compiled architectures can be checked using cuobjdump.
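To illustrate why an sm_52-only build ends up on the JIT path, here is a minimal sketch of the general CUDA compatibility rule as I understand it: embedded machine code (the "elf" entries) runs natively only on devices of the same major compute capability with an equal or higher minor version, while embedded PTX can be JIT-compiled for any equal or newer device. The function names `parse_arch` and `load_path` are hypothetical, and the sketch ignores architecture-specific targets such as `sm_90a` and any Tegra/desktop differences.

```python
from typing import List, Tuple


def parse_arch(name: str) -> Tuple[int, int]:
    """Turn 'sm_52' or 'compute_86' into (major, minor).

    Assumption: the last digit is the minor version, so 'sm_120' is (12, 0).
    """
    digits = name.split("_", 1)[1]
    return int(digits[:-1]), int(digits[-1])


def load_path(device: str, elf_archs: List[str], ptx_archs: List[str]) -> str:
    """Classify how a fat binary would be loaded on `device`.

    Returns 'native' if some embedded machine code matches the device,
    'jit' if only embedded PTX can be compiled for it, else 'unsupported'.
    """
    dev = parse_arch(device)
    # Machine code: same major version, minor not newer than the device.
    for arch in elf_archs:
        major, minor = parse_arch(arch)
        if major == dev[0] and minor <= dev[1]:
            return "native"
    # PTX: any version not newer than the device can be JIT-compiled.
    for arch in ptx_archs:
        if parse_arch(arch) <= dev:
            return "jit"
    return "unsupported"
```

With the binary from this issue (elf and ptx both for sm_52), `load_path("sm_86", ["sm_52"], ["sm_52"])` falls through to the PTX branch, which matches the JIT behavior described above.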
Expected behavior
cuobjdump shows that the code has been compiled for the architectures that Autoware is actually used on
Actual behavior
cuobjdump libautoware_lidar_centerpoint_cuda_lib.so
Fatbin elf code:
================
arch = sm_52
code version = [1,7]
host = linux
compile_size = 64bit
Fatbin ptx code:
================
arch = sm_52
code version = [8,7]
host = linux
compile_size = 64bit
compressed
ptxasOptions =
Fatbin elf code:
================
arch = sm_52
code version = [1,7]
host = linux
compile_size = 64bit
Fatbin ptx code:
================
arch = sm_52
code version = [8,7]
host = linux
compile_size = 64bit
compressed
ptxasOptions =
Fatbin elf code:
================
arch = sm_52
code version = [1,7]
host = linux
compile_size = 64bit
Fatbin ptx code:
================
arch = sm_52
code version = [8,7]
host = linux
compile_size = 64bit
compressed
ptxasOptions =
Fatbin elf code:
================
arch = sm_52
code version = [1,7]
host = linux
compile_size = 64bit
Fatbin ptx code:
================
arch = sm_52
code version = [8,7]
host = linux
compile_size = 64bit
compressed
ptxasOptions =
Steps to reproduce
Compile any version of Autoware and inspect the resulting CUDA libraries with cuobjdump
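The check can be scripted so that it is easy to repeat across libraries. The following is a rough sketch, assuming cuobjdump prints the listing format shown under "Actual behavior" to standard output when called with only a file path; `summarize_fatbin` and `dump` are hypothetical helper names, not part of any existing tool.

```python
import re
import subprocess
from collections import Counter


def summarize_fatbin(text):
    """Count (kind, arch) pairs in a cuobjdump listing.

    `kind` is 'elf' or 'ptx', taken from the 'Fatbin ... code:' header;
    `arch` is the value of the first 'arch = ...' line that follows it.
    """
    counts = Counter()
    kind = None
    for line in text.splitlines():
        line = line.strip()
        header = re.match(r"Fatbin (elf|ptx) code:", line)
        if header:
            kind = header.group(1)
            continue
        arch = re.match(r"arch\s*=\s*(\w+)", line)
        if arch and kind is not None:
            counts[(kind, arch.group(1))] += 1
            kind = None  # wait for the next 'Fatbin' header
    return counts


def dump(library_path):
    """Run cuobjdump on a library and return its text output.

    Assumes cuobjdump is on PATH; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        ["cuobjdump", library_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `summarize_fatbin(dump("libautoware_lidar_centerpoint_cuda_lib.so"))` on the output above would report four elf entries and four ptx entries, all for sm_52.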
Versions
No response
Possible causes
No response
Additional context
No response