feat(autoware_ptv3): implemented an inference node for ptv3 using tensorrt #10600

knzo25 · 2025-05-12T02:49:52Z

Description

This PR implements an inference node for Point Transformer V3 (PTv3).

On a 10 cm grid covering the range from -76 m to 76 m, the processing times are:

~1 ms for preprocessing
25-30 ms for inference (measured on a Blackwell card, so other GPUs may differ)
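
For a sense of scale, the grid size implied by these numbers can be worked out with a little arithmetic. This is only a sketch: it assumes the -76 m to 76 m range applies to both the x and y axes, and it leaves out z because the vertical range is not stated in this PR. The helper name `cells_per_axis` is hypothetical.

```python
def cells_per_axis(min_m: float, max_m: float, voxel_m: float) -> int:
    """Number of voxels along one axis of a regular grid."""
    return round((max_m - min_m) / voxel_m)


# Assumption: the same range is used for x and y.
nx = cells_per_axis(-76.0, 76.0, 0.10)
ny = cells_per_axis(-76.0, 76.0, 0.10)
bev_cells_10cm = nx * ny

# Halving the voxel size to 5 cm quadruples the number of x-y cells.
bev_cells_5cm = cells_per_axis(-76.0, 76.0, 0.05) ** 2

print(nx, ny, bev_cells_10cm, bev_cells_5cm)
```

Under these assumptions the grid has 1520 cells per horizontal axis, and moving from 10 cm to 5 cm voxels multiplies the number of x-y cells by four.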

How was this PR tested?

Notes for reviewers

None.

Interface changes

None.

Effects on system behavior

None.

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

github-actions · 2025-05-12T02:50:05Z

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

You've checked our contribution guidelines.
Your PR follows our pull request guidelines.
All required CI checks pass before marking the PR ready for review.

knzo25 · 2025-05-12T05:22:46Z

@amadeuszsz @scepter914
I tested this with a model trained on nuScenes and have the following general comments:

Inference time is dominated by sparse convolution operations, unique/sort operations, and multi-head attention.
I believe the network is bigger than really needed, so the inference time (spconv / mha) could be reduced by using fewer features or a shallower network.
Unique/argsort can only be accelerated by using fewer points, using 32-bit hashes, fusing unique and argsort, or finding a better algorithm.
I am not really sure, but looking at the kernels executed by TensorRT, I did not see fa_mha, so flash attention may not be used as of now. This could be investigated further, either by implementing some flash attention kernels or by using the part of TensorRT made for LLMs.
Although expected, inference in fp16 looks visibly worse than in fp32. The network probably needs to be trained in fp16.
In my basic tests, I think there is not much gain for segmentation in using the whole 120 m range or 5 cm voxels (the original network was designed for 5 cm, which may be why training and inferring at 10 cm makes the network seem oversized).
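
The remarks about 32-bit hashes and fusing unique with argsort can be illustrated with a small CPU-side sketch. It is not the CUDA code from this PR. The key layout, the helper names (`bits_needed`, `voxel_key`, `fused_unique_argsort`), and the 80-cell z extent are all assumptions made for illustration. The idea is that a packed voxel key for a 10 cm grid can fit in 32 bits, and that a single sort of the keys already yields the argsort order, the unique keys, and the inverse mapping.

```python
from typing import List, Sequence, Tuple


def bits_needed(cells: int) -> int:
    """Bits required to store an index in [0, cells)."""
    return max(1, (cells - 1).bit_length())


def voxel_key(ix: int, iy: int, iz: int, bits: Tuple[int, int, int]) -> int:
    """Pack three voxel indices into one integer key (x in the high bits)."""
    _, by, bz = bits
    return (ix << (by + bz)) | (iy << bz) | iz


def fused_unique_argsort(
    keys: Sequence[int],
) -> Tuple[List[int], List[int], List[int]]:
    """One sort gives the argsort order, the unique keys, and the inverse map."""
    order = sorted(range(len(keys)), key=keys.__getitem__)
    unique: List[int] = []
    inverse = [0] * len(keys)
    for idx in order:
        # Keys arrive in sorted order, so duplicates are always adjacent.
        if not unique or unique[-1] != keys[idx]:
            unique.append(keys[idx])
        inverse[idx] = len(unique) - 1
    return order, unique, inverse


# Hypothetical grid: 1520 cells in x and y (10 cm over 152 m), 80 cells in z.
bits = (bits_needed(1520), bits_needed(1520), bits_needed(80))
total_bits = sum(bits)  # 11 + 11 + 7 bits

points = [(3, 5, 1), (0, 0, 0), (3, 5, 1), (1519, 1519, 79)]
keys = [voxel_key(x, y, z, bits) for x, y, z in points]
order, unique, inverse = fused_unique_argsort(keys)
print(total_bits, order, len(unique), inverse)
```

With this hypothetical layout the key needs 29 bits, so a 32-bit sort key would be enough. A 5 cm grid over a longer range needs more bits per axis and may no longer fit.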

Can you please upload the models for t4dataset that I left? (If you can retrain for 76m - 10cm that would be awesome too 🙏 )

codecov · 2025-05-12T05:28:52Z

Codecov Report

Attention: Patch coverage is 0% with 429 lines in your changes missing coverage. Please review.

Project coverage is 15.78%. Comparing base (0b9bde4) to head (d361dea).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| perception/autoware_ptv3/lib/ptv3_trt.cpp | 0.00% | 180 Missing ⚠️ |
| .../autoware_ptv3/lib/preprocess/preprocess_kernel.cu | 0.00% | 86 Missing ⚠️ |
| perception/autoware_ptv3/src/ptv3_node.cpp | 0.00% | 76 Missing ⚠️ |
| ...utoware_ptv3/include/autoware/ptv3/ptv3_config.hpp | 0.00% | 48 Missing ⚠️ |
| ...utoware_ptv3/lib/postprocess/postprocess_kernel.cu | 0.00% | 33 Missing ⚠️ |
| ...tion/autoware_ptv3/include/autoware/ptv3/utils.hpp | 0.00% | 6 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #10600      +/-   ##
==========================================
- Coverage   15.89%   15.78%   -0.11%     
==========================================
  Files        1347     1356       +9     
  Lines      100088   100746     +658     
  Branches    32887    32981      +94     
==========================================
  Hits        15907    15907              
- Misses      71982    72640     +658     
  Partials    12199    12199

| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| daily | `17.32% <ø> (ø)` | Carriedforward from 0b9bde4 |
| daily-cuda | `15.95% <ø> (ø)` | Carriedforward from 0b9bde4 |
| differential-cuda | `0.00% <0.00%> (?)` | |
| total-cuda | `15.72% <ø> (ø)` | Carriedforward from 0b9bde4 |

*This pull request uses carry forward flags.

knzo25 · 2025-05-22T02:50:20Z

@amadeuszsz
Thank you for your help with the plugins' PRs.
This branch can now be tested directly.

amadeuszsz

Thank you for this great PR! I have completed an initial review. Could you please share the ONNX file? I was not able to deploy your .pth model, and I would like to check the runtime and latency and try to crash it.

perception/autoware_ptv3/schema/ml_package_ptv3.schema.json

perception/autoware_ptv3/schema/ptv3.schema.json

perception/autoware_ptv3/lib/preprocess/preprocess_kernel.cu

perception/autoware_ptv3/src/ptv3_node.cpp

knzo25 · 2025-05-31T02:38:01Z

@amadeuszsz
To share the onnx I would need the pth, but essentially I ran out of time. Hopefully, I will manage to apply your review comments, but that may be as far as I can go

amadeuszsz · 2025-05-31T06:02:03Z

> @amadeuszsz To share the onnx I would need the pth, but essentially I ran out of time. Hopefully, I will manage to apply your review comments, but that may be as far as I can go

Sure, I understand. After checking the runtime, I can make the fixes myself, provided you allow me to work on your PR 🙇🏻‍♂️

knzo25 · 2025-06-01T07:18:58Z

@amadeuszsz
I think I addressed (or attempted to address) all the comments. Sadly, CI/CD does not pass, though I do not know why.
Please add any fixes if you have time 🙏

amadeuszsz

> I think I addressed (or attempted to address) all the comments. Sadly, CI/CD does not pass, though I do not know why.
> Please add any fixes if you have time 🙏

@knzo25
Thanks for addressing all the comments! I will approve this PR as soon as I can get the ONNX and confirm the runtime.

feat: implemented an inference node for ptv3 using tensorrt

8d2bd8f

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

knzo25 requested review from manato, scepter914 and amadeuszsz May 12, 2025 02:49

knzo25 self-assigned this May 12, 2025

github-project-automation bot added this to Software Working Group May 12, 2025

github-project-automation bot moved this to To Triage in Software Working Group May 12, 2025

github-actions bot added the type:documentation and component:perception labels (both auto-assigned) May 12, 2025

knzo25 added the run:build-and-test-differential and tag:require-cuda-build-and-test labels May 12, 2025

knzo25 added 4 commits May 12, 2025 14:02

chore: cspells

9d2be94

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: schemas

1bd8966

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: lint (line was too long)

de4d5de

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: more schemas

d6ab63a

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

knzo25 mentioned this pull request Apr 18, 2025

LiDAR semantic segmentation for Autoware #10481

Open

15 tasks

knzo25 added 2 commits May 15, 2025 14:57

fix: mistook the compute capabilities of edge devices

8751daa

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

Merge branch 'main' into feat/ptv3_node

ff24c64

amadeuszsz requested changes May 30, 2025

Merge remote-tracking branch 'origin/main' into feat/ptv3_node

978b341

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

knzo25 added 15 commits June 1, 2025 14:48

chore: replaced incorrect bevfusion -> ptv3

2df89ef

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: forgot to remove unused schema

693cfdf

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: duplicated variable

4c6ab17

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: changed package dep name

05a281f

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: fixed schema comment

2f7eb5d

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: removed unused headers in the post process kernels

3c228ad

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: replaced in favor of auto

93be160

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: removed unused headers

49c1102

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: changed initialization order

6688cca

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: replaced 0 by nullptr

e692ebc

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: replaced type in favor of auto

d930616

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: removed redundant message

d87e7bc

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: fixed compilation due to review changes

7aa446b

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

fix: replaced int64 by uint64

07647bc

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

chore: added more descriptive comment in the schema

37974c7

Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>

knzo25 requested a review from amadeuszsz June 1, 2025 08:13

Merge branch 'main' into feat/ptv3_node

d361dea

amadeuszsz reviewed Jun 3, 2025

knzo25 mentioned this pull request Jun 15, 2025

feat(ptv3): add a lidar segmentation model with onnx support tier4/AWML#45

Open
