feat(autoware_ptv3): implemented an inference node for ptv3 using tensorrt #10600

Open
wants to merge 24 commits into
base: main
from

Conversation

knzo25
Contributor

@knzo25 knzo25 commented May 12, 2025

Description

This PR implements an inference node for Point Transformer V3 (PTv3).

On a 10 cm grid covering the range -76 m to 76 m, the processing times are:

  • ~1 ms preprocessing
  • 25-30 ms inference (note that this was measured on a Blackwell card)
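
To put the grid settings in perspective, here is a minimal sketch of the per-axis cell count they imply. The helper names `cells_per_axis` and `bits_needed` are hypothetical and do not come from this PR. Lengths are given in whole centimetres so the arithmetic stays exact.

```python
def cells_per_axis(min_cm: int, max_cm: int, voxel_cm: int) -> int:
    """Number of voxels along one axis; lengths are in whole centimetres."""
    span_cm = max_cm - min_cm
    if span_cm <= 0 or voxel_cm <= 0 or span_cm % voxel_cm != 0:
        raise ValueError("range must be a positive multiple of the voxel size")
    return span_cm // voxel_cm


def bits_needed(n_cells: int) -> int:
    """Bits required to store an index in [0, n_cells)."""
    return max(1, (n_cells - 1).bit_length())


# -76 m to 76 m at 10 cm, as quoted in the description above.
n = cells_per_axis(-7600, 7600, 10)
print(n, bits_needed(n))  # 1520 11
```

With these settings each horizontal axis has 1520 cells, so an index along that axis fits in 11 bits.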

Related links

Branch containing all the PRs needed for inference

Related PRs:

Parent Issue:

Required PRs before this one can be merged:

How was this PR tested?

Notes for reviewers

None.

Interface changes

None.

Effects on system behavior

None.

@knzo25 knzo25 self-assigned this May 12, 2025
@github-actions github-actions bot added type:documentation Creating or refining documentation. (auto-assigned) component:perception Advanced sensor data processing and environment understanding. (auto-assigned) labels May 12, 2025

github-actions bot commented May 12, 2025

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

knzo25 added 4 commits May 12, 2025 14:02
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
@knzo25
Contributor Author

knzo25 commented May 12, 2025

@amadeuszsz @scepter914
I was testing this with a model trained for nuScenes, and have the following general comments:

  • Inference time is dominated by sparse convolution operations, unique/sort operations, and multi-head attention.
  • I believe the network is bigger than it needs to be, so inference time (spconv / MHA) could be reduced by using fewer features or a shallower network.
  • Unique/argsort can only be accelerated by using fewer points, using 32-bit hashes, fusing unique and argsort, or finding a better algorithm.
  • I am not sure, but among the kernels executed by TensorRT I did not see fa_mha, so flash attention may not be in use at the moment. This could be investigated further, either by implementing some flash attention kernels or by using parts of the TensorRT tooling made for LLMs.
  • As expected, inference in fp16 looks visually worse than in fp32. The network probably needs to be trained in fp16.
  • In my basic tests, segmentation does not seem to gain much from the full 120 m range or from 5 cm voxels (the original network was designed for 5 cm, which may be why training and inferring at 10 cm makes the network seem oversized).
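
One possible reading of "32-bit hashes" and "fusing unique and argsort" from the list above is sketched below in plain Python. It is an illustration only, not the kernels used in this PR. The bit layout (11 bits each for x and y, 7 bits for z) and the names `voxel_key` and `sort_and_unique` are assumptions made for this example. The idea is that a single sort of packed keys yields both the ordering and the unique voxels, instead of running the two operations separately.

```python
BITS_XY = 11  # assumption: enough for 1520 cells per horizontal axis
BITS_Z = 7    # assumption: up to 128 cells along z (hypothetical z range)


def voxel_key(ix: int, iy: int, iz: int) -> int:
    """Pack voxel indices into a single key that fits in 32 bits."""
    if not (0 <= ix < 1 << BITS_XY and 0 <= iy < 1 << BITS_XY
            and 0 <= iz < 1 << BITS_Z):
        raise ValueError("voxel index out of range for the assumed layout")
    return (iz << (2 * BITS_XY)) | (iy << BITS_XY) | ix


def sort_and_unique(keys):
    """One stable sort produces argsort, unique keys, and inverse indices."""
    order = sorted(range(len(keys)), key=keys.__getitem__)  # argsort
    unique_keys = []
    inverse = [0] * len(keys)  # maps each input point to its unique voxel
    for idx in order:
        key = keys[idx]
        if not unique_keys or unique_keys[-1] != key:
            unique_keys.append(key)
        inverse[idx] = len(unique_keys) - 1
    return order, unique_keys, inverse


order, unique_keys, inverse = sort_and_unique([5, 2, 5, 1])
print(order, unique_keys, inverse)  # [3, 1, 0, 2] [1, 2, 5] [2, 1, 2, 0]
```

On a GPU the same idea would typically map to a radix sort over the packed keys followed by an adjacent-difference pass. Whether that is faster than the current implementation would need to be measured.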

Can you please upload the models for t4dataset that I left? (If you can retrain for 76m - 10cm that would be awesome too 🙏 )


codecov bot commented May 12, 2025

Codecov Report

Attention: Patch coverage is 0% with 429 lines in your changes missing coverage. Please review.

Project coverage is 15.78%. Comparing base (0b9bde4) to head (d361dea).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| perception/autoware_ptv3/lib/ptv3_trt.cpp | 0.00% | 180 Missing ⚠️ |
| .../autoware_ptv3/lib/preprocess/preprocess_kernel.cu | 0.00% | 86 Missing ⚠️ |
| perception/autoware_ptv3/src/ptv3_node.cpp | 0.00% | 76 Missing ⚠️ |
| ...utoware_ptv3/include/autoware/ptv3/ptv3_config.hpp | 0.00% | 48 Missing ⚠️ |
| ...utoware_ptv3/lib/postprocess/postprocess_kernel.cu | 0.00% | 33 Missing ⚠️ |
| ...tion/autoware_ptv3/include/autoware/ptv3/utils.hpp | 0.00% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10600      +/-   ##
==========================================
- Coverage   15.89%   15.78%   -0.11%     
==========================================
  Files        1347     1356       +9     
  Lines      100088   100746     +658     
  Branches    32887    32981      +94     
==========================================
  Hits        15907    15907              
- Misses      71982    72640     +658     
  Partials    12199    12199              
| Flag | Coverage Δ | *Carryforward flag |
| --- | --- | --- |
| daily | 17.32% <ø> (ø) | Carriedforward from 0b9bde4 |
| daily-cuda | 15.95% <ø> (ø) | Carriedforward from 0b9bde4 |
| differential-cuda | 0.00% <0.00%> (?) | |
| total-cuda | 15.72% <ø> (ø) | Carriedforward from 0b9bde4 |

*This pull request uses carry forward flags.

@knzo25
Contributor Author

knzo25 commented May 22, 2025

@amadeuszsz
Thank you for your help with the plugins' PRs.
This branch should now be testable directly.

Contributor

@amadeuszsz amadeuszsz left a comment

Thank you for this great PR! I have finished the initial review, but could you please share the ONNX file, as I was not able to deploy your .pth model? I would like to check the runtime, try to crash it, check the latency, etc.

@knzo25
Contributor Author

knzo25 commented May 31, 2025

@amadeuszsz
To share the ONNX I would need the .pth, but essentially I ran out of time. Hopefully, I will manage to apply your review comments, but that may be as far as I can go.

@amadeuszsz
Contributor

> @amadeuszsz To share the onnx I would need the pth, but essentially I ran out of time. Hopefully, I will manage to apply your review comments, but that may be as far as I can go

Sure, I understand. After checking the runtime, I can take care of the fixes myself, provided you allow me to work on your PR 🙇🏻‍♂️

knzo25 added 15 commits June 1, 2025 14:48
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
Signed-off-by: Kenzo Lobos-Tsunekawa <[email protected]>
@knzo25
Contributor Author

knzo25 commented Jun 1, 2025

@amadeuszsz
I think I addressed (or attempted to address) all the comments. Sadly, CI/CD does not pass, though I do not know why.
Please try to add any fixes if you have time 🙏

@knzo25 knzo25 requested a review from amadeuszsz June 1, 2025 08:13
Contributor

@amadeuszsz amadeuszsz left a comment

> I think I addressed (or attempted to address) all the comments. Sadly, CI/CD does not pass, though I do not know why.
> Please try to add any fixes if you have time 🙏

@knzo25
Thanks for addressing all the comments! I will approve this PR as soon as I can get the ONNX and confirm the runtime.

Labels
  • component:perception: Advanced sensor data processing and environment understanding. (auto-assigned)
  • run:build-and-test-differential: Mark to enable build-and-test-differential workflow. (used-by-ci)
  • tag:require-cuda-build-and-test
  • type:documentation: Creating or refining documentation. (auto-assigned)
Projects
Status: To Triage
2 participants