Skip to content

Unexpected Performance: Single-Threaded Faster than Multi-Threaded in Point Cloud Alignment #145

Closed
@Ea510chan

Description

@Ea510chan

Hello there! Thanks for your great work!
I had an issue when I deployed it on my pc. can anyone help me take a look? Thanks!!

Description

I have observed an unexpected performance behavior while using fast_gicp_mt. Specifically, the single-threaded versions of certain point cloud alignment algorithms such as GICP and NDT are outperforming their multi-threaded counterparts. This was observed while aligning two point clouds of sizes 17047 and 17334 points.

Environment

The repo is deployed using docker.
OS: Ubuntu20.04 + ROS Noetic
GPU: RTX4090 32GB
CPU: i9-13900KF
RAM: 32GB
I deployed the repo on WSL using Docker.

Details

The execution times for various algorithms were recorded, and it was noted that single-threaded implementations were consistently faster than multi-threaded ones. Below are some of the results obtained:

$ rosrun fast_gicp gicp_align 251370668.pcd 251371071.pcd
target:17047[pts] source:17334[pts]
--- pcl_gicp ---
single:110.186[msec] 100times:11059.9[msec] fitness_score:0.204892
--- pcl_ndt ---
single:39.1375[msec] 100times:4043.5[msec] fitness_score:0.229616
--- fgicp_st ---
single:101.371[msec] 100times:9945.61[msec] 100times_reuse:6586.6[msec] fitness_score:0.204376
--- fgicp_mt ---
single:135.229[msec] 100times:12986.9[msec] 100times_reuse:11950.3[msec] fitness_score:0.204384
--- vgicp_st ---
single:85.6506[msec] 100times:7514.18[msec] 100times_reuse:4194.52[msec] fitness_score:0.205022
--- vgicp_mt ---
single:158.688[msec] 100times:16300.5[msec] 100times_reuse:15309.5[msec] fitness_score:0.205022
--- ndt_cuda (P2D) ---
single:17.4151[msec] 100times:1702.9[msec] 100times_reuse:1340.19[msec] fitness_score:0.197208
--- ndt_cuda (D2D) ---
single:13.5261[msec] 100times:1391.88[msec] 100times_reuse:1119.26[msec] fitness_score:0.199985
--- vgicp_cuda (parallel_kdtree) ---
single:37.8372[msec] 100times:3054.31[msec] 100times_reuse:1987.94[msec] fitness_score:0.205017
--- vgicp_cuda (gpu_bruteforce) ---
single:65.4749[msec] 100times:3064.62[msec] 100times_reuse:2966.4[msec] fitness_score:0.249594
--- vgicp_cuda (gpu_rbf_kernel) ---
single:13.1453[msec] 100times:1515.33[msec] 100times_reuse:1119.99[msec] fitness_score:0.204766

Expected Behavior:

Typically, one would expect the multi-threaded implementations to be faster or at least as fast as the single-threaded ones, especially when dealing with large datasets.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions