
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes

Xiang Xu¹*  Lingdong Kong²,³*  Hui Shuai⁴  Liang Pan³  Ziwei Liu⁵  Qingshan Liu⁴

¹NUAA  ²NUS  ³Shanghai AI Lab  ⁴NJUPT  ⁵S-Lab, NTU


About

LiMoE is a framework that integrates the Mixture of Experts (MoE) paradigm into LiDAR data representation learning to synergistically combine multiple representations, such as range images, sparse voxels, and raw points. The framework consists of three stages:

  • Image-to-LiDAR Pretraining, which transfers prior knowledge from images to point clouds across different representations;
  • Contrastive Mixture Learning (CML), which uses MoE to adaptively activate relevant attributes from each representation and distills these mixed features into a unified 3D network;
  • Semantic Mixture Supervision (SMS), which combines semantic logits from multiple representations to boost downstream segmentation performance.
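To make the mixing step concrete, below is a minimal PyTorch sketch of an MoE gate over the three LiDAR representations. It is an illustration under assumed shapes and module names, not the code used in this repository.

```python
# Minimal sketch of MoE-style mixing over LiDAR representations.
# Illustrative only: shapes, names, and the gating design are assumptions,
# not this repository's actual implementation.
import torch
import torch.nn as nn


class RepresentationMoE(nn.Module):
    """Softly mix per-point features from range-view, voxel, and point experts."""

    def __init__(self, dim: int, num_experts: int = 3):
        super().__init__()
        # The gate scores each expert from the concatenated expert features.
        self.gate = nn.Linear(dim * num_experts, num_experts)

    def forward(self, feats: list) -> torch.Tensor:
        # feats: list of (N, dim) tensors, one per representation,
        # already aligned to the same N points.
        stacked = torch.stack(feats, dim=1)                             # (N, E, dim)
        weights = torch.softmax(self.gate(stacked.flatten(1)), dim=-1)  # (N, E)
        return (weights.unsqueeze(-1) * stacked).sum(dim=1)             # (N, dim)


# Example: mix features of 1000 points from the three representations.
moe = RepresentationMoE(dim=64)
mixed = moe([torch.randn(1000, 64) for _ in range(3)])  # (1000, 64)
```

In CML, features mixed in this spirit are distilled into a unified 3D network; in SMS, the analogous combination is applied to semantic logits from the multiple representations.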

📝 Updates

  • [2025.02] - Our paper LiMoE has been accepted to CVPR 2025! 🎉
  • [2025.01] - Introducing the 👨‍👦‍👦 LiMoE project! For more details, kindly refer to our Project Page and Preprint. 🚀

Table of Contents

  • ⚙️ Installation
  • ♨️ Data Preparation
  • 🚀 Getting Started
  • 📊 Main Results
  • License
  • Citation
  • Acknowledgments

⚙️ Installation

For details on installation and environment setup, kindly refer to INSTALL.md.

♨️ Data Preparation

Kindly refer to DATA_PREPARE.md for details on preparing the datasets.

🚀 Getting Started

To learn more about how to use this codebase, kindly refer to GET_STARTED.md.

📊 Main Results

Comparisons of State-of-the-Art Pretraining Methods

LP denotes linear probing; the percentages denote the fraction of annotated training data used for fine-tuning. Scores are mIoU (%).

| Method | Distill | nuScenes LP | nuScenes 1% | nuScenes 5% | nuScenes 10% | nuScenes 25% | nuScenes Full | KITTI 1% | Waymo 1% |
|---|---|---|---|---|---|---|---|---|---|
| Random | - | 8.10 | 30.30 | 47.84 | 56.15 | 65.48 | 74.66 | 39.50 | 39.41 |
| SLidR | ViT-S | 44.70 | 41.16 | 53.65 | 61.47 | 66.71 | 74.20 | 44.67 | 47.57 |
| +LiMoE | ViT-S | 45.80 | 46.82 | 57.54 | 63.85 | 68.61 | 75.64 | 46.81 | 48.81 |
| Seal | ViT-S | 45.16 | 44.27 | 55.13 | 62.46 | 67.64 | 75.58 | 46.51 | 48.67 |
| SuperFlow | ViT-S | 46.44 | 47.81 | 59.44 | 64.47 | 69.20 | 76.54 | 47.97 | 49.94 |
| +LiMoE | ViT-S | 48.20 | 49.60 | 60.54 | 65.65 | 71.39 | 77.27 | 49.53 | 51.42 |
| SLidR | ViT-B | 45.35 | 41.64 | 55.83 | 62.68 | 67.61 | 74.98 | 45.50 | 48.32 |
| +LiMoE | ViT-B | 46.56 | 46.89 | 58.09 | 63.87 | 69.02 | 75.87 | 47.96 | 49.50 |
| Seal | ViT-B | 46.59 | 45.98 | 57.15 | 62.79 | 68.18 | 75.41 | 47.24 | 48.91 |
| SuperFlow | ViT-B | 47.66 | 48.09 | 59.66 | 64.52 | 69.79 | 76.57 | 48.40 | 50.20 |
| +LiMoE | ViT-B | 49.07 | 50.23 | 61.51 | 66.17 | 71.56 | 77.81 | 50.30 | 51.77 |
| SLidR | ViT-L | 45.70 | 42.77 | 57.45 | 63.20 | 68.13 | 75.51 | 47.01 | 48.60 |
| +LiMoE | ViT-L | 47.43 | 46.92 | 58.41 | 64.54 | 69.69 | 76.32 | 48.25 | 50.23 |
| Seal | ViT-L | 46.81 | 46.27 | 58.14 | 63.27 | 68.67 | 75.66 | 47.55 | 50.02 |
| SuperFlow | ViT-L | 48.01 | 49.95 | 60.72 | 65.09 | 70.01 | 77.19 | 49.07 | 50.67 |
| +LiMoE | ViT-L | 49.35 | 51.41 | 62.07 | 66.64 | 71.59 | 77.85 | 50.69 | 51.93 |

Domain Generalization Study

Method ScriKITTI Rellis-3D SemPOSS SemSTF SynLiDAR DAPS-3D Synth4D
1% 10% 1% 10% Half Full Half Full 1% 10% Half Full 1% 10%
Random 23.81 47.60 38.46 53.60 46.26 54.12 48.03 48.15 19.89 44.74 74.32 79.38 20.22 66.87
PPKT 36.50 51.67 49.71 54.33 50.18 56.00 50.92 54.69 37.57 46.48 78.90 84.00 61.10 62.41
SLiDR 39.60 50.45 49.75 54.57 51.56 55.36 52.01 54.35 42.05 47.84 81.00 85.40 63.10 62.67
+LiMoE 41.48 53.41 51.28 55.21 53.14 56.42 53.16 55.51 43.72 49.57 81.70 85.76 64.69 66.79
Seal 40.64 52.77 51.09 55.03 53.26 56.89 53.46 55.36 43.58 49.26 81.88 85.90 64.50 66.96
SuperFlow 42.70 54.00 52.83 55.71 54.41 57.33 54.72 56.57 44.85 51.38 82.43 86.21 65.31 69.43
+LiMoE 43.95 55.96 53.74 56.67 55.42 57.83 55.60 57.31 45.79 52.27 83.24 86.68 66.54 71.07

Expert Activation Paths

Figure: Visual interpretations of the expert activation paths in Contrastive Mixture Learning (CML). The experts are #1 range view, #2 voxel, and #3 point, respectively.

Point-Wise Top-1 Activation

Figure: Point-wise top-1 activation path in the Semantic Mixture Supervision (SMS) stage. It highlights the most activated representation for each point, illustrating how different representations contribute to semantic segmentation based on spatial and object-specific characteristics. Best viewed in color.
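For intuition, such a map can be read as an argmax over the per-point gating weights. A hypothetical helper (names assumed, not taken from this codebase):

```python
# Hypothetical helper: map each point's largest gating weight to an expert
# name for visualization. Illustrative, not from this codebase.
import torch

EXPERTS = ["range view", "voxel", "point"]

def top1_expert(weights: torch.Tensor) -> list:
    """weights: (N, 3) softmax gating weights -> top-1 expert name per point."""
    return [EXPERTS[i] for i in weights.argmax(dim=-1).tolist()]

print(top1_expert(torch.tensor([[0.1, 0.7, 0.2], [0.5, 0.2, 0.3]])))
# ['voxel', 'range view']
```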

Out-of-Distribution 3D Robustness

Results are grouped by fine-tuning setting: Full (full fine-tuning) and LP (linear probing). mCE is the mean corruption error (lower is better) and mRR is the mean resilience rate (higher is better), following the Robo3D protocol.

| Setting | Method | mCE ↓ | mRR ↑ | Fog | Rain | Snow | Blur | Beam | Cross | Echo | Sensor | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Full | Random | 112.20 | 72.57 | 62.96 | 70.65 | 55.48 | 51.71 | 62.01 | 31.56 | 59.64 | 39.41 | 54.18 |
| Full | PPKT | 105.64 | 75.87 | 64.01 | 72.18 | 59.08 | 57.17 | 63.88 | 36.34 | 60.59 | 39.57 | 56.60 |
| Full | SLidR | 106.08 | 75.99 | 65.41 | 72.31 | 56.01 | 56.07 | 62.87 | 41.94 | 61.16 | 38.90 | 56.83 |
| Full | +LiMoE | 101.74 | 77.77 | 67.92 | 73.25 | 57.02 | 56.30 | 64.72 | 44.81 | 61.23 | 45.37 | 58.83 |
| Full | Seal | 92.63 | 83.08 | 72.66 | 74.31 | 66.22 | 66.14 | 65.96 | 57.44 | 59.87 | 39.85 | 62.81 |
| Full | SuperFlow | 91.67 | 83.17 | 70.32 | 75.77 | 65.41 | 61.05 | 68.09 | 60.02 | 58.36 | 50.41 | 63.68 |
| Full | +LiMoE | 88.43 | 83.28 | 71.10 | 75.92 | 65.66 | 63.86 | 68.52 | 60.78 | 61.91 | 50.66 | 64.80 |
| LP | PPKT | 183.44 | 78.15 | 30.65 | 35.42 | 28.12 | 29.21 | 32.82 | 19.52 | 28.01 | 20.71 | 28.06 |
| LP | SLidR | 179.38 | 77.18 | 34.88 | 38.09 | 32.64 | 26.44 | 33.73 | 20.81 | 31.54 | 21.44 | 29.95 |
| LP | +LiMoE | 163.75 | 75.49 | 37.29 | 43.41 | 36.04 | 38.33 | 40.66 | 22.46 | 37.61 | 25.38 | 35.15 |
| LP | Seal | 166.18 | 75.38 | 37.33 | 42.77 | 29.93 | 37.73 | 40.32 | 20.31 | 37.73 | 24.94 | 33.88 |
| LP | SuperFlow | 161.78 | 75.52 | 37.59 | 43.42 | 37.60 | 39.57 | 41.40 | 23.64 | 38.03 | 26.69 | 35.99 |
| LP | +LiMoE | 155.77 | 78.23 | 40.35 | 45.28 | 39.14 | 42.10 | 44.21 | 27.33 | 39.20 | 29.49 | 38.39 |
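For reference, a sketch of how the two summary metrics can be computed, following the Robo3D definitions as we read them (the exact evaluation code may differ in details):

```python
# Sketch of the Robo3D-style robustness metrics; assumes miou[c] lists the
# mIoU (in [0, 1]) for corruption c at each severity level.

def mean_corruption_error(miou: dict, miou_baseline: dict) -> float:
    """mCE: error relative to a baseline model; 100 matches the baseline,
    lower is better."""
    ces = [sum(1 - m for m in miou[c]) / sum(1 - m for m in miou_baseline[c])
           for c in miou]
    return 100.0 * sum(ces) / len(ces)

def mean_resilience_rate(miou: dict, miou_clean: float) -> float:
    """mRR: corrupted mIoU relative to the model's own clean mIoU;
    higher is better."""
    rrs = [sum(miou[c]) / (len(miou[c]) * miou_clean) for c in miou]
    return 100.0 * sum(rrs) / len(rrs)
```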

Cosine Similarity

Figure: Cosine similarity between the learned features of a query point (denoted by the red dot) and: (1) the features of the image of the same scene (first row); and (2) the features of the LiDAR points projected onto the image (second row). Best viewed in color.
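Such a heatmap boils down to the cosine similarity between one query feature and a set of candidate features; a self-contained sketch with assumed shapes:

```python
# Sketch of the similarity map: cosine similarity between one query point's
# feature and dense image (or projected point) features. Shapes are assumed.
import torch
import torch.nn.functional as F

def cosine_map(query: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
    """query: (dim,); feats: (M, dim). Returns (M,) similarities in [-1, 1]."""
    q = F.normalize(query, dim=0)   # unit-norm query feature
    f = F.normalize(feats, dim=1)   # unit-norm candidate features
    return f @ q                    # dot product of unit vectors = cosine

sim = cosine_map(torch.randn(64), torch.randn(4096, 64))
heat = sim.view(64, 64)  # reshape to the (assumed) 64x64 image feature grid
```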

Qualitative Assessment

Figure: Qualitative assessments of state-of-the-art pretraining methods, pretrained on nuScenes and fine-tuned on SemanticKITTI with 1% annotations. The error maps depict correct and incorrect predictions in gray and red, respectively. Best viewed in color.

License

This work is released under the Apache License 2.0, while some specific implementations in this codebase might be under other licenses.

Kindly refer to LICENSE.md for a careful check if you are using our code for commercial purposes.

Citation

If you find this work helpful for your research, please kindly consider citing our paper:

@article{xu2025limoe,
    title = {LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes},
    author = {Xu, Xiang and Kong, Lingdong and Shuai, Hui and Pan, Liang and Liu, Ziwei and Liu, Qingshan},
    journal = {arXiv preprint arXiv:2501.04004},
    year = {2025}
}

Acknowledgments

This work is developed based on the MMDetection3D codebase.


MMDetection3D is an open-source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D perception. It is a part of the OpenMMLab project developed by MMLab.

We acknowledge the use of the following public resources during the course of this work: nuScenes, nuScenes-devkit, SemanticKITTI, SemanticKITTI-API, WaymoOpenDataset, Synth4D, ScribbleKITTI, RELLIS-3D, SemanticPOSS, SemanticSTF, SynLiDAR, DAPS-3D, Robo3D, SLidR, DINOv2, FRNet, SuperFlow, torchsparse, Conv-LoRA, and MoE-LLaVA. 💟
