¹NUAA · ²NUS · ³Shanghai AI Lab · ⁴NJUPT · ⁵S-Lab, NTU
LiMoE is a framework that integrates the Mixture of Experts (MoE) paradigm into LiDAR data representation learning to synergistically combine multiple representations, such as range images, sparse voxels, and raw points. The framework consists of three stages:
1. Image-to-LiDAR Pretraining: transfers prior knowledge from images to point clouds across different representations.
2. Contrastive Mixture Learning (CML): uses MoE to adaptively activate relevant attributes from each representation and distills the mixed features into a unified 3D network.
3. Semantic Mixture Supervision (SMS): combines semantic logits from multiple representations to boost downstream segmentation performance.
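To make the mixing idea concrete, below is a minimal, hypothetical sketch (not the actual code in this repo) of how an MoE router could softly combine per-point features coming from the range-view, voxel, and point branches, in the spirit of CML. The module name `RepresentationMoE`, the feature dimension, and the plain softmax router are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class RepresentationMoE(nn.Module):
    """Hypothetical MoE-style mixer over LiDAR representation features.

    Softly weights per-point features from the range-view, voxel, and point
    branches and returns one mixed feature per point (a CML-like sketch,
    not the repo's API).
    """

    def __init__(self, dim: int, num_experts: int = 3):
        super().__init__()
        # Router scores each expert from the concatenated per-point features.
        self.router = nn.Linear(num_experts * dim, num_experts)

    def forward(self, expert_feats: list) -> torch.Tensor:
        # expert_feats: list of (N, dim) tensors, one per representation,
        # all projected back to the same N points of the scan.
        stacked = torch.stack(expert_feats, dim=1)            # (N, E, dim)
        gate_logits = self.router(stacked.flatten(1))         # (N, E)
        gates = gate_logits.softmax(dim=-1).unsqueeze(-1)     # (N, E, 1)
        return (gates * stacked).sum(dim=1)                   # (N, dim)


# Toy usage: 4096 points, 64-dim features from three representation branches.
feats = [torch.randn(4096, 64) for _ in range(3)]
mixed = RepresentationMoE(dim=64)(feats)
print(mixed.shape)  # torch.Size([4096, 64])
```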
- [2025.02] - Our paper LiMoE has been accepted to CVPR 2025! 🎉
- [2025.01] - Introducing the 👨👦👦 LiMoE project! For more details, kindly refer to our Project Page and Preprint. 🚀
For details related to installation and environment setups, kindly refer to INSTALL.md.
Kindly refer to DATA_PREPARE.md for details on preparing the datasets.
To learn more usage about this codebase, kindly refer to GET_STARTED.md.
LP denotes linear probing; the percentages indicate the fraction of annotated data used for fine-tuning; all scores are mIoU (%).

| Method | Distill | nuScenes (LP) | nuScenes (1%) | nuScenes (5%) | nuScenes (10%) | nuScenes (25%) | nuScenes (Full) | KITTI (1%) | Waymo (1%) |
|---|---|---|---|---|---|---|---|---|---|
| Random | - | 8.10 | 30.30 | 47.84 | 56.15 | 65.48 | 74.66 | 39.50 | 39.41 |
| SLidR | ViT-S | 44.70 | 41.16 | 53.65 | 61.47 | 66.71 | 74.20 | 44.67 | 47.57 |
| +LiMoE | ViT-S | 45.80 | 46.82 | 57.54 | 63.85 | 68.61 | 75.64 | 46.81 | 48.81 |
| Seal | ViT-S | 45.16 | 44.27 | 55.13 | 62.46 | 67.64 | 75.58 | 46.51 | 48.67 |
| SuperFlow | ViT-S | 46.44 | 47.81 | 59.44 | 64.47 | 69.20 | 76.54 | 47.97 | 49.94 |
| +LiMoE | ViT-S | 48.20 | 49.60 | 60.54 | 65.65 | 71.39 | 77.27 | 49.53 | 51.42 |
| SLidR | ViT-B | 45.35 | 41.64 | 55.83 | 62.68 | 67.61 | 74.98 | 45.50 | 48.32 |
| +LiMoE | ViT-B | 46.56 | 46.89 | 58.09 | 63.87 | 69.02 | 75.87 | 47.96 | 49.50 |
| Seal | ViT-B | 46.59 | 45.98 | 57.15 | 62.79 | 68.18 | 75.41 | 47.24 | 48.91 |
| SuperFlow | ViT-B | 47.66 | 48.09 | 59.66 | 64.52 | 69.79 | 76.57 | 48.40 | 50.20 |
| +LiMoE | ViT-B | 49.07 | 50.23 | 61.51 | 66.17 | 71.56 | 77.81 | 50.30 | 51.77 |
| SLidR | ViT-L | 45.70 | 42.77 | 57.45 | 63.20 | 68.13 | 75.51 | 47.01 | 48.60 |
| +LiMoE | ViT-L | 47.43 | 46.92 | 58.41 | 64.54 | 69.69 | 76.32 | 48.25 | 50.23 |
| Seal | ViT-L | 46.81 | 46.27 | 58.14 | 63.27 | 68.67 | 75.66 | 47.55 | 50.02 |
| SuperFlow | ViT-L | 48.01 | 49.95 | 60.72 | 65.09 | 70.01 | 77.19 | 49.07 | 50.67 |
| +LiMoE | ViT-L | 49.35 | 51.41 | 62.07 | 66.64 | 71.59 | 77.85 | 50.69 | 51.93 |
| Method | ScriKITTI (1%) | ScriKITTI (10%) | Rellis-3D (1%) | Rellis-3D (10%) | SemPOSS (Half) | SemPOSS (Full) | SemSTF (Half) | SemSTF (Full) | SynLiDAR (1%) | SynLiDAR (10%) | DAPS-3D (Half) | DAPS-3D (Full) | Synth4D (1%) | Synth4D (10%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Random | 23.81 | 47.60 | 38.46 | 53.60 | 46.26 | 54.12 | 48.03 | 48.15 | 19.89 | 44.74 | 74.32 | 79.38 | 20.22 | 66.87 |
| PPKT | 36.50 | 51.67 | 49.71 | 54.33 | 50.18 | 56.00 | 50.92 | 54.69 | 37.57 | 46.48 | 78.90 | 84.00 | 61.10 | 62.41 |
| SLidR | 39.60 | 50.45 | 49.75 | 54.57 | 51.56 | 55.36 | 52.01 | 54.35 | 42.05 | 47.84 | 81.00 | 85.40 | 63.10 | 62.67 |
| +LiMoE | 41.48 | 53.41 | 51.28 | 55.21 | 53.14 | 56.42 | 53.16 | 55.51 | 43.72 | 49.57 | 81.70 | 85.76 | 64.69 | 66.79 |
| Seal | 40.64 | 52.77 | 51.09 | 55.03 | 53.26 | 56.89 | 53.46 | 55.36 | 43.58 | 49.26 | 81.88 | 85.90 | 64.50 | 66.96 |
| SuperFlow | 42.70 | 54.00 | 52.83 | 55.71 | 54.41 | 57.33 | 54.72 | 56.57 | 44.85 | 51.38 | 82.43 | 86.21 | 65.31 | 69.43 |
| +LiMoE | 43.95 | 55.96 | 53.74 | 56.67 | 55.42 | 57.83 | 55.60 | 57.31 | 45.79 | 52.27 | 83.24 | 86.68 | 66.54 | 71.07 |
Figure: Visual interpretations of the expert activation paths in Contrastive Mixture Learning (CML). The experts are #1 range view, #2 voxel, and #3 point, respectively.
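As a rough illustration of how such activation paths could be read off, the hedged snippet below (hypothetical helper name, assuming the per-point routing weights are available as an `(N, 3)` tensor) picks the dominant expert for each point so the scan can be colorized by representation.

```python
import torch

@torch.no_grad()
def expert_activation_path(gates: torch.Tensor) -> torch.Tensor:
    """Given per-point routing weights of shape (N, num_experts), return the
    index of the dominant expert for each point
    (0 = range view, 1 = voxel, 2 = point)."""
    return gates.argmax(dim=-1)

# Toy usage: random routing weights for 5 points over 3 experts.
gates = torch.rand(5, 3).softmax(dim=-1)
print(expert_activation_path(gates))  # e.g., tensor([2, 0, 1, 0, 2])
```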
Full denotes full fine-tuning and LP denotes linear probing; mCE is the mean Corruption Error (lower is better) and mRR is the mean Resilience Rate (higher is better).

| Setting | Method | mCE | mRR | Fog | Rain | Snow | Blur | Beam | Cross | Echo | Sensor | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Full | Random | 112.20 | 72.57 | 62.96 | 70.65 | 55.48 | 51.71 | 62.01 | 31.56 | 59.64 | 39.41 | 54.18 |
| Full | PPKT | 105.64 | 75.87 | 64.01 | 72.18 | 59.08 | 57.17 | 63.88 | 36.34 | 60.59 | 39.57 | 56.60 |
| Full | SLidR | 106.08 | 75.99 | 65.41 | 72.31 | 56.01 | 56.07 | 62.87 | 41.94 | 61.16 | 38.90 | 56.83 |
| Full | +LiMoE | 101.74 | 77.77 | 67.92 | 73.25 | 57.02 | 56.30 | 64.72 | 44.81 | 61.23 | 45.37 | 58.83 |
| Full | Seal | 92.63 | 83.08 | 72.66 | 74.31 | 66.22 | 66.14 | 65.96 | 57.44 | 59.87 | 39.85 | 62.81 |
| Full | SuperFlow | 91.67 | 83.17 | 70.32 | 75.77 | 65.41 | 61.05 | 68.09 | 60.02 | 58.36 | 50.41 | 63.68 |
| Full | +LiMoE | 88.43 | 83.28 | 71.10 | 75.92 | 65.66 | 63.86 | 68.52 | 60.78 | 61.91 | 50.66 | 64.80 |
| LP | PPKT | 183.44 | 78.15 | 30.65 | 35.42 | 28.12 | 29.21 | 32.82 | 19.52 | 28.01 | 20.71 | 28.06 |
| LP | SLidR | 179.38 | 77.18 | 34.88 | 38.09 | 32.64 | 26.44 | 33.73 | 20.81 | 31.54 | 21.44 | 29.95 |
| LP | +LiMoE | 163.75 | 75.49 | 37.29 | 43.41 | 36.04 | 38.33 | 40.66 | 22.46 | 37.61 | 25.38 | 35.15 |
| LP | Seal | 166.18 | 75.38 | 37.33 | 42.77 | 29.93 | 37.73 | 40.32 | 20.31 | 37.73 | 24.94 | 33.88 |
| LP | SuperFlow | 161.78 | 75.52 | 37.59 | 43.42 | 37.60 | 39.57 | 41.40 | 23.64 | 38.03 | 26.69 | 35.99 |
| LP | +LiMoE | 155.77 | 78.23 | 40.35 | 45.28 | 39.14 | 42.10 | 44.21 | 27.33 | 39.20 | 29.49 | 38.39 |
This work is released under the Apache License, Version 2.0, while some specific implementations in this codebase may be under other licenses.
Kindly refer to LICENSE.md for a careful check if you plan to use our code for commercial purposes.
If you find this work helpful for your research, please kindly consider citing our paper:
@article{xu2025limoe,
title = {LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes},
author = {Xu, Xiang and Kong, Lingdong and Shuai, Hui and Pan, Liang and Liu, Ziwei and Liu, Qingshan},
journal = {arXiv preprint arXiv:2501.04004},
year = {2025}
}
This work is developed based on the MMDetection3D codebase.
MMDetection3D is an open-source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D perception. It is a part of the OpenMMLab project developed by MMLab.
We acknowledge the use of the following public resources during the course of this work: nuScenes, nuScenes-devkit, SemanticKITTI, SemanticKITTI-API, WaymoOpenDataset, Synth4D, ScribbleKITTI, RELLIS-3D, SemanticPOSS, SemanticSTF, SynLiDAR, DAPS-3D, Robo3D, SLidR, DINOv2, FRNet, SuperFlow, torchsparse, Conv-LoRA, MoE-LLaVA. 💟