🌳 [ICRA'25] Hier-SLAM: Semantic Gaussian Splatting SLAM with Hierarchical Categorical Representation

Hier-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting (ICRA'25)

Authors: Boying Li, Zhixi Cai, Yuan-Fang Li, Ian Reid, and Hamid Rezatofighi

πŸ“ [Paper]   πŸ“½οΈ [Video]

We propose 🌳 Hier-SLAM: a ⭐ LLM-assisted ⭐ Fast ⭐ Semantic 3D Gaussian Splatting SLAM method featuring ⭐ a Novel Hierarchical Categorical Representation, which enables accurate global 3D semantic mapping, scaling-up capability, and explicit semantic prediction in the 3D world.


Getting Started

Installation

Clone the repository and set up the Conda environment:

git clone https://github.com/LeeBY68/Hier-SLAM.git
cd Hier-SLAM
conda create -n hierslam python=3.10
conda activate hierslam
conda install gcc=10 gxx=10 -c conda-forge
conda install -c "nvidia/label/cuda-11.6.0" cuda-toolkit
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -r requirements.txt

# Compile the Semantic-3DGS:
cd hierslam-diff-gaussian-rasterization-w-depth
pip install ./

Download

Replica

The Replica dataset is a synthetic indoor dataset. Our method uses the same sequences provided by previous works, including NICE-SLAM and iMAP (the same RGB and depth sequences along the same trajectories), to ensure a fair comparison with visual SLAM methods.

Since these sequences do not come with per-frame semantic ground truth, we rendered and generated it from the synthetic Replica dataset.

  • To automatically download the Replica RGBD sequences, run the following script to download the data originally generated via NICE-SLAM:

    bash bash_scripts/download_replica.sh
    
  • Download the corresponding per-frame semantic ground truth we rendered from the following link: 📥 Replica_Semantic_Tree

  • The generated hierarchical tree file info_semantic_tree.json is located under the Replica directory. The tree is built from the full set of semantic classes in the Replica dataset (info_semantic.json, provided by the official Replica release). Copy info_semantic_tree.json into each sequence folder.

  • After downloading the RGB, depth, pose, and semantic ground-truth data, along with the tree file, the final directory structure for Replica should be as follows (click to expand):

    [Final Replica Structure]
      DATAROOT
      └── Replica
            └── room0
                ├── results
                │   ├── depth000000.png
                │   ├── depth000001.png
                │   ├── ...
                │   ├── frame000000.png
                │   ├── frame000001.png
                │   └── ...
                ├── semantic_class
                │   ├── semantic_class_0.png
                │   ├── semantic_class_1.png
                │   └── ...
                ├── traj.txt
                └── info_semantic_tree.json

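With the layout above in place, a quick sanity check can catch missing files before a run. This is an illustrative sketch only: the helper function and the per-sequence check are ours, not part of the repository.

```python
# Sketch: verify one Replica sequence folder matches the layout above.
# Adjust DATAROOT/sequence names to your setup.
from pathlib import Path

def check_replica_sequence(seq_dir):
    """Return a list of items missing from one Replica sequence folder."""
    seq = Path(seq_dir)
    required = [
        seq / "results",                  # RGB (frame*.png) and depth (depth*.png) frames
        seq / "semantic_class",           # rendered per-frame semantic ground truth
        seq / "traj.txt",                 # ground-truth trajectory
        seq / "info_semantic_tree.json",  # hierarchical tree file, copied per sequence
    ]
    return [str(p) for p in required if not p.exists()]
```

For example, `check_replica_sequence("DATAROOT/Replica/room0")` returns an empty list when the sequence folder is complete.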

ScanNet

The ScanNet dataset is a real-world RGB-D video dataset.

  • To use it, follow the official data download procedure provided on the ScanNet website. After downloading, extract the color and depth frames from the .sens files using the provided reader script.

  • For semantic labels, we use the label-filt files due to their higher quality:

    unzip [seq_name]_2d-label-filt.zip
    
  • We provide the generated hierarchical tree files for ScanNet, all generated from the semantic classes in the ScanNet dataset (scannetv2-labels.combined.tsv, provided by the official ScanNet release):

    • scannetv2-labels.combined.tree.tsv: A tree generated based on the NYU40 semantic classes.

    • scannetv2-labels.combined.tree-large.tsv: A large tree generated based on the full set of original ScanNet semantic classes, covering up to 550 unique labels, derived from the 'id' and 'category' columns in scannetv2-labels.combined.tsv.

    You can download both hierarchical tree files from the following link: 📥 ScanNet_Tree

  • After downloading the RGB, depth, pose, and semantic data, along with the tree files, the final directory structure for ScanNet should be as follows (click to expand):

    [Final ScanNet Structure]
      DATAROOT
      └── scannet
            ├── scene0000_00
            │   ├── color
            │   │   ├── 0.jpg
            │   │   ├── 1.jpg
            │   │   └── ...
            │   ├── depth
            │   │   ├── 0.png
            │   │   ├── 1.png
            │   │   └── ...
            │   ├── label-filt
            │   │   ├── 0.png
            │   │   ├── 1.png
            │   │   └── ...
            │   ├── intrinsic
            │   └── pose
            │       ├── 0.txt
            │       ├── 1.txt
            │       └── ...
            ├── scannetv2-labels.combined.tree.tsv
            └── scannetv2-labels.combined.tree-large.tsv

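As noted above, the large tree's label set is derived from the 'id' and 'category' columns of scannetv2-labels.combined.tsv. The extraction step can be sketched as follows; the two-row sample is made up for illustration, not real ScanNet data.

```python
# Sketch: collect unique (id, category) pairs from a ScanNet-style label TSV,
# the raw material for the large hierarchical tree described above.
import csv
import io

def unique_id_category_pairs(tsv_text):
    """Return sorted unique (id, category) pairs from TSV text with a header row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return sorted({(row["id"], row["category"]) for row in reader})

# Illustrative sample; the real file has many more columns and ~550 unique labels.
sample = "id\tcategory\traw_category\n1\twall\twall\n2\tchair\toffice chair\n"
print(unique_id_category_pairs(sample))  # [('1', 'wall'), ('2', 'chair')]
```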

Run

Replica

🔹 To run Hier-SLAM on the Replica dataset with the default hierarchical semantic setting, use:

python scripts/hierslam.py configs/replica/hierslam_semantic_run.py

You can also try different configurations:

🔹 To run Hier-SLAM without semantics (visual-only Hier-SLAM) on Replica, use:

python scripts/hierslam.py configs/replica/hierslam_nosemantic_run.py

🔹 To run Hier-SLAM with flat (one-hot) semantic encoding:

  • Modify the number of semantic categories in the CUDA config file:
// In hierslam-diff-gaussian-rasterization-w-depth/cuda_rasterizer/config.h
#define NUM_SEMANTIC 102
  • Reinstall the CUDA extension:
conda activate hierslam
cd hierslam-diff-gaussian-rasterization-w-depth
pip install ./
cd ..
  • Run the following command:
python scripts/hierslam.py configs/replica/hierslam_semantic_flat_run.py
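For intuition on why the flat setting needs NUM_SEMANTIC 102 while the hierarchical settings get by with far fewer channels (e.g. 16 for the default ScanNet tree above): a flat one-hot code needs one dimension per class, whereas a hierarchical code concatenates one small one-hot segment per tree level. The branching factors below are illustrative only, not the actual Hier-SLAM tree.

```python
# Sketch: embedding size of flat one-hot vs. per-level hierarchical encoding.
def flat_dims(num_classes):
    # One-hot over all classes: one dimension per class.
    return num_classes

def hierarchical_dims(branching_factors):
    # One one-hot segment per tree level, concatenated; a leaf is
    # identified by its path, so dimensions add instead of multiply.
    return sum(branching_factors)

print(flat_dims(102))              # 102 dimensions
print(hierarchical_dims([6, 17]))  # 23 dimensions for 6 x 17 = 102 leaves
```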

ScanNet

🔹 To run Hier-SLAM on the ScanNet dataset with the default hierarchical semantic setting:

  • Modify the number of semantic categories in the CUDA config file:
// In hierslam-diff-gaussian-rasterization-w-depth/cuda_rasterizer/config.h
#define NUM_SEMANTIC 16
  • Reinstall the CUDA extension:
conda activate hierslam
cd hierslam-diff-gaussian-rasterization-w-depth
pip install ./
cd ..
  • Run the following command:
python scripts/hierslam.py configs/scannet/hierslam_semantic_run.py

You can also try different configurations:

🔹 To run Hier-SLAM without semantics (visual-only Hier-SLAM) on ScanNet, use:

python scripts/hierslam.py configs/scannet/hierslam_nosemantic_run.py

🔹 To run Hier-SLAM with the scaling-up semantic encoding:

  • Modify the number of semantic categories in the CUDA config file:
// In hierslam-diff-gaussian-rasterization-w-depth/cuda_rasterizer/config.h
#define NUM_SEMANTIC 74
  • Reinstall the CUDA extension:
conda activate hierslam
cd hierslam-diff-gaussian-rasterization-w-depth
pip install ./
cd ..
  • Run the following command:
python scripts/hierslam.py configs/scannet/hierslam_semantic_large_run.py

Tree generation

Refer to LLM_tree/readme.md for details on tree generation using LLMs.

Evaluation and visualization

🔸 Once a sequence completes, run the following command to evaluate:

python scripts/eval_novel_view.py configs/replica/hierslam_semantic_run.py
  • Subset-classes evaluation: in the config file (e.g. configs/scannet/hierslam_semantic_run.py), set eval_gt_transfer = True to evaluate only the classes visible in each frame.
  • We recommend using the full set of semantic classes (eval_gt_transfer = False) for a standard semantic evaluation. The subset option is provided to maintain consistency with previous works and ensure fair comparisons.
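The difference between the two options can be sketched with a toy IoU computation; the label maps below are tiny made-up arrays, not the repository's evaluation code.

```python
# Sketch: mean IoU over visible classes only (the subset option) vs. over
# all classes up to the maximum label (the full-set option).
import numpy as np

def mean_iou(pred, gt, visible_only=True):
    """Average per-class IoU between two integer label maps."""
    if visible_only:
        classes = np.unique(gt)  # only classes present in this frame's GT
    else:
        classes = np.arange(int(max(pred.max(), gt.max())) + 1)
    ious = []
    for c in classes:
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

gt   = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 2], [1, 1]])
print(mean_iou(pred, gt))                      # 0.75: averaged over classes {0, 1}
print(mean_iou(pred, gt, visible_only=False))  # 0.5: spurious class 2 also scored
```

The subset score is higher here because predicting a class that never appears in the ground truth is only penalized under the full-set option.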

🔸 To export the reconstructed global 3D semantic world to a .ply file, run:

python scripts/export_ply_semantic_tree.py configs/replica/hierslam_semantic_run.py

We recommend using MeshLab or Blender to visualize the resulting PLY files.

🔸 To visualize the reconstructed semantic map and estimated camera poses, run:

python viz_scripts/online_recon_sem_replica.py configs/replica/hierslam_semantic_run.py --flag_semantic
  • Add --flag_semantic to enable semantic visualization.
  • Omit --flag_semantic to display the RGB reconstruction instead.

Acknowledgement

We thank the authors for releasing the code for their awesome works: 3DGS, SplaTAM, GauStudio, Gaussian Grouping, Feature 3DGS, and many other inspiring works in the community.

Citation

If you find our work useful, please cite:

@inproceedings{li2025hier,
  title={Hier-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting},
  author={Li, Boying and Cai, Zhixi and Li, Yuan-Fang and Reid, Ian and Rezatofighi, Hamid},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2025}
}
