Important: we have released CraftsMan3D-DoraVAE, trained using rectified flow.
CraftsMan3D: High-fidelity Mesh Generation
with 3D Native Generation and Interactive Geometry Refiner
Weiyu Li*1,2, Jiarui Liu*1,2, Hongyu Yan*1, Rui Chen1, Yixun Liang1,2, Xuelin Chen3, Ping Tan1,2, Xiaoxiao Long1,2
from craftsman import CraftsManPipeline
import torch
# load from local ckpt
# mkdir ckpts && cd ckpts
# mkdir craftsman-DoraVAE && cd craftsman-DoraVAE
# wget https://pub-c7137d332b4145b6b321a6c01fcf8911.r2.dev/craftsman-DoraVAE/config.yaml
# wget https://pub-c7137d332b4145b6b321a6c01fcf8911.r2.dev/craftsman-DoraVAE/model.ckpt
# pipeline = CraftsManPipeline.from_pretrained("./ckpts/craftsman-DoraVAE", device="cuda:0", torch_dtype=torch.bfloat16)
# load from the Hugging Face model hub (upload in progress)
pipeline = CraftsManPipeline.from_pretrained("craftsman3d/craftsman-DoraVAE", device="cuda:0", torch_dtype=torch.bfloat16)
# inference
mesh = pipeline("https://pub-f9073a756ec645d692ce3d171c2e1232.r2.dev/data/werewolf.png").meshes[0]
mesh.export("werewolf.obj")
The result should look like this:
TL;DR: CraftsMan3D (aka 匠心) is a two-stage text/image-to-3D mesh generation model. Mimicking the modeling workflow of an artist or craftsman, it first generates a coarse mesh with smooth geometry using a 3D diffusion model (~5 s), then refines it (~20 s) using enhanced multi-view normal maps generated by a 2D normal diffusion model; the refinement can also be performed interactively, much like in ZBrush.
This repo contains the source code (training and inference) of our 3D diffusion model, pretrained weights, and the Gradio demo code of our 3D mesh generation project. You can find more visualizations on our project page and try our demo. If you have high-quality 3D data or other ideas, we very much welcome any form of cooperation.
Full abstract here
We present a novel generative 3D modeling system, coined CraftsMan, which can generate high-fidelity 3D geometries with highly varied shapes, regular mesh topologies, and detailed surfaces, and, notably, allows for refining the geometry in an interactive manner. Despite significant advancements in 3D generation, existing methods still struggle with lengthy optimization processes, irregular mesh topologies, noisy surfaces, and difficulties in accommodating user edits, consequently impeding their widespread adoption and implementation in 3D modeling software. Our work is inspired by the craftsman, who usually roughs out the holistic figure of the work first and elaborates the surface details subsequently. Specifically, we employ a 3D native diffusion model, which operates on a latent space learned from latent set-based 3D representations, to generate coarse geometries with regular mesh topology in seconds. In particular, this process takes as input a text prompt or a reference image and leverages a powerful multi-view (MV) diffusion model to generate multiple views of the coarse geometry, which are fed into our MV-conditioned 3D diffusion model for generating the 3D geometry, significantly improving robustness and generalizability. Following that, a normal-based geometry refiner is used to significantly enhance the surface details. This refinement can be performed automatically, or interactively with user-supplied edits. Extensive experiments demonstrate that our method achieves high efficiency in producing superior-quality 3D assets compared to existing methods.

- Inference code
- Training code
- Gradio & Hugging Face demo
- Model zoo
- Environment setup
- Data sample
- CraftsMan3D-DoraVAE (not the official version)
- support rectified flow training
- support FlashVDM (thanks to the authors for open-sourcing it)
- release the multi-view (4 views) conditioned model (including weights and a training data sample)
- add data for VAE training; we release the data preprocessing script in watertight_and_sampling.py
- support training and fine-tuning the TripoSG model (almost done)
- support training the Hunyuan3D-2 model (the weights for its VAE encoder have not been released)
- Pretrained Models
- Gradio & Hugging Face Demo
- Inference
- Training
- Data Preparation
- Video
- Acknowledgement
- Citation
Hardware
We train our model on 32x A800 GPUs with a batch size of 32 per GPU for 7 days. The mesh refinement part runs on an RTX 3080 GPU.
Setup environment
😃 We also provide a Dockerfile for easy installation; see Setup using Docker.
- Python 3.10.0
- PyTorch 2.5.1 (for RMSNorm)
- CUDA Toolkit 12.4.0
- Ubuntu 22.04
Clone this repository.
git clone https://github.com/wyysf-98/CraftsMan.git
Install the required packages.
conda create -n CraftsMan python=3.10 -y
conda activate CraftsMan
# conda install -c "nvidia/label/cuda-12.1.1" cudatoolkit
# conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install torch==2.5.1 torchvision==0.20.1
pip install -r docker/requirements.txt
pip install torch-cluster -f https://data.pyg.org/whl/torch-2.5.1+cu124.html
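To quickly verify the setup, the following minimal sanity check (a sketch; the expected values correspond to the versions listed above) prints the PyTorch version and confirms that CUDA is visible:

import torch

print(torch.__version__)          # expect 2.5.1
print(torch.cuda.is_available())  # expect True on a working GPU setup
print(torch.version.cuda)         # expect 12.4, matching the torch-cluster wheel above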
This repo ports some recent techniques for 3D diffusion models; historical versions, such as the arXiv version, can be found in separate branches.
We provide the training and inference code here for future research. The latent set VAE model is heavily built on the same structure as Michelangelo. The latent set diffusion model is based on DiT/PixArt-α and has 500M parameters.
Currently, we provide DiT models conditioned on a single-view image.
We will consider open-sourcing further models as circumstances allow.
If you run inference.py without specifying the model path, it will automatically download the model from the Hugging Face model hub.
Or you can download the model manually:
## you can just manually get the model using wget:
mkdir ckpts
cd ckpts
mkdir craftsman-v1-5
cd craftsman-v1-5
wget https://huggingface.co/craftsman3d/craftsman/resolve/main/config.yaml
wget https://huggingface.co/craftsman3d/craftsman/resolve/main/model.ckpt
### for the DoraVAE version (https://aruichen.github.io/Dora/)
cd ..
mkdir craftsman-doravae
cd craftsman-doravae
wget https://huggingface.co/craftsman3d/craftsman-doravae/resolve/main/config.yaml
wget https://huggingface.co/craftsman3d/craftsman-doravae/resolve/main/model.ckpt
## OR you can git clone the repo:
git lfs install
git clone https://huggingface.co/craftsman3d/craftsman
### for the DoraVAE version (https://aruichen.github.io/Dora/)
git clone https://huggingface.co/craftsman3d/craftsman-doravae
If you download the models using wget, you should manually put them under the ckpts/craftsman directory.
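Alternatively, a minimal sketch using the huggingface_hub client downloads a whole model repo in one call (repo IDs are taken from the commands above; the local directories are illustrative):

from huggingface_hub import snapshot_download

# Fetch all files of a repo (config.yaml, model.ckpt, ...) into a local folder.
snapshot_download(repo_id="craftsman3d/craftsman", local_dir="./ckpts/craftsman")
# For the DoraVAE version:
snapshot_download(repo_id="craftsman3d/craftsman-doravae", local_dir="./ckpts/craftsman-doravae")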
We provide Gradio demos for easy usage.
python gradio_app.py --model_path ./ckpts/craftsman
To generate 3D meshes from images folders via command line, simply run:
python inference.py --input eval_data --device 0 --model ./ckpts/craftsman
For more configs, please refer to inference.py.
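If you prefer scripting the loop yourself, here is a minimal sketch of batch inference over an image folder, assuming the same CraftsManPipeline API as in the quickstart above (folder names are illustrative):

import glob
import os

import torch
from craftsman import CraftsManPipeline

pipeline = CraftsManPipeline.from_pretrained("./ckpts/craftsman", device="cuda:0", torch_dtype=torch.bfloat16)

os.makedirs("outputs", exist_ok=True)
for image_path in sorted(glob.glob("eval_data/*.png")):
    mesh = pipeline(image_path).meshes[0]  # one mesh per input image
    name = os.path.splitext(os.path.basename(image_path))[0]
    mesh.export(os.path.join("outputs", name + ".obj"))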
We provide our training code to facilitate future research, along with a data sample in data.
- A 100k data sample for VAE training can be downloaded from: (to be uploaded)
- A 100k data sample for diffusion training can be downloaded from https://pub-c7137d332b4145b6b321a6c01fcf8911.r2.dev/Objaverse_100k.zip
- The selected 190k UUIDs for training can be downloaded from https://pub-c7137d332b4145b6b321a6c01fcf8911.r2.dev/objaverse_190k.json
- The selected 320k UUIDs for training can be downloaded from https://pub-c7137d332b4145b6b321a6c01fcf8911.r2.dev/objaverse_320k.json
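To fetch and unpack the diffusion training sample, here is a minimal standard-library sketch (the target paths are illustrative; the archive is large, so a dedicated download tool may be preferable):

import urllib.request
import zipfile

url = "https://pub-c7137d332b4145b6b321a6c01fcf8911.r2.dev/Objaverse_100k.zip"
urllib.request.urlretrieve(url, "Objaverse_100k.zip")   # download the 100k sample
with zipfile.ZipFile("Objaverse_100k.zip") as zf:
    zf.extractall("data/Objaverse_100k")                # unpack for training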
For more training details and configs, please refer to the configs folder.
### training the shape-autoencoder
python train.py --config ./configs/shape-autoencoder/michelangelo-l768-e64-ne8-nd16.yaml \
--train --gpu 0
### training the image-to-shape diffusion model
# for single view conditioned generation
python train.py --config ./configs/image-to-shape-diffusion/clip-dinov2-pixart-diffusion-dit32.yaml --train --gpu 0
# for multi view conditioned generation (original paper)
python train.py --config ./configs/image-to-shape-diffusion/clip-mvrgb-modln-l256-e64-ne8-nd16-nl6.yaml --train --gpu 0
# for the DoraVAE single-view diffusion version (we cannot provide the data due to license issues;
# you can process it yourself following https://github.com/Seed3D/Dora/tree/main/sharp_edge_sampling)
python train.py --config ./configs/image-to-shape-diffusion/DoraVAE-dinov2reglarge518-pixart-rectified-flow-dit32.yaml --train --gpu 0
Q: Tips to get better results.
- Due to limited resources, we will gradually expand the dataset and training scale, and will therefore release more pretrained models in the future.
- As with 2D diffusion models, try different seeds, adjust the CFG scale, or try a different scheduler. Good luck!
- We will provide a version conditioned on text prompts, so you can use positive and negative prompts.
- Thanks to LightIllusion for providing computational resources and to Jianxiong Pan for data preprocessing. If you have any ideas about high-quality 3D generation, feel free to contact us!
- Thanks to Hugging Face for sponsoring the demo!
- Thanks to 3DShape2VecSet for their amazing work; the latent set representation provides an efficient way to represent 3D shapes!
- Thanks to Michelangelo for their great work; our model structure is heavily built on this repo!
- Thanks to CRM, Wonder3D, and LGM for their released multi-view image generation models. If you have a more advanced version and want to contribute to the community, we welcome updates.
- Thanks to Objaverse and Objaverse-MIX for their open-sourced data, which helped us run many validation experiments.
- Thanks to ThreeStudio for their great repo; we follow their fantastic and easy-to-use code structure!
- Thanks to Direct3D, especially Shuang Wu, for providing their results.
- Thanks to TripoSG and Hunyuan3D-2 for open-sourcing their models; we adapted our code to support loading their weights, training, and fine-tuning.
CraftsMan3D is under MIT License.
@misc{li2024craftsman,
  title         = {CraftsMan3D: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner},
  author        = {Weiyu Li and Jiarui Liu and Hongyu Yan and Rui Chen and Yixun Liang and Xuelin Chen and Ping Tan and Xiaoxiao Long},
  year          = {2024},
  eprint        = {2405.14979},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CG}
}