Commit 921631d — init (0 parents)

158 files changed: +120911 −0 lines

.gitignore

# big files
data_util/face_tracking/3DMM/01_MorphableModel.mat
data_util/face_tracking/3DMM/3DMM_info.npy
data_util/BFM_models/BFM_model_front.mat
!/data_util/BFM_models/.gitkeep
deep_3drecon/BFM/Exp_Pca.bin
deep_3drecon/BFM/01_MorphableModel.mat
deep_3drecon/BFM/BFM_model_front.mat
deep_3drecon/network/FaceReconModel.pb
.vscode
### Project ignore
/checkpoints/*
!/checkpoints/.gitkeep
/data/*
!/data/.gitkeep
!data/raw/videos/May.mp4
!data/raw/val_wavs/zozo.wav
infer_out
rsync
.idea
.DS_Store
bak
tmp
*.tar.gz
mos
nbs
/configs_usr/*
!/configs_usr/.gitkeep
/egs_usr/*
!/egs_usr/.gitkeep
/rnnoise
#/usr/*
#!/usr/.gitkeep
scripts_usr

# Created by .ignore support plugin (hsz.mobi)
### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

README-zh.md

# GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis | ICLR'23

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%3CCOLOR%3E.svg)](https://arxiv.org/abs/2301.13430) | [![GitHub Stars](https://img.shields.io/github/stars/yerfor/GeneFace)](https://github.com/yerfor/SyntaSpeech) | ![visitors](https://visitor-badge.glitch.me/badge?page_id=yerfor/GeneFace)

This repository is the official PyTorch implementation of our [ICLR-2023 paper](https://arxiv.org/abs/2301.13430), in which we propose **GeneFace** for generalized and high-fidelity audio-driven talking face video synthesis.

<p align="center">
<br>
<img src="assets/GeneFace.png" width="1000"/>
<br>
</p>

Our GeneFace achieves better lip synchronization and expressiveness on out-of-domain audio (e.g. audio from different speakers or in different languages). We recommend watching [this video](https://geneface.github.io/GeneFace/example_show_improvement.mp4) for a lip-sync comparison between GeneFace and previous NeRF-based talking face synthesis methods. You can also visit our [project page](https://geneface.github.io/) for more details.

## Quick Start!

We provide [pre-trained GeneFace models](https://drive.google.com/drive/folders/1L87ZuvC3BOPdWZ7fALdUKYcIt4pWXtDz?usp=share_link) to enable a quick start. If you want to train GeneFace on your own target person video, please follow the steps in `docs/prepare_env`, `docs/process_data`, and `docs/train_models`.
Step 1. We provide the pre-trained Audio2motion model (the Variational Motion Generator in the figure above) at [this link](https://drive.google.com/drive/folders/1qsYYWmyiDnf0v5AAF9EplAaoO6DLxjFd?usp=share_link). You can download it and place it in `checkpoints/lrs3/lm3d_vae`.

Step 2. We provide the pre-trained Post-net (the Domain Adaptive Post-net in the figure above) at [this link](https://drive.google.com/drive/folders/1qsYYWmyiDnf0v5AAF9EplAaoO6DLxjFd?usp=share_link), pre-trained on `data/raw/videos/May.mp4`. You can download it and place it in `checkpoints/May/postnet`.

Step 3. We provide the pre-trained NeRF (the 3DMM NeRF Renderer in the figure above) at [this link](https://drive.google.com/drive/folders/1qsYYWmyiDnf0v5AAF9EplAaoO6DLxjFd?usp=share_link), pre-trained on `data/raw/videos/May.mp4`. You can download it and place it in `checkpoints/May/lm3d_nerf` and `checkpoints/May/lm3d_nerf_torso`.
After completing the steps above, the structure of your `checkpoints` folder should look like this:

```
> checkpoints
    > lrs3
        > lm3d_vae
        > syncnet
    > May
        > postnet
        > lm3d_nerf
        > lm3d_nerf_torso
```
Step 4. Run the following commands in a terminal:

```
bash scripts/infer_postnet.sh
bash scripts/infer_lm3d_nerf.sh
```

You can find the output video at `infer_out/May/pred_video/zozo.mp4`.
## Prepare the Environment

Please follow the steps in `docs/prepare_env`.

## Prepare Datasets

Please follow the steps in `docs/process_data`.

## Train Models

Please follow the steps in `docs/train_models`.

# Train GeneFace on other target person videos

Apart from the `May.mp4` provided in this repo, we also provide the 8 target person videos used in our experiments. You can download them at [this link](https://drive.google.com/drive/folders/1FwQoBd1ZrBJMrJE3ZzlNhK8xAe1OYGjX?usp=share_link).

To train on a new video named `<video_id>.mp4`, place it in the `data/raw/videos/` directory, then create a new folder at `egs/datasets/videos/<video_id>` and add the corresponding yaml config files, following the provided example folder `egs/datasets/videos/May`.

Besides training on the videos we provide, you can also record a video yourself and train a unique GeneFace avatar of your own!
# Todo List

GeneFace uses 3D facial landmarks as the intermediate representation between the audio-to-motion and motion-to-image modules. However, the 3D landmark sequences generated by the Post-net sometimes contain bad cases (e.g. temporal jitter, or an extra-large mouth), which degrade the quality of the NeRF-rendered video. Currently, we partially alleviate this problem by postprocessing the predicted landmark sequence, but the current postprocessing is still rather simple and cannot perfectly resolve all bad cases, so we encourage better postprocessing methods.
## Citation

```
@article{ye2023geneface,
  title={GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis},
  author={Ye, Zhenhui and Jiang, Ziyue and Ren, Yi and Liu, Jinglin and He, Jinzheng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2301.13430},
  year={2023}
}
```

## Acknowledgements

This work is influenced by the following repos:

* [NATSpeech](https://github.com/NATSpeech/NATSpeech) (for the code framework)
* [AD-NeRF](https://github.com/YudongGuo/AD-NeRF) (for the NeRF-related implementation)
* [style_avatar](https://github.com/wuhaozhe/style_avatar) (for the 3DMM-related implementation)
README.md

# GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis | ICLR'23

#### Zhenhui Ye, Ziyue Jiang, Yi Ren, Jinglin Liu, Jinzheng He, Zhou Zhao | Zhejiang University, ByteDance

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%3CCOLOR%3E.svg)](https://arxiv.org/abs/2301.13430) | [![GitHub Stars](https://img.shields.io/github/stars/yerfor/GeneFace)](https://github.com/yerfor/SyntaSpeech) | ![visitors](https://visitor-badge.glitch.me/badge?page_id=yerfor/GeneFace) | [Chinese documentation](README-zh.md)

This repository is the official PyTorch implementation of our [ICLR-2023 paper](https://arxiv.org/abs/2301.13430), in which we propose **GeneFace** for generalized and high-fidelity audio-driven talking face generation. The inference pipeline is as follows:

<p align="center">
<br>
<img src="assets/GeneFace.png" width="1000"/>
<br>
</p>

Our GeneFace achieves better lip synchronization and expressiveness on out-of-domain audio. Watch [this video](https://geneface.github.io/GeneFace/example_show_improvement.mp4) for a clear lip-sync comparison against previous NeRF-based methods. You can also visit our [project page](https://geneface.github.io/) for more details.

## Quick Start!

We provide [pre-trained models](https://drive.google.com/drive/folders/1L87ZuvC3BOPdWZ7fALdUKYcIt4pWXtDz?usp=share_link) of GeneFace to enable a quick start. If you want to train GeneFace on your own target person video, please follow the guides in `docs/prepare_env`, `docs/process_data`, and `docs/train_models`.
Step 1. We provide the pre-trained Audio2motion model (the Variational Motion Generator in the figure above) at [this link](https://drive.google.com/drive/folders/1qsYYWmyiDnf0v5AAF9EplAaoO6DLxjFd?usp=share_link). You can download it and place it in `checkpoints/lrs3/lm3d_vae`.

Step 2. We provide the pre-trained Post-net (the Domain Adaptive Post-net in the figure above) for `data/raw/videos/May.mp4` at [this link](https://drive.google.com/drive/folders/1prLZYmyiMoCeuaBYdTJwFArQbHb_70O5?usp=share_link). You can download it and place it in `checkpoints/May/postnet`.

Step 3. We provide the pre-trained NeRF (the 3DMM NeRF Renderer in the figure above) for `data/raw/videos/May.mp4` at [this link](https://drive.google.com/drive/folders/1k-uk3Vya1esqozTM_PjntfYGXnqv7VCs?usp=share_link). You can download it and place it in `checkpoints/May/lm3d_nerf` and `checkpoints/May/lm3d_nerf_torso`.
After the above steps, the structure of your `checkpoints` directory should look like this:

```
> checkpoints
    > lrs3
        > lm3d_vae
        > syncnet
    > May
        > postnet
        > lm3d_nerf
        > lm3d_nerf_torso
```
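A quick shell check can confirm the downloads landed in the expected places. This is an optional helper sketch (not part of the repo's scripts); the directory names are taken from the tree above:

```shell
# check_ckpt DIR: report whether a required checkpoint directory exists.
check_ckpt() { [ -d "$1" ] && echo "ok: $1" || echo "MISSING: $1"; }

# Directories mirror the checkpoint tree above (relative to the repo root).
for d in checkpoints/lrs3/lm3d_vae checkpoints/lrs3/syncnet \
         checkpoints/May/postnet checkpoints/May/lm3d_nerf \
         checkpoints/May/lm3d_nerf_torso; do
  check_ckpt "$d"
done
```

If any line prints `MISSING`, re-check the corresponding download step before running inference.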
Step 4. Run the scripts below:

```
bash scripts/infer_postnet.sh
bash scripts/infer_lm3d_nerf.sh
```

You can find the output video at `infer_out/May/pred_video/zozo.mp4`.
## Prepare Environments

Please follow the steps in `docs/prepare_env`.

## Prepare Datasets

Please follow the steps in `docs/process_data`.

## Train Models

Please follow the steps in `docs/train_models`.

# Train GeneFace on other target person videos
Apart from the `May.mp4` provided in this repo, we also provide the 8 target person videos that were used in our experiments. You can download them at [this link](https://drive.google.com/drive/folders/1FwQoBd1ZrBJMrJE3ZzlNhK8xAe1OYGjX?usp=share_link). To train on a new video named `<video_id>.mp4`, place it in the `data/raw/videos/` directory, then create a new folder at `egs/datasets/videos/<video_id>` and edit the config files there, following the provided example folder `egs/datasets/videos/May`.

You can also record your own video and train a unique GeneFace model for yourself!
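The file layout for a new video can be sketched as follows. This uses a hypothetical video id `Alice` (the `touch` stands in for your real recording, and it assumes the `May` example configs exist under `egs/datasets/videos/May`):

```shell
# Register a hypothetical new target video "Alice.mp4" for training.
VIDEO_ID=Alice
mkdir -p data/raw/videos "egs/datasets/videos/${VIDEO_ID}"

# Place your recording at data/raw/videos/Alice.mp4
# (touch is a placeholder for copying the real file):
touch "data/raw/videos/${VIDEO_ID}.mp4"

# Start from the provided example configs, if present, then edit the
# copied yaml files, replacing occurrences of "May" with "Alice":
if [ -d egs/datasets/videos/May ]; then
  cp egs/datasets/videos/May/*.yaml "egs/datasets/videos/${VIDEO_ID}/" 2>/dev/null
fi
```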
# Todo List

GeneFace uses 3D landmarks as the intermediate representation between the audio-to-motion and motion-to-image mappings. However, the 3D landmark sequences generated by the post-net sometimes contain bad cases (such as a jittering head or an extra-large mouth) that degrade the quality of the rendered video. Currently, we partially alleviate this problem by postprocessing the predicted 3D landmark sequence. We call for better postprocessing methods.
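As one illustration of what such postprocessing can look like (a sketch only, not the repo's actual method), a centered moving average over the landmark sequence suppresses frame-to-frame jitter at the cost of some motion sharpness:

```python
import numpy as np

def smooth_landmarks(lm_seq: np.ndarray, win: int = 5) -> np.ndarray:
    """Temporally smooth a (T, K, 3) landmark sequence with a centered
    moving average of width `win`. Illustrative sketch only; GeneFace's
    actual postprocessing may differ."""
    T = lm_seq.shape[0]
    flat = lm_seq.reshape(T, -1)       # (T, K*3): one column per coordinate
    kernel = np.ones(win) / win
    out = np.empty_like(flat)
    for d in range(flat.shape[1]):     # smooth each coordinate over time
        out[:, d] = np.convolve(flat[:, d], kernel, mode="same")
    return out.reshape(lm_seq.shape)
```

A larger `win` removes more jitter but also blurs fast mouth motion, so it trades temporal stability against lip-sync sharpness; stronger filters (e.g. Savitzky-Golay) make the same trade-off with less blurring.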
## Citation

```
@article{ye2023geneface,
  title={GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis},
  author={Ye, Zhenhui and Jiang, Ziyue and Ren, Yi and Liu, Jinglin and He, Jinzheng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2301.13430},
  year={2023}
}
```

## Acknowledgements

**Our codes are based on the following repos:**

* [NATSpeech](https://github.com/NATSpeech/NATSpeech) (For the code template)
* [AD-NeRF](https://github.com/YudongGuo/AD-NeRF) (For NeRF-related implementation)
* [style_avatar](https://github.com/wuhaozhe/style_avatar) (For 3DMM parameters extraction)

assets/GeneFace.png

146 KB

checkpoints/.gitkeep

Whitespace-only changes.

data/.gitkeep

Whitespace-only changes.
