This is an umbrella issue to track progress for my planned TODOs. Comments and requests are welcome.
Goal
- achieve higher speech quality than conventional vocoders (WORLD, Griffin-Lim, etc.)
- provide a pre-trained model for a WaveNet-based mel-spectrogram vocoder
Model
- 1D dilated convolution
- batch forward
- incremental inference
- local conditioning
- global conditioning
- upsampling network (by transposed convolutions)
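The first Model items can be sketched as follows. This is a minimal NumPy illustration of a dilated causal convolution and the receptive-field arithmetic it implies; the function names and the convention that `w[0]` multiplies the current sample are my own, not this repo's API.

```python
import numpy as np

def dilated_causal_conv1d(x, w, dilation=1):
    """Causal 1D convolution with dilation: the output at time t depends only
    on x[t], x[t - dilation], x[t - 2*dilation], ... (left zero-padding keeps
    the output the same length as the input and strictly causal)."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    # y[t] = sum_i w[i] * x[t - i * dilation]
    return np.array([sum(w[i] * xp[t + pad - i * dilation] for i in range(k))
                     for t in range(len(x))])

def receptive_field(kernel_size, dilations):
    """Total left context (in samples) of a stack of dilated causal convs."""
    return (kernel_size - 1) * sum(dilations) + 1
```

For example, a kernel size of 2 with dilations 1, 2, 4, 8 gives a receptive field of 16 samples; doubling dilations per layer is what makes the context grow exponentially with depth.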
Training script
- Local conditioning
- Global conditioning
- Configurable maximum number of time steps (to avoid out-of-memory errors). 58ad07f
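The "maximum number of time steps" item amounts to randomly cropping each training pair while keeping the waveform and its conditioning frames aligned. A minimal sketch, assuming a fixed `hop_size` between mel frames; the function and parameter names are illustrative, not the repo's actual API.

```python
import numpy as np

def random_time_slice(audio, mel, max_time_steps, hop_size, rng=np.random):
    """Crop (audio, mel) to at most max_time_steps audio samples.

    The crop starts on a frame boundary so that audio sample
    s = start_frame * hop_size still lines up with mel[start_frame]."""
    if len(audio) <= max_time_steps:
        return audio, mel
    max_frames = max_time_steps // hop_size
    start_frame = rng.randint(0, mel.shape[0] - max_frames + 1)
    s = start_frame * hop_size
    return (audio[s:s + max_frames * hop_size],
            mel[start_frame:start_frame + max_frames])
```

Cropping on frame boundaries is what keeps local conditioning valid after the slice: every remaining mel frame still corresponds to exactly `hop_size` audio samples.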
Experiments
- unconditioned WaveNet trained with CMU Arctic
- conditioning model on mel-spectrogram (local conditioning) with CMU Arctic
- conditioning model on mel-spectrogram and speaker id with CMU Arctic
- conditioning model on mel-spectrogram (local conditioning) with LJSpeech
- DeepVoice3 + WaveNet vocoder (WIP: Support for Wavenet vocoder, deepvoice3_pytorch#21)
Misc
- Time-sliced data generator?
- Travis CI
- Train/val split
- README
Sampling frequency
- 4kHz
- 16kHz
- 22.05kHz
- 44.1kHz
- 48kHz
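One practical consequence of the sampling frequencies above: for a fixed network the receptive field is a constant number of samples, so the temporal context it covers shrinks as the rate grows. A rough illustration, using a hypothetical dilation schedule (kernel size 2, dilations 1..512 cycled; not necessarily this repo's configuration):

```python
# Hypothetical 24-layer stack: dilations 1, 2, ..., 512, repeated.
dilations = [2 ** (i % 10) for i in range(24)]
rf_samples = (2 - 1) * sum(dilations) + 1  # receptive field in samples

for sr in (4000, 16000, 22050, 44100, 48000):
    print(f"{sr:>5} Hz -> {rf_samples / sr * 1000:.1f} ms of context")
```

The same stack that sees roughly half a second of audio at 4 kHz covers only a few tens of milliseconds at 48 kHz, which is one reason higher rates are harder for WaveNet-style models.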
Advanced (lower priority)
- Mixture of logistic distributions (Experimental: Mixture of logistic distributions #5)
- Polyak averaging (https://discuss.pytorch.org/t/how-to-apply-exponential-moving-average-decay-for-variables/10856)
- Faster generation
- Parallel WaveNet
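The Polyak-averaging item can be sketched as an exponential moving average of the model parameters, evaluated with the averaged (shadow) weights at inference time. A minimal pure-Python version; `decay=0.9999` is a typical choice, not necessarily what this repo uses.

```python
class EMA:
    """Polyak / exponential moving average of parameters (sketch).

    After each optimizer step: shadow = decay * shadow + (1 - decay) * param.
    At inference time, the shadow parameters replace the trained ones."""

    def __init__(self, params, decay=0.9999):
        self.decay = decay
        self.shadow = dict(params)  # independent copy of current values

    def update(self, params):
        d = self.decay
        for name, value in params.items():
            self.shadow[name] = d * self.shadow[name] + (1 - d) * value

# Usage with scalar "parameters" for illustration (real use would iterate
# over the model's named tensors after each training step):
ema = EMA({"w": 1.0}, decay=0.5)
ema.update({"w": 3.0})  # shadow["w"] is now 0.5 * 1.0 + 0.5 * 3.0 = 2.0
```

With a decay close to 1, the shadow weights change slowly and smooth out step-to-step noise in the raw parameters, which is why averaged weights often sound cleaner at evaluation time.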