Planned TODOs #1

Closed
@r9y9

This is an umbrella issue to track progress for my planned TODOs. Comments and requests are welcome.

Goal

  • Achieve higher speech quality than conventional vocoders (WORLD, Griffin-Lim, etc.)
  • Provide a pre-trained model of the WaveNet-based mel-spectrogram vocoder

Model

  • 1D dilated convolution
  • batch forward
  • incremental inference
  • local conditioning
  • global conditioning
  • upsampling network (by transposed convolutions)
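
To make the first three items concrete, here is a minimal sketch in plain Python. It is not this repository's code: the names `causal_dilated_conv`, `batch_forward`, `IncrementalStack`, and `receptive_field` are hypothetical, and the layers are single-channel with no gating, residual connections, or conditioning. It only illustrates that a stack of causal dilated convolutions can be evaluated over a whole sequence at once (batch forward) or one sample at a time with cached inputs (incremental inference), and that both should give the same output.

```python
from collections import deque


def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - (K-1-k)*dilation], zero-padded on the left."""
    K = len(weights)
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - (K - 1 - k) * dilation
            if idx >= 0:  # samples before t=0 are treated as zeros
                acc += w * x[idx]
        y.append(acc)
    return y


def batch_forward(x, layers):
    """Run the whole sequence through every layer at once (training-style)."""
    h = list(x)
    for weights, dilation in layers:
        h = causal_dilated_conv(h, weights, dilation)
    return h


class IncrementalStack:
    """Process one sample per call, caching each layer's recent inputs."""

    def __init__(self, layers):
        self.layers = layers
        # Each layer needs its last (K-1)*dilation + 1 inputs, initially zeros.
        self.buffers = []
        for weights, dilation in layers:
            size = (len(weights) - 1) * dilation + 1
            self.buffers.append(deque([0.0] * size, maxlen=size))

    def step(self, sample):
        h = sample
        for (weights, dilation), buf in zip(self.layers, self.buffers):
            buf.append(h)  # the oldest cached input is dropped automatically
            K = len(weights)
            # buf[-1] is the current input; tap k looks (K-1-k)*dilation steps back.
            h = sum(w * buf[-1 - (K - 1 - k) * dilation]
                    for k, w in enumerate(weights))
        return h


def receptive_field(layers):
    """Number of input samples that can influence one output sample."""
    return 1 + sum((len(w) - 1) * d for w, d in layers)


# Hypothetical toy stack: kernel size 2, dilations 1, 2, 4.
layers = [([0.5, 1.0], 1), ([0.25, -1.0], 2), ([1.0, 0.5], 4)]
x = [1.0, 0.0, -2.0, 3.0, 0.5, 0.0, 1.5, -1.0, 2.0, 0.25]

batch_out = batch_forward(x, layers)
stack = IncrementalStack(layers)
incremental_out = [stack.step(s) for s in x]

print(receptive_field(layers))
print(all(abs(a - b) < 1e-9 for a, b in zip(batch_out, incremental_out)))
```

With kernel size 2 and dilations 1, 2, 4, the receptive field is 1 + (1 + 2 + 4) = 8 samples. Doubling the dilation per layer is what lets the receptive field grow exponentially with depth.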

Training script

  • Local conditioning
  • Global conditioning
  • Configurable maximum number of time steps (to avoid out-of-memory errors). 58ad07f
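
One common way to cap the number of time steps is to crop each utterance to a random segment before batching. The sketch below assumes that approach. It is not taken from commit 58ad07f, and `crop_to_max_time_steps`, `hop_size`, and `max_time_steps` are hypothetical names. It also assumes the local conditioning features have exactly one frame per `hop_size` audio samples, so the crop is made on frame boundaries to keep audio and features aligned.

```python
import random


def crop_to_max_time_steps(audio, features, hop_size, max_time_steps, rng=random):
    """Return a random aligned (audio, features) segment of at most max_time_steps samples.

    Assumes len(audio) == len(features) * hop_size.
    """
    if len(audio) != len(features) * hop_size:
        raise ValueError("audio and features are not aligned")
    if len(audio) <= max_time_steps:
        return audio, features  # already short enough

    # Crop a whole number of frames so that both sequences stay aligned.
    max_frames = max_time_steps // hop_size
    if max_frames < 1:
        raise ValueError("max_time_steps must be at least hop_size")

    start_frame = rng.randint(0, len(features) - max_frames)
    end_frame = start_frame + max_frames
    return (audio[start_frame * hop_size:end_frame * hop_size],
            features[start_frame:end_frame])


# Toy data: 10 frames, 100 samples per frame.
audio = list(range(1000))
features = list(range(10))
cropped_audio, cropped_features = crop_to_max_time_steps(
    audio, features, hop_size=100, max_time_steps=350, rng=random.Random(0))
print(len(cropped_audio), len(cropped_features))
```

Because memory use during training grows with sequence length, a smaller `max_time_steps` trades context per example for a lower peak memory footprint.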

Experiments

  • unconditioned WaveNet trained with CMU Arctic
  • conditioning model on mel-spectrogram (local conditioning) with CMU Arctic
  • conditioning model on mel-spectrogram and speaker id with CMU Arctic
  • conditioning model on mel-spectrogram (local conditioning) with LJSpeech
  • DeepVoice3 + WaveNet vocoder (see "WIP: Support for Wavenet vocoder", deepvoice3_pytorch#21)

Misc

  • [ ] Time sliced data generator?
  • Travis CI
  • Train/val split
  • README

Sampling frequency

  • 4kHz
  • 16kHz
  • 22.05kHz
  • 44.1kHz
  • 48kHz

Advanced (lower priority)
