Releases: edwko/OuteTTS

OuteTTS Lib v0.4

07 Apr 09:29

OuteTTS Lib v0.4 Release Notes

Interface Improvements

  • Consolidated all interface versions into a single interface.py file for centralized management
  • Implemented isolated model handling in separate version folders while maintaining core functionality for cross-compatibility
  • Added Interface Version 3 implementation to support OuteTTS v1.0 models

New Features

  • Smart text chunking for generating long audio clips from large text inputs
  • Added DAC interface code to handle OuteTTS 1.0 audio encoding and decoding
  • Added metadata for interface version compatibility in speaker files
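
To illustrate the text chunking idea, here is a minimal sketch that groups whole sentences into chunks under a character budget. The function name `chunk_text`, the `max_chars` parameter, and the sentence-splitting rule are assumptions made for this example; the library's own chunking logic may differ.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 200) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A sentence longer than max_chars is kept whole as its own chunk.
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be synthesized separately and the resulting audio concatenated. Splitting on sentence boundaries avoids cutting a word or phrase in the middle.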

Transformers Backend Patch for OuteTTS 1.0

  • Implemented windowed repetition penalty processor (RepetitionPenaltyLogitsProcessorPatch) for improved text generation quality
  • Applies penalties only to recent tokens (64-token window) rather than full context
  • Addresses key quality issues in speech synthesis applications
  • Maintains backward compatibility with standard HuggingFace interfaces
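
The windowed penalty described above can be sketched in plain Python. This is not the code of RepetitionPenaltyLogitsProcessorPatch; it is an illustration that applies the usual repetition-penalty rule (divide positive scores, multiply negative scores) only to tokens seen in the last `window` positions. The function name, the list-based logits, and the default `penalty` value are assumptions for this example.

```python
from typing import List, Sequence


def apply_windowed_repetition_penalty(
    logits: Sequence[float],
    generated_ids: Sequence[int],
    penalty: float = 1.1,  # arbitrary illustrative default
    window: int = 64,
) -> List[float]:
    """Penalize only the tokens that appear in the last `window` generated ids."""
    out = list(logits)
    # Guard window <= 0: a slice of [-0:] would select the whole sequence.
    recent = set(generated_ids[-window:]) if window > 0 else set()
    for token_id in recent:
        score = out[token_id]
        # Shrink positive scores and push negative scores further down.
        out[token_id] = score / penalty if score > 0 else score * penalty
    return out
```

Tokens that occurred earlier than the window keep their original scores, so a sound that must recur over a long utterance is not suppressed indefinitely.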

Streamlined Usage

Simplified code usage with a more modular and compact implementation:

# `interface` (an initialized outetts.Interface) and `speaker` must already exist
output = interface.generate(
    config=outetts.GenerationConfig(
        text="Hello, how are you doing?",
        generation_type=outetts.GenerationType.CHUNKED,
        speaker=speaker,
        sampler_config=outetts.SamplerConfig(
            temperature=0.4
            # Additional sampler parameters
        ),
    )
)

Automatic Configuration

Added support for automatic config and model loading for v1.0 models:

# Auto-configuration approach
import outetts

interface = outetts.Interface(
    config=outetts.ModelConfig.auto_config(
        model=outetts.Models.VERSION_1_0_SIZE_1B,
        backend=outetts.Backend.LLAMACPP,
        quantization=outetts.LlamaCppQuantization.FP16
    )
)

Manual configuration remains available:

# Manual configuration approach
interface = outetts.Interface(
    config=outetts.ModelConfig(
        model_path="...",
        tokenizer_path="...",
        backend=outetts.Backend.LLAMACPP,
        interface_version=outetts.InterfaceVersion.V3
    )
)

Performance and Dependencies

  • Improved loading times by dynamically loading only required components
  • Removed unused dependencies (further optimizations pending, particularly for WavTokenizer implementation)

Documentation

Full usage documentation is available in interface_usage.md.

OuteTTS v0.3.2

17 Jan 15:20

Update 0.3

  • Implement v2 interface with simplified structure.
  • Split documentation for interface v1 and interface v2.
  • Add compatibility for OuteTTS-0.3 1B and 500M models.
  • Restructure codebase for better maintainability.

OuteTTS v0.2.3

14 Dec 10:58

Release Notes v0.2.3

  • Split WavTokenizer into encoder (82MB) and decoder (248MB) components
  • [WIP] Streaming support

OuteTTS v0.2.1

30 Nov 14:55

Release Notes v0.2.1

New Features and Improvements:

  1. Support for ExLlamaV2

    • Integrated support for ExLlamaV2
    • Pull request: #37
  2. Whisper Integration for Speaker Generation

    • Added Whisper-based transcription for generating speakers when no transcript is provided.
    • Suggested in: #28
    • If transcript is None, the audio is now transcribed automatically using Whisper.
    def create_speaker(
        self,
        audio_path: str,
        transcript: str = None,
        whisper_model: str = "turbo",
        whisper_device = None
    ): ...
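
The fallback behavior can be sketched as follows. This is an illustration, not the library's implementation: `resolve_transcript` and the injected `transcribe` callable are hypothetical names, and the callable stands in for whatever Whisper call the library makes.

```python
from typing import Callable, Optional


def resolve_transcript(
    audio_path: str,
    transcript: Optional[str],
    transcribe: Callable[[str], str],  # hypothetical stand-in for a Whisper call
) -> str:
    """Return the given transcript, or transcribe the audio when it is None."""
    if transcript is not None:
        return transcript
    # No transcript supplied: fall back to automatic transcription.
    return transcribe(audio_path).strip()
```

Passing the transcriber as a parameter keeps the sketch free of any speech-recognition dependency and makes the fallback easy to test with a stub.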

OuteTTS v0.2.0 Release

25 Nov 12:08

OuteTTS v0.2.0 Release Notes

Major Changes

  • New Model Support: Added support for OuteTTS-0.2-500M model
  • Speaker Management: Introduced default speaker presets for each supported language
  • Breaking Changes:
    • Speaker files from previous versions (<0.2.0) are not compatible
    • Interface usage has been significantly revised (see README.md for new implementation)

New Features

  • Added voice cloning guidelines and interface usage recommendations in README.md
  • Implemented Gradio example playground for OuteTTS-0.2-500M
  • Multi-language alignment support
  • Enhanced speaker management:
    • New methods: interface.print_default_speakers() and interface.load_default_speaker(name="male_1")
    • Switched from pickle to JSON format for speaker saving
    • Added speaker language information in saved files
  • Option to load WavTokenizer from custom path (resolves issue #24)
  • Multiple interface version initialization in a single function
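
As an illustration of JSON-based speaker saving, the sketch below writes a speaker dictionary to disk and reads it back. The function names and the `language` key are assumptions for this example; the actual speaker file layout is defined by the library.

```python
import json
from pathlib import Path
from typing import Any, Dict


def save_speaker(speaker: Dict[str, Any], path: str) -> None:
    """Write a speaker dictionary to a UTF-8 JSON file."""
    Path(path).write_text(
        json.dumps(speaker, ensure_ascii=False, indent=2), encoding="utf-8"
    )


def load_speaker(path: str) -> Dict[str, Any]:
    """Read a speaker dictionary back from a JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Unlike pickle, a JSON file is human-readable and loading it cannot execute arbitrary code, which matters when speaker files are shared between users.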

Improvements

  • Restructured library files for better organization
  • Implemented hash verification for WavTokenizer downloads (resolves issue #3)
  • Reworked interface for better usability
  • Made sounddevice optional with improved error handling for sound playback
  • Added data preparation examples for training
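
Hash verification of a downloaded file can be sketched with `hashlib`. The release notes do not say which hash algorithm is used, so SHA-256 is an assumption here, as are the function names.

```python
import hashlib


def sha256_of_file(path: str, block_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size blocks so large files do not need to fit in memory.
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Return True when the file's digest matches the expected hex digest."""
    return sha256_of_file(path) == expected_sha256.lower()
```

A caller would typically delete the file and retry the download when `verify_download` returns False.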

Error Handling

  • Added validation for audio token detection
  • Improved error messages for long input text and early EOS cases
  • Enhanced overall library error handling and feedback

How to Upgrade

  • Update your library via pip:
    pip install --upgrade outetts
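
To confirm which version is installed after upgrading, a small check with the standard library's `importlib.metadata` may help. The helper name `installed_version` is an assumption for this example.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "outetts") -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version())
```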