Releases: edwko/OuteTTS
OuteTTS Lib v0.4
OuteTTS Lib v0.4 Release Notes
Interface Improvements
- Consolidated all interface versions into a single interface.py file for centralized management
- Implemented isolated model handling in separate version folders while maintaining core functionality for cross-compatibility
- Added Interface Version 3 implementation to support OuteTTS v1.0 models
New Features
- Smart text chunking for generating long audio clips from large text inputs
- Added DAC interface code to handle OuteTTS 1.0 audio encoding and decoding
- Added metadata for interface version compatibility in speaker files
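The notes above do not say how the smart text chunking works internally. As a rough illustration of the general idea only, the sketch below splits text at sentence boundaries and greedily packs sentences into chunks under a character budget. The function name `chunk_text` and the `max_chars` parameter are hypothetical and are not part of the OuteTTS API.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 200) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    Hypothetical helper for illustration; not the OuteTTS implementation.
    A single sentence longer than max_chars is kept whole rather than cut mid-word.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be synthesized separately and the resulting audio concatenated, which is one plausible way to produce long clips from large inputs.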
Transformers Backend Patch for OuteTTS 1.0
- Implemented windowed repetition penalty processor (RepetitionPenaltyLogitsProcessorPatch) for improved text generation quality
- Applies penalties only to recent tokens (64-token window) rather than full context
- Addresses key quality issues in speech synthesis applications
- Maintains backward compatibility with standard HuggingFace interfaces
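To make the windowing idea concrete, here is a minimal pure-Python sketch, not the code of RepetitionPenaltyLogitsProcessorPatch itself. It follows the convention of the standard Hugging Face repetition penalty (positive logits are divided by the penalty, negative logits are multiplied by it) but only considers the most recent tokens. The function name and the default `penalty=1.1` are assumptions; the 64-token window comes from the notes above.

```python
from typing import List, Sequence


def apply_windowed_repetition_penalty(
    logits: Sequence[float],
    generated_ids: Sequence[int],
    penalty: float = 1.1,  # assumed value, for illustration only
    window: int = 64,      # window size stated in the release notes
) -> List[float]:
    """Penalize only the tokens that occur in the last `window` generated ids."""
    # Guard against window == 0, since seq[-0:] would return the whole sequence.
    recent = set(generated_ids[-window:]) if window > 0 else set()
    penalized = list(logits)
    for token_id in recent:
        score = penalized[token_id]
        # Dividing a positive logit or multiplying a negative one both lower it.
        penalized[token_id] = score / penalty if score > 0 else score * penalty
    return penalized
```

Tokens that were generated earlier than the window keep their original logits, so words that legitimately recur in long text are not suppressed for the whole sequence.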
Streamlined Usage
Simplified code usage with a more modular and compact implementation:
output = interface.generate(
    config=outetts.GenerationConfig(
        text="Hello, how are you doing?",
        generation_type=outetts.GenerationType.CHUNKED,
        speaker=speaker,
        sampler_config=outetts.SamplerConfig(
            temperature=0.4
            # Additional sampler parameters
        ),
    )
)
Automatic Configuration
Added support for automatic config and model loading for v1.0 models:
# Auto-configuration approach
interface = outetts.Interface(
    config=outetts.ModelConfig.auto_config(
        model=outetts.Models.VERSION_1_0_SIZE_1B,
        backend=outetts.Backend.LLAMACPP,
        quantization=outetts.LlamaCppQuantization.FP16
    )
)
Manual configuration remains available:
# Manual configuration approach
interface = outetts.Interface(
    config=outetts.ModelConfig(
        model_path="...",
        tokenizer_path="...",
        backend=outetts.Backend.LLAMACPP,
        interface_version=outetts.InterfaceVersion.V3
    )
)
Performance and Dependencies
- Improved loading times by dynamically loading only required components
- Removed unused dependencies (further optimizations pending, particularly for WavTokenizer implementation)
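The notes do not show how components are loaded on demand. One common way to get this effect in Python is to defer imports until a component is first requested, sketched below. The `BACKEND_MODULES` table and `load_backend` function are hypothetical, and standard-library modules stand in for real backends so the example runs anywhere.

```python
import importlib
from functools import lru_cache
from types import ModuleType

# Hypothetical mapping from a component name to the module implementing it.
# Standard-library modules are used as stand-ins for heavy backends.
BACKEND_MODULES = {"json": "json", "csv": "csv"}


@lru_cache(maxsize=None)
def load_backend(name: str) -> ModuleType:
    """Import a component only when it is first requested, then cache it."""
    if name not in BACKEND_MODULES:
        raise ValueError(f"Unknown backend: {name!r}")
    return importlib.import_module(BACKEND_MODULES[name])
```

With this pattern, importing the top-level package stays cheap, and the cost of a heavy dependency is paid only by users who select that component.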
Documentation
Full usage documentation is available in interface_usage.md.
OuteTTS v0.3.2
Update 0.3
- Implement v2 interface with simplified structure.
- Split documentation for interface v1 and interface v2.
- Add compatibility for OuteTTS-0.3 1B and 500M models.
- Restructure codebase for better maintainability.
OuteTTS v0.2.3
Release Notes v0.2.3
- Split WavTokenizer into encoder (82MB) and decoder (248MB) components
- [WIP] Streaming support
OuteTTS v0.2.1
Release Notes v0.2.1
New Features and Improvements:
- Support for ExLlamaV2
  - Integrated support for ExLlamaV2
  - Pull request: #37
- Whisper Integration for Speaker Generation
  - Added Whisper-based transcription for generating speakers when no transcript is provided.
  - Suggested in: #28
  - Now, if transcript is set to None, the text will be automatically transcribed using Whisper.

def create_speaker(
    self,
    audio_path: str,
    transcript: str = None,
    whisper_model: str = "turbo",
    whisper_device = None
)
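The fallback described above can be sketched as a small helper that uses the given transcript when present and otherwise calls a transcription function. This is an illustration of the control flow only, not the library's create_speaker body; `resolve_transcript` and its `transcribe` parameter are hypothetical names.

```python
from typing import Callable, Optional


def resolve_transcript(
    audio_path: str,
    transcript: Optional[str] = None,
    transcribe: Optional[Callable[[str], str]] = None,
) -> str:
    """Return the provided transcript, or transcribe the audio if it is None."""
    if transcript is not None:
        return transcript
    if transcribe is None:
        raise ValueError("No transcript given and no transcriber available")
    # Speech recognizers often pad their output with whitespace.
    return transcribe(audio_path).strip()


# Assuming the openai-whisper package is installed, a transcriber could be:
#   import whisper
#   model = whisper.load_model("turbo")
#   transcribe = lambda path: model.transcribe(path)["text"]
```

Passing the transcriber in as a callable keeps the sketch runnable without Whisper installed and makes the fallback easy to test.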
OuteTTS v0.2.0 Release
OuteTTS v0.2.0 Release Notes
Major Changes
- New Model Support: Added support for OuteTTS-0.2-500M model
- Speaker Management: Introduced default speaker presets for each supported language
- Breaking Changes:
  - Speaker files from previous versions (<0.2.0) are not compatible
  - Interface usage has been significantly revised (see README.md for new implementation)
New Features
- Added voice cloning guidelines and interface usage recommendations in README.md
- Implemented Gradio example playground for OuteTTS-0.2-500M
- Multi-language alignment support
- Enhanced speaker management:
  - New methods: interface.print_default_speakers() and interface.load_default_speaker(name="male_1")
  - Switched from pickle to JSON format for speaker saving
  - Added speaker language information in saved files
- Option to load WavTokenizer from custom path (resolves issue #24)
- Multiple interface version initialization in a single function
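As an illustration of the move from pickle to JSON for speaker files, the sketch below writes a speaker as plain JSON that includes a language field and checks that field on load. The helper names and the keys (`language`, `text`) are assumptions for the example and do not reflect the actual OuteTTS speaker file schema.

```python
import json
from typing import Optional


def save_speaker(speaker: dict, path: str) -> None:
    """Write a speaker profile as human-readable JSON (hypothetical schema)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(speaker, f, ensure_ascii=False, indent=2)


def load_speaker(path: str, expected_language: Optional[str] = None) -> dict:
    """Load a speaker profile and optionally check its language field."""
    with open(path, "r", encoding="utf-8") as f:
        speaker = json.load(f)
    if expected_language is not None and speaker.get("language") != expected_language:
        raise ValueError(
            f"Speaker language {speaker.get('language')!r} "
            f"does not match {expected_language!r}"
        )
    return speaker
```

Unlike pickle, loading a JSON file cannot execute arbitrary code, and the file can be inspected or edited by hand.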
Improvements
- Restructured library files for better organization
- Implemented hash verification for WavTokenizer downloads (resolves issue #3)
- Reworked interface for better usability
- Made sounddevice optional with improved error handling for sound playback
- Added data preparation examples for training
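Hash verification of a downloaded file generally looks like the sketch below: compute a digest of the file and compare it with a known value. The notes do not state which hash algorithm the library uses, so SHA-256 is an assumption here, as are the function names.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size blocks so large model files do not fill memory.
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Return True if the file's digest matches the expected hex digest."""
    return sha256_of_file(path) == expected_sha256.lower()
```

A caller would typically delete the file and retry the download when `verify_download` returns False.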
Error Handling
- Added validation for audio token detection
- Improved error messages for long input text and early EOS cases
- Enhanced overall library error handling and feedback
How to Upgrade
- Update your library via pip:
pip install --upgrade outetts