Multimodal AI Chat App with Dynamic Routing
VT.ai is a multimodal AI chat application designed to simplify interaction with different AI models through a unified interface. It employs vector-based semantic routing to direct queries to the most suitable model, eliminating the need to switch between multiple applications and interfaces.
- Multi-Provider Integration: Unified access to models from OpenAI (o1/o3/4o), Anthropic (Claude), Google (Gemini), DeepSeek, Llama, Cohere, and local models via Ollama
- Semantic Routing System: Vector-based classification automatically routes queries to appropriate models using FastEmbed embeddings, removing the need for manual model selection
- Multimodal Capabilities: Comprehensive support for text, image, and audio inputs with advanced vision analysis
- Image Generation: GPT-Image-1 integration with support for transparent backgrounds, multiple formats, and customizable quality parameters
- Web Search Integration: Real-time information retrieval with source attribution via Tavily API
- Voice Processing: Advanced speech-to-text and text-to-speech functionality with configurable voice options and silence detection
- Reasoning Visualization: Step-by-step model reasoning visualization with the `<think>` tag for transparent AI decision processes
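The routing idea can be sketched in a few lines: embed a set of example utterances for each route, embed the incoming query, and pick the route with the highest cosine similarity. The hash-based `embed` below is a deterministic stand-in for FastEmbed's learned embeddings, and the route names and prototype phrases are illustrative, not VT.ai's actual configuration:

```python
import math
import zlib

# Stand-in for FastEmbed: bucket words into a fixed-size vector via CRC32.
# Illustrative only -- the real router uses learned dense embeddings.
def embed(text: str, dim: int = 64) -> list[float]:
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# Hypothetical route prototypes; a real system uses many utterances per route.
ROUTES = {
    "image-generation": embed("draw generate an image picture of"),
    "vision-analysis": embed("describe analyze this photo image"),
    "chat": embed("explain tell me about question answer"),
}

def route(query: str) -> str:
    """Return the route whose prototype is most similar to the query."""
    q = embed(query)
    return max(ROUTES, key=lambda name: cosine(q, ROUTES[name]))
```

With this kind of classifier, a prompt such as "generate an image of a sunset" lands on the image-generation route without the user ever choosing a model.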
Multiple installation methods are available depending on requirements:
```shell
# Standard PyPI installation
uv pip install vtai

# Zero-installation experience with uvx
export OPENAI_API_KEY='your-key-here'
uvx vtai

# Development installation
git clone https://github.com/vinhnx/VT.ai.git
cd VT.ai
uv venv
source .venv/bin/activate  # Linux/Mac
uv pip install -e ".[dev]"  # Install with development dependencies
```
Configure API keys to enable specific model capabilities:
```shell
# Command-line configuration
vtai --api-key openai=sk-your-key-here

# Environment variable configuration
export OPENAI_API_KEY='sk-your-key-here'         # For OpenAI models
export ANTHROPIC_API_KEY='sk-ant-your-key-here'  # For Claude models
export GEMINI_API_KEY='your-key-here'            # For Gemini models
```

API keys are securely stored in `~/.config/vtai/.env` for future use.
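For illustration, key storage can be sketched as a minimal `.env` parser that merges stored keys into the process environment. The `load_env_file` helper below is hypothetical; VT.ai's actual loader (likely python-dotenv or similar) may behave differently:

```python
import os
from pathlib import Path

def load_env_file(path: Path) -> dict[str, str]:
    """Parse KEY=value lines from a .env-style file.

    Hypothetical sketch -- not VT.ai's actual implementation.
    """
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for line in path.read_text().splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values

# Real environment variables take precedence over stored keys.
stored = load_env_file(Path.home() / ".config" / "vtai" / ".env")
for key, value in stored.items():
    os.environ.setdefault(key, value)
```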
The application provides a clean, intuitive interface with the following capabilities:
- Dynamic Conversations: The semantic router automatically selects the most appropriate model for each query
- Image Generation: Create images using prompts like "generate an image of..." or "draw a..."
- Visual Analysis: Upload or provide URLs to analyze visual content
- Reasoning Visualization: Add `<think>` to prompts to observe step-by-step reasoning
- Voice Interaction: Use the microphone feature for speech input and text-to-speech output
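A minimal sketch of how `<think>`-tagged output could be separated from the final answer; the regex-based `split_reasoning` helper is an assumption for illustration, not VT.ai's actual rendering code:

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split a model response into (reasoning, answer).

    Reasoning is the content of the first <think>...</think> span;
    the answer is everything outside it. Sketch only.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if not match:
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = (response[:match.start()] + response[match.end():]).strip()
    return reasoning, answer
```

A chat UI can then render the reasoning in a collapsible panel while showing only the answer inline.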
Detailed usage instructions are available in the Getting Started Guide.
The documentation is organized into sections designed for different user needs:
- User Guide: Installation, configuration, and feature documentation
- Developer Guide: Architecture details, extension points, and implementation information
- API Reference: Comprehensive API documentation for programmatic usage
VT.ai offers two distinct implementations:
- Python Implementation: Full-featured reference implementation with complete support for all capabilities
- Rust Implementation: High-performance alternative with optimized memory usage and native compiled speed
The implementation documentation provides a detailed comparison of both options.
| Category | Models |
|---|---|
| Chat | GPT-o1, GPT-o3 Mini, GPT-4o, Claude 3.5/3.7, Gemini 2.0/2.5 |
| Vision | GPT-4o, Gemini 1.5 Pro/Flash, Claude 3, Llama3.2 Vision |
| Image Gen | GPT-Image-1 with custom parameters |
| TTS | GPT-4o mini TTS, TTS-1, TTS-1-HD |
| Local | Llama3, Mistral, DeepSeek R1 (1.5B to 70B via Ollama) |
The Models Documentation provides detailed information about model-specific capabilities and configuration options.
VT.ai leverages several open-source projects to deliver its functionality:
- Chainlit: Modern chat interface framework
- LiteLLM: Unified model abstraction layer
- SemanticRouter: Intent classification system
- FastEmbed: Efficient embedding generation
- Tavily: Web search capabilities
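The unified-abstraction idea behind LiteLLM can be sketched as a small prefix-based dispatcher: one `completion()` signature, with the provider chosen from the model name. This toy registry is illustrative only and is not LiteLLM's actual API:

```python
from typing import Callable

# One handler type for every provider: (model, messages) -> reply text.
Handler = Callable[[str, list[dict]], str]
PROVIDERS: dict[str, Handler] = {}

def register(prefix: str):
    """Register a provider handler for model names with this prefix."""
    def wrap(fn: Handler) -> Handler:
        PROVIDERS[prefix] = fn
        return fn
    return wrap

# Stub handlers; real ones would call each provider's SDK or HTTP API.
@register("gpt-")
def _openai(model, messages):
    return f"[openai:{model}] stub reply"

@register("claude-")
def _anthropic(model, messages):
    return f"[anthropic:{model}] stub reply"

@register("gemini-")
def _google(model, messages):
    return f"[google:{model}] stub reply"

def completion(model: str, messages: list[dict]) -> str:
    """Provider-agnostic entry point: dispatch on the model-name prefix."""
    for prefix, handler in PROVIDERS.items():
        if model.startswith(prefix):
            return handler(model, messages)
    raise ValueError(f"no provider registered for {model}")
```

The payoff is the same as with LiteLLM: calling code uses a single interface regardless of which vendor serves the model.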
The application architecture follows a clean, modular design:
- Entry Point: `vtai/app.py` - Main application logic
- Routing Layer: `vtai/router/` - Semantic classification system
- Assistants: `vtai/assistants/` - Specialized handlers for different query types
- Tools: `vtai/tools/` - Web search, file operations, and other integrations
Contributions to VT.ai are welcome. The project accepts various types of contributions:
- Bug Reports: Submit detailed GitHub issues for any bugs encountered
- Feature Requests: Propose new functionality through GitHub issues
- Pull Requests: Submit code improvements and bug fixes
- Documentation: Enhance documentation or add examples
- Feedback: Share user experiences to help improve the project
Development setup:
```shell
# Clone the repository
git clone https://github.com/vinhnx/VT.ai.git
cd VT.ai

# Set up development environment
uv venv
source .venv/bin/activate  # Linux/Mac
uv pip install -e ".[dev]"

# Run the application
chainlit run vtai/app.py

# Run tests
pytest
```
Quality is maintained through comprehensive testing:
```shell
# Run the test suite
pytest

# Run with coverage reporting
pytest --cov=vtai

# Run specific test categories
pytest tests/unit/
pytest tests/integration/
```
VT.ai is available under the MIT License - See LICENSE for details.