VT.ai

Multimodal AI Chat App with Dynamic Routing

[VT.ai screenshot]

VT.ai is a multimodal AI chat application designed to simplify interaction with different AI models through a unified interface. It employs vector-based semantic routing to direct queries to the most suitable model, eliminating the need to switch between multiple applications and interfaces.

Key Features

  • Multi-Provider Integration: Unified access to models from OpenAI (o1/o3/4o), Anthropic (Claude), Google (Gemini), DeepSeek, Llama, Cohere, and local models via Ollama
  • Semantic Routing System: Vector-based classification automatically routes queries to appropriate models using FastEmbed embeddings, removing the need for manual model selection (see the sketch after this list)
  • Multimodal Capabilities: Comprehensive support for text, image, and audio inputs with advanced vision analysis
  • Image Generation: GPT-Image-1 integration with support for transparent backgrounds, multiple formats, and customizable quality parameters
  • Web Search Integration: Real-time information retrieval with source attribution via Tavily API
  • Voice Processing: Advanced speech-to-text and text-to-speech functionality with configurable voice options and silence detection
  • Reasoning Visualization: Step-by-step model reasoning visualization with the <think> tag for transparent AI decision processes
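
To make the routing idea concrete, here is a minimal sketch of vector-based query classification using FastEmbed embeddings and cosine similarity. The route names, example utterances, and helper functions are hypothetical illustrations, not VT.ai's actual configuration:

import numpy as np
from fastembed import TextEmbedding

# Embedding model used to vectorize both route examples and queries
model = TextEmbedding("BAAI/bge-small-en-v1.5")

# Each route is defined by a few example utterances (hypothetical routes)
ROUTES = {
    "image-generation": ["generate an image of a sunset", "draw a cat"],
    "vision-analysis": ["what is in this photo?", "describe this image"],
    "chat": ["explain quantum computing", "summarize this article"],
}

# Pre-compute one centroid embedding per route
centroids = {
    name: np.mean(list(model.embed(examples)), axis=0)
    for name, examples in ROUTES.items()
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def route(query: str) -> str:
    """Return the route whose centroid is most similar to the query."""
    q = next(iter(model.embed([query])))
    return max(centroids, key=lambda name: cosine(q, centroids[name]))

print(route("please draw a watercolor landscape"))  # e.g. "image-generation"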

Installation & Setup

Multiple installation methods are available depending on requirements:

# Standard PyPI installation (requires the uv package manager)
uv pip install vtai

# Zero-installation experience with uvx
export OPENAI_API_KEY='your-key-here'
uvx vtai

# Development installation
git clone https://github.com/vinhnx/VT.ai.git
cd VT.ai
uv venv
source .venv/bin/activate  # Linux/Mac
uv pip install -e ".[dev]"  # Install with development dependencies

API Key Configuration

Configure API keys to enable specific model capabilities:

# Command-line configuration
vtai --api-key openai=sk-your-key-here

# Environment variable configuration
export OPENAI_API_KEY='sk-your-key-here'  # For OpenAI models
export ANTHROPIC_API_KEY='sk-ant-your-key-here'  # For Claude models
export GEMINI_API_KEY='your-key-here'  # For Gemini models

API keys are securely stored in ~/.config/vtai/.env for future use.
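
As an illustration of this storage pattern, the sketch below uses python-dotenv to persist and reload keys from that path. The helper names are hypothetical; VT.ai's internal key handling may differ:

import os
from pathlib import Path
from dotenv import load_dotenv, set_key

# Location the README names for persisted keys
ENV_PATH = Path.home() / ".config" / "vtai" / ".env"

def save_key(name: str, value: str) -> None:
    """Persist an API key so later sessions pick it up automatically."""
    ENV_PATH.parent.mkdir(parents=True, exist_ok=True)
    ENV_PATH.touch(exist_ok=True)
    set_key(str(ENV_PATH), name, value)

def load_keys() -> None:
    """Load stored keys into the process environment without overwriting."""
    load_dotenv(ENV_PATH, override=False)

save_key("OPENAI_API_KEY", "sk-your-key-here")
load_keys()
print("OPENAI_API_KEY set:", "OPENAI_API_KEY" in os.environ)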

Usage Guide

Interface Usage

The application provides a clean, intuitive interface with the following capabilities:

  1. Dynamic Conversations: The semantic router automatically selects the most appropriate model for each query
  2. Image Generation: Create images using prompts like "generate an image of..." or "draw a..."
  3. Visual Analysis: Upload or provide URLs to analyze visual content
  4. Reasoning Visualization: Add <think> to prompts to observe step-by-step reasoning (see the sketch after this list)
  5. Voice Interaction: Use the microphone feature for speech input and text-to-speech output
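
For a concrete picture of what reasoning visualization involves, here is a minimal sketch that separates <think> blocks from the final answer. The function name is hypothetical, and VT.ai's actual parsing may differ:

import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a response containing <think> tags."""
    blocks = re.findall(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
    return "\n\n".join(b.strip() for b in blocks), answer

raw = "<think>The user wants a haiku; count 5-7-5 syllables.</think>Moonlight on still water..."
reasoning, answer = split_reasoning(raw)
print("Reasoning:", reasoning)
print("Answer:", answer)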

Detailed usage instructions are available in the Getting Started Guide.

Documentation

The documentation is organized into sections designed for different user needs:

  • User Guide: Installation, configuration, and feature documentation
  • Developer Guide: Architecture details, extension points, and implementation information
  • API Reference: Comprehensive API documentation for programmatic usage

Implementation Options

VT.ai offers two distinct implementations:

  • Python Implementation: Full-featured reference implementation with complete support for all capabilities
  • Rust Implementation: High-performance alternative with optimized memory usage and native compiled speed

The implementation documentation provides a detailed comparison of both options.

Supported Models

Category     Models
Chat         o1, o3 Mini, GPT-4o, Claude 3.5/3.7, Gemini 2.0/2.5
Vision       GPT-4o, Gemini 1.5 Pro/Flash, Claude 3, Llama 3.2 Vision
Image Gen    GPT-Image-1 with custom parameters
TTS          GPT-4o mini TTS, TTS-1, TTS-1-HD
Local        Llama 3, Mistral, DeepSeek R1 (1.5B to 70B via Ollama)

The Models Documentation provides detailed information about model-specific capabilities and configuration options.

Technical Architecture

VT.ai leverages several open-source projects, including Chainlit for the chat interface and FastEmbed for semantic routing.

The application architecture follows a clean, modular design:

  • Entry Point: vtai/app.py - Main application logic (a minimal sketch follows this list)
  • Routing Layer: vtai/router/ - Semantic classification system
  • Assistants: vtai/assistants/ - Specialized handlers for different query types
  • Tools: vtai/tools/ - Web search, file operations, and other integrations
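
Since the entry point is a Chainlit application (started with chainlit run vtai/app.py, as shown in the Contributing section), a minimal sketch of such an entry point looks like the following. The echo handler is a placeholder, not VT.ai's actual logic:

import chainlit as cl

@cl.on_message
async def on_message(message: cl.Message) -> None:
    # A real handler would pass message.content through the semantic router
    # and stream the selected model's reply back to the UI.
    await cl.Message(content=f"Echo: {message.content}").send()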

Contributing

Contributions to VT.ai are welcome. The project accepts various types of contributions:

  • Bug Reports: Submit detailed GitHub issues for any bugs encountered
  • Feature Requests: Propose new functionality through GitHub issues
  • Pull Requests: Submit code improvements and bug fixes
  • Documentation: Enhance documentation or add examples
  • Feedback: Share user experiences to help improve the project

Development setup:

# Clone the repository
git clone https://github.com/vinhnx/VT.ai.git
cd VT.ai

# Set up development environment
uv venv
source .venv/bin/activate  # Linux/Mac
uv pip install -e ".[dev]"

# Run the application locally
chainlit run vtai/app.py

# Run tests
pytest

Testing and Quality

Quality is maintained through comprehensive testing:

# Run the test suite
pytest

# Run with coverage reporting
pytest --cov=vtai

# Run specific test categories
pytest tests/unit/
pytest tests/integration/

License

VT.ai is released under the MIT License. See the LICENSE file for details.
