
Commit 920ab4c

docker build without xtts and cuda
1 parent 422b56c commit 920ab4c

6 files changed: +163 -8 lines changed

.env.sample (+2)

@@ -40,6 +40,8 @@ OPENAI_MODEL=gpt-4o-mini
OPENAI_BASE_URL=https://api.openai.com/v1/chat/completions
OPENAI_TTS_URL=https://api.openai.com/v1/audio/speech
OLLAMA_BASE_URL=http://localhost:11434
+# IF RUNNING IN DOCKER CHANGE OLLAMA BASE URL TO THE ONE BELOW
+# OLLAMA_BASE_URL=http://host.docker.internal:11434

# Models Configuration:
# Model to use - llama3 or llama3.1 or 3.2 works well for local usage. In the UI you will have a list of popular models to choose from so the model here is just a starting point
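
A note on the `host.docker.internal` URL added above: Docker Desktop on Windows and macOS resolves that hostname automatically, but a plain Docker Engine on Linux usually needs it mapped explicitly. A minimal sketch, assuming the no-xtts image and container names used elsewhere in this commit:

```bash
# Hypothetical Linux-host run: map host.docker.internal to the host gateway so the
# container can reach an Ollama server listening on the host at port 11434.
docker run -d \
  --add-host=host.docker.internal:host-gateway \
  --env-file .env \
  --name voice-chat-ai-no-xtts \
  -p 8000:8000 \
  voice-chat-ai-no-xtts:latest
```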

README.md (+36 -4)

@@ -91,7 +91,7 @@ is in system PATH or whatever version you downloaded

### Optional - Download Checkpoints - ONLY IF YOU ARE USING THE LOCAL TTS

-If you are only using speech with Openai or Elevenlabs then you don't need this. To use the local TTS download the checkpoints for the models used in this project ( the docker image has the local xtts in it already ). You can download them from the GitHub releases page and extract the zip and put into the project folder.
+If you are only using speech with OpenAI or ElevenLabs, then you don't need this. To use the local TTS, download the checkpoints for the models used in this project (the docker image already has the local xtts and checkpoints in it). You can download them from the GitHub releases page, extract the zip, and put it into the project folder.

- [Download Checkpoint](https://github.com/bigsk1/voice-chat-ai/releases/download/models/checkpoints.zip)
- [Download XTTS-v2](https://github.com/bigsk1/voice-chat-ai/releases/download/models/XTTS-v2.zip)
@@ -134,7 +134,7 @@ python cli.py

This is for running with an Nvidia GPU and you have Nvidia toolkit and cudnn installed.

-This image is huge when built because of all the checkpoints, cuda base image, build tools and audio tools - So there is no need to download the checkpoints and XTTS as they are in the image. This is all setup to use XTTS, if your not using XTTS for speech it should still work but it is just a large docker image and will take awhile, if you don't want to deal with that then run the app natively and don't use docker.
+This image is huge when built because of all the checkpoints, the CUDA base image, build tools, and audio tools, so there is no need to download the checkpoints and XTTS separately - they are already in the image. Everything is set up to use XTTS; if you're not using XTTS for speech it will still work, but it is a large docker image and will take a while to build. If you don't want to deal with that, run the app natively, or build your own image without the xtts and checkpoints folders if you are not using the local TTS.

This guide will help you quickly set up and run the **Voice Chat AI** Docker container. Ensure you have Docker installed and that your `.env` file is placed in the same directory as the commands are run. If you get cuda errors make sure to install nvidia toolkit for docker and cudnn is installed in your path.

@@ -146,7 +146,7 @@ This guide will help you quickly set up and run the **Voice Chat AI** Docker con

---

-## 🖥️ Run on Windows using docker desktop
+## 🖥️ Run on Windows using docker desktop - prebuilt image
On windows using docker desktop - run in Windows terminal:
make sure .env is in same folder you are running this from
```bash
@@ -201,7 +201,7 @@ docker stop voice-chat-ai
docker rm voice-chat-ai
```

-## Build it yourself:
+## Build it yourself with cuda:

```bash
docker build -t voice-chat-ai .
@@ -218,6 +218,36 @@ Running from wsl
docker run -d --gpus all -e "PULSE_SERVER=/mnt/wslg/PulseServer" -v \\wsl$\Ubuntu\mnt\wslg:/mnt/wslg/ --env-file .env --name voice-chat-ai -p 8000:8000 voice-chat-ai:latest
```

+## Docker build without local xtts and no cuda
+
+```bash
+docker build -t voice-chat-ai-no-xtts -f no-xtts-Dockerfile .
+```
+
+In Windows command prompt:
+
+```bash
+docker run -d ^
+  -e "PULSE_SERVER=/mnt/wslg/PulseServer" ^
+  -v \\wsl$\Ubuntu\mnt\wslg:/mnt/wslg/ ^
+  --env-file .env ^
+  --name voice-chat-ai-no-xtts ^
+  -p 8000:8000 ^
+  voice-chat-ai-no-xtts:latest
+```
+
+In WSL2 Ubuntu:
+
+```bash
+docker run -d \
+  -e "PULSE_SERVER=/mnt/wslg/PulseServer" \
+  -v /mnt/wslg/:/mnt/wslg/ \
+  --env-file .env \
+  --name voice-chat-ai-no-xtts \
+  -p 8000:8000 \
+  voice-chat-ai-no-xtts:latest
+```
+
## Configuration ⚙️

1. Rename the .env.sample to `.env` in the root directory of the project and configure it with the necessary environment variables: - The app is controlled based on the variables you add.
@@ -265,6 +295,8 @@ OPENAI_MODEL=gpt-4o-mini
OPENAI_BASE_URL=https://api.openai.com/v1/chat/completions
OPENAI_TTS_URL=https://api.openai.com/v1/audio/speech
OLLAMA_BASE_URL=http://localhost:11434
+# IF RUNNING IN DOCKER CHANGE OLLAMA BASE URL TO THE ONE BELOW
+# OLLAMA_BASE_URL=http://host.docker.internal:11434

# Models Configuration:
# Models to use - llama3.2 works well for local usage.

app/app.py (+19 -3)

@@ -57,9 +57,25 @@
# Capitalize the first letter of the character name
character_display_name = CHARACTER_NAME.capitalize()

-# Set up the faster-whisper model
+# Check for CUDA availability
+device = "cuda" if torch.cuda.is_available() else "cpu"
+
+# Default model size (adjust as needed)
model_size = "medium.en"
-whisper_model = WhisperModel(model_size, device="cuda", compute_type="float16")
+
+try:
+    print(f"Attempting to load Faster-Whisper on {device}...")
+    whisper_model = WhisperModel(model_size, device=device, compute_type="float16" if device == "cuda" else "int8")
+    print("Faster-Whisper initialized successfully.")
+except Exception as e:
+    print(f"Error initializing Faster-Whisper on {device}: {e}")
+    print("Falling back to CPU mode...")
+
+    # Force CPU fallback
+    device = "cpu"
+    model_size = "tiny.en"  # Use a smaller model for CPU performance
+    whisper_model = WhisperModel(model_size, device="cpu", compute_type="int8")
+    print("Faster-Whisper initialized on CPU successfully.")

# Paths for character-specific files
project_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
@@ -206,7 +222,7 @@ def sync_play_audio(file_path):
    pass

# Model and device setup
-device = 'cuda' if torch.cuda.is_available() else 'cpu'
+# device = 'cuda' if torch.cuda.is_available() else 'cpu'
output_dir = os.path.join(project_dir, 'outputs')
os.makedirs(output_dir, exist_ok=True)

docker/using-docker.txt (+19 -1)

@@ -1,5 +1,23 @@
+To use full CUDA for xtts and faster-whisper

docker build -t voice-chat-ai:latest .

-wsl docker run -d --gpus all -e "PULSE_SERVER=/mnt/wslg/PulseServer" -v /mnt/wslg/:/mnt/wslg/ --env-file .env --name voice-chat-ai -p 8000:8000 voice-chat-ai:latest
+wsl docker run -d --gpus all -e "PULSE_SERVER=/mnt/wslg/PulseServer" -v /mnt/wslg/:/mnt/wslg/ --env-file .env --name voice-chat-ai -p 8000:8000 voice-chat-ai:latest

+---

+To build without xtts and run faster-whisper on CPU (only about a 6 GB image)

+docker build -t voice-chat-ai-no-xtts -f no-xtts-Dockerfile .

+docker run -d ^
+  -e "PULSE_SERVER=/mnt/wslg/PulseServer" ^
+  -v \\wsl$\Ubuntu\mnt\wslg:/mnt/wslg/ ^
+  --env-file .env ^
+  --name voice-chat-ai-no-xtts ^
+  -p 8000:8000 ^
+  voice-chat-ai-no-xtts:latest
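
Not part of the commit, but a quick way to check which device faster-whisper ended up on once the no-xtts container is up - the startup messages added in app/app.py above are printed to the container logs:

```bash
# Assumes the container was started with --name voice-chat-ai-no-xtts as shown above.
docker logs voice-chat-ai-no-xtts 2>&1 | grep -i "faster-whisper"
```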

no-xtts-Dockerfile (+56)

@@ -0,0 +1,56 @@
# Use a lighter base image (Python 3.10 slim version)
FROM python:3.10-slim

# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV PATH="/root/.local/bin:$PATH"

# Create a working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    libsndfile1 \
    portaudio19-dev \
    wget \
    curl \
    pulseaudio \
    libsdl2-dev \
    dbus \
    && rm -rf /var/lib/apt/lists/*

# Install ffmpeg
RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*

# Configure dbus
RUN dbus-uuidgen > /etc/machine-id

# Configure ALSA to use PulseAudio
RUN echo "pcm.!default pulse" > /root/.asoundrc && \
    echo "ctl.!default pulse" >> /root/.asoundrc

# Ensure the directory exists before writing to the file
RUN mkdir -p /usr/share/alsa/alsa.conf.d && \
    echo "defaults.pcm.card 0" >> /usr/share/alsa/alsa.conf.d/99-pulseaudio-defaults.conf && \
    echo "defaults.ctl.card 0" >> /usr/share/alsa/alsa.conf.d/99-pulseaudio-defaults.conf

# Copy only necessary files (EXCLUDING large checkpoint and XTTS-v2)
COPY requirements_no_xtts.txt /app/requirements.txt
COPY app /app/app
COPY characters /app/characters
COPY outputs /app/outputs
COPY cli.py /app/cli.py
COPY elevenlabs_voices.json /app/elevenlabs_voices.json
COPY elevenlabs_voices.json.sample /app/elevenlabs_voices.json.sample
COPY .env.sample /app/.env.sample
COPY README.md /app/README.md

# Install Python dependencies (without cache to reduce image size)
RUN pip install --upgrade pip && pip install --no-cache-dir -r /app/requirements.txt

# Expose the port that the app runs on
EXPOSE 8000

# Command to run the application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

requirements_no_xtts.txt (+31)

@@ -0,0 +1,31 @@
# CPU-optimized PyTorch and related libraries (no CUDA)
torch==2.3.1+cpu
torchaudio==2.3.1+cpu
torchvision==0.18.1+cpu
-f https://download.pytorch.org/whl/torch_stable.html

PyAudio==0.2.14
numpy==1.22.0
faster-whisper==1.0.2
soundfile==0.12.1
langid==1.1.6
librosa==0.10.0
scipy==1.11.4
transformers==4.41.2
pydantic==2.7.4
pillow==10.3.0

pydub==0.25.1
openai==1.33.0
textblob==0.18.0.post0
python-dotenv==1.0.1
Flask==3.0.3
requests==2.32.3
fastapi==0.111.0
uvicorn==0.30.1
elevenlabs==1.12.1
aiohttp==3.10.11
spacy==3.7.5
spacy-legacy==3.0.12
spacy-loggers==1.0.5
TTS==0.22.0
