
Commit 920ab4c

docker build without xtts and cuda
1 parent 422b56c commit 920ab4c

6 files changed: +163 -8 lines changed

.env.sample (+2)

@@ -40,6 +40,8 @@ OPENAI_MODEL=gpt-4o-mini
OPENAI_BASE_URL=https://api.openai.com/v1/chat/completions
OPENAI_TTS_URL=https://api.openai.com/v1/audio/speech
OLLAMA_BASE_URL=http://localhost:11434
+# IF RUNNING IN DOCKER CHANGE OLLAMA BASE URL TO THE ONE BELOW
+# OLLAMA_BASE_URL=http://host.docker.internal:11434

# Models Configuration:
# Model to use - llama3 or llama3.1 or 3.2 works well for local usage. In the UI you will have a list of popular models to choose from so the model here is just a starting point
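
A note on the `host.docker.internal` URL added above: Docker Desktop on Windows and macOS resolves that hostname automatically, but a plain Docker Engine on Linux usually needs it mapped explicitly. A minimal sketch, assuming the no-xtts image and container names used elsewhere in this commit:

```bash
# Hypothetical Linux-host run: map host.docker.internal to the host gateway so the
# container can reach an Ollama server listening on the host at port 11434.
docker run -d \
  --add-host=host.docker.internal:host-gateway \
  --env-file .env \
  --name voice-chat-ai-no-xtts \
  -p 8000:8000 \
  voice-chat-ai-no-xtts:latest
```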

README.md (+36 -4)

@@ -91,7 +91,7 @@ is in system PATH or whatever version you downloaded

### Optional - Download Checkpoints - ONLY IF YOU ARE USING THE LOCAL TTS

-If you are only using speech with Openai or Elevenlabs then you don't need this. To use the local TTS download the checkpoints for the models used in this project ( the docker image has the local xtts in it already ). You can download them from the GitHub releases page and extract the zip and put into the project folder.
+If you are only using speech with OpenAI or ElevenLabs, then you don't need this. To use the local TTS, download the checkpoints for the models used in this project (the docker image already has the local xtts and checkpoints in it). You can download them from the GitHub releases page, extract the zip, and put it into the project folder.

- [Download Checkpoint](https://github.com/bigsk1/voice-chat-ai/releases/download/models/checkpoints.zip)
- [Download XTTS-v2](https://github.com/bigsk1/voice-chat-ai/releases/download/models/XTTS-v2.zip)
@@ -134,7 +134,7 @@ python cli.py

This is for running with an Nvidia GPU and you have Nvidia toolkit and cudnn installed.

-This image is huge when built because of all the checkpoints, cuda base image, build tools and audio tools - So there is no need to download the checkpoints and XTTS as they are in the image. This is all setup to use XTTS, if your not using XTTS for speech it should still work but it is just a large docker image and will take awhile, if you don't want to deal with that then run the app natively and don't use docker.
+This image is huge when built because of all the checkpoints, the CUDA base image, build tools, and audio tools, so there is no need to download the checkpoints and XTTS separately - they are already in the image. Everything is set up to use XTTS; if you're not using XTTS for speech it will still work, but it is a large docker image and will take a while to build. If you don't want to deal with that, run the app natively, or build your own image without the xtts and checkpoints folders if you are not using the local TTS.

This guide will help you quickly set up and run the **Voice Chat AI** Docker container. Ensure you have Docker installed and that your `.env` file is placed in the same directory as the commands are run. If you get cuda errors make sure to install nvidia toolkit for docker and cudnn is installed in your path.

@@ -146,7 +146,7 @@ This guide will help you quickly set up and run the **Voice Chat AI** Docker con

---

-## 🖥️ Run on Windows using docker desktop
+## 🖥️ Run on Windows using docker desktop - prebuilt image
On windows using docker desktop - run in Windows terminal:
make sure .env is in same folder you are running this from
```bash
@@ -201,7 +201,7 @@ docker stop voice-chat-ai
docker rm voice-chat-ai
```

-## Build it yourself:
+## Build it yourself with cuda:

```bash
docker build -t voice-chat-ai .
@@ -218,6 +218,36 @@ Running from wsl
docker run -d --gpus all -e "PULSE_SERVER=/mnt/wslg/PulseServer" -v \\wsl$\Ubuntu\mnt\wslg:/mnt/wslg/ --env-file .env --name voice-chat-ai -p 8000:8000 voice-chat-ai:latest
```

+## Docker build without local xtts and no cuda
+
+```bash
+docker build -t voice-chat-ai-no-xtts -f no-xtts-Dockerfile .
+```
+
+In Windows command prompt:
+
+```bash
+docker run -d ^
+  -e "PULSE_SERVER=/mnt/wslg/PulseServer" ^
+  -v \\wsl$\Ubuntu\mnt\wslg:/mnt/wslg/ ^
+  --env-file .env ^
+  --name voice-chat-ai-no-xtts ^
+  -p 8000:8000 ^
+  voice-chat-ai-no-xtts:latest
+```
+
+In WSL2 Ubuntu:
+
+```bash
+docker run -d \
+  -e "PULSE_SERVER=/mnt/wslg/PulseServer" \
+  -v /mnt/wslg/:/mnt/wslg/ \
+  --env-file .env \
+  --name voice-chat-ai-no-xtts \
+  -p 8000:8000 \
+  voice-chat-ai-no-xtts:latest
+```
+
## Configuration ⚙️

1. Rename the .env.sample to `.env` in the root directory of the project and configure it with the necessary environment variables: - The app is controlled based on the variables you add.
@@ -265,6 +295,8 @@ OPENAI_MODEL=gpt-4o-mini
OPENAI_BASE_URL=https://api.openai.com/v1/chat/completions
OPENAI_TTS_URL=https://api.openai.com/v1/audio/speech
OLLAMA_BASE_URL=http://localhost:11434
+# IF RUNNING IN DOCKER CHANGE OLLAMA BASE URL TO THE ONE BELOW
+# OLLAMA_BASE_URL=http://host.docker.internal:11434

# Models Configuration:
# Models to use - llama3.2 works well for local usage.

app/app.py (+19 -3)

@@ -57,9 +57,25 @@
# Capitalize the first letter of the character name
character_display_name = CHARACTER_NAME.capitalize()

-# Set up the faster-whisper model
+# Check for CUDA availability
+device = "cuda" if torch.cuda.is_available() else "cpu"
+
+# Default model size (adjust as needed)
model_size = "medium.en"
-whisper_model = WhisperModel(model_size, device="cuda", compute_type="float16")
+
+try:
+    print(f"Attempting to load Faster-Whisper on {device}...")
+    whisper_model = WhisperModel(model_size, device=device, compute_type="float16" if device == "cuda" else "int8")
+    print("Faster-Whisper initialized successfully.")
+except Exception as e:
+    print(f"Error initializing Faster-Whisper on {device}: {e}")
+    print("Falling back to CPU mode...")
+
+    # Force CPU fallback
+    device = "cpu"
+    model_size = "tiny.en"  # Use a smaller model for CPU performance
+    whisper_model = WhisperModel(model_size, device="cpu", compute_type="int8")
+    print("Faster-Whisper initialized on CPU successfully.")

# Paths for character-specific files
project_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
@@ -206,7 +222,7 @@ def sync_play_audio(file_path):
    pass

# Model and device setup
-device = 'cuda' if torch.cuda.is_available() else 'cpu'
+# device = 'cuda' if torch.cuda.is_available() else 'cpu'
output_dir = os.path.join(project_dir, 'outputs')
os.makedirs(output_dir, exist_ok=True)

docker/using-docker.txt (+19 -1)

@@ -1,5 +1,23 @@
+To use full CUDA for xtts and faster-whisper

docker build -t voice-chat-ai:latest .

-wsl docker run -d --gpus all -e "PULSE_SERVER=/mnt/wslg/PulseServer" -v /mnt/wslg/:/mnt/wslg/ --env-file .env --name voice-chat-ai -p 8000:8000 voice-chat-ai:latest
+wsl docker run -d --gpus all -e "PULSE_SERVER=/mnt/wslg/PulseServer" -v /mnt/wslg/:/mnt/wslg/ --env-file .env --name voice-chat-ai -p 8000:8000 voice-chat-ai:latest

+---

+To build without xtts and run faster-whisper on CPU (only about a 6 GB image)

+docker build -t voice-chat-ai-no-xtts -f no-xtts-Dockerfile .

+docker run -d ^
+  -e "PULSE_SERVER=/mnt/wslg/PulseServer" ^
+  -v \\wsl$\Ubuntu\mnt\wslg:/mnt/wslg/ ^
+  --env-file .env ^
+  --name voice-chat-ai-no-xtts ^
+  -p 8000:8000 ^
+  voice-chat-ai-no-xtts:latest
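
Not part of the commit, but a quick way to check which device faster-whisper ended up on once the no-xtts container is up - the startup messages added in app/app.py above are printed to the container logs:

```bash
# Assumes the container was started with --name voice-chat-ai-no-xtts as shown above.
docker logs voice-chat-ai-no-xtts 2>&1 | grep -i "faster-whisper"
```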

no-xtts-Dockerfile (+56)

@@ -0,0 +1,56 @@
# Use a lighter base image (Python 3.10 slim version)
FROM python:3.10-slim

# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV PATH="/root/.local/bin:$PATH"

# Create a working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    libsndfile1 \
    portaudio19-dev \
    wget \
    curl \
    pulseaudio \
    libsdl2-dev \
    dbus \
    && rm -rf /var/lib/apt/lists/*

# Install ffmpeg
RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*

# Configure dbus
RUN dbus-uuidgen > /etc/machine-id

# Configure ALSA to use PulseAudio
RUN echo "pcm.!default pulse" > /root/.asoundrc && \
    echo "ctl.!default pulse" >> /root/.asoundrc

# Ensure the directory exists before writing to the file
RUN mkdir -p /usr/share/alsa/alsa.conf.d && \
    echo "defaults.pcm.card 0" >> /usr/share/alsa/alsa.conf.d/99-pulseaudio-defaults.conf && \
    echo "defaults.ctl.card 0" >> /usr/share/alsa/alsa.conf.d/99-pulseaudio-defaults.conf

# Copy only necessary files (EXCLUDING large checkpoint and XTTS-v2)
COPY requirements_no_xtts.txt /app/requirements.txt
COPY app /app/app
COPY characters /app/characters
COPY outputs /app/outputs
COPY cli.py /app/cli.py
COPY elevenlabs_voices.json /app/elevenlabs_voices.json
COPY elevenlabs_voices.json.sample /app/elevenlabs_voices.json.sample
COPY .env.sample /app/.env.sample
COPY README.md /app/README.md

# Install Python dependencies (without cache to reduce image size)
RUN pip install --upgrade pip && pip install --no-cache-dir -r /app/requirements.txt

# Expose the port that the app runs on
EXPOSE 8000

# Command to run the application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

requirements_no_xtts.txt (+31)

@@ -0,0 +1,31 @@
# CPU-optimized PyTorch and related libraries (no CUDA)
torch==2.3.1+cpu
torchaudio==2.3.1+cpu
torchvision==0.18.1+cpu
-f https://download.pytorch.org/whl/torch_stable.html

PyAudio==0.2.14
numpy==1.22.0
faster-whisper==1.0.2
soundfile==0.12.1
langid==1.1.6
librosa==0.10.0
scipy==1.11.4
transformers==4.41.2
pydantic==2.7.4
pillow==10.3.0

pydub==0.25.1
openai==1.33.0
textblob==0.18.0.post0
python-dotenv==1.0.1
Flask==3.0.3
requests==2.32.3
fastapi==0.111.0
uvicorn==0.30.1
elevenlabs==1.12.1
aiohttp==3.10.11
spacy==3.7.5
spacy-legacy==3.0.12
spacy-loggers==1.0.5
TTS==0.22.0
