Commit 86c8fe2

update reame for kokoro
1 parent 4081ba6 commit 86c8fe2

2 files changed: +21 -24 lines changed
.env.sample (+1 -1)

@@ -48,7 +48,7 @@ KOKORO_TTS_VOICE=af_bella
 
 # AUDIO GENERATION LENGTH
 # Maximum character length for audio generation - set to 2000+ for stories and games, 3000 for assassin story, 4000 for mars encounter interactive
-# MAX_CHAR_LENGTH is used for openai and elevenlabs, is also used for max tokens for chat response, if MAX_CHAR_LENGTH is 500, then 500 * 4 // 3 = 666 max tokens is sent to provider
+# MAX_CHAR_LENGTH is used for openai, elevenlabs and kokoro, is also used for max tokens for chat response, if MAX_CHAR_LENGTH is 500, then 500 * 4 // 3 = 666 max tokens is sent to provider
 MAX_CHAR_LENGTH=1000
 # XTTS Max Number of characters to generate audio, default is 255 but we are overriding that
 XTTS_NUM_CHARS=1000
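
The token arithmetic described in the `MAX_CHAR_LENGTH` comment can be checked with a few lines of Python. This is only a sketch of the stated formula (`MAX_CHAR_LENGTH * 4 // 3`); the function name `max_tokens_for` is hypothetical and is not taken from the project's code.

```python
import os


def max_tokens_for(max_char_length: int) -> int:
    """Apply the formula from the .env comment: chars * 4 // 3 (integer division)."""
    return max_char_length * 4 // 3


if __name__ == "__main__":
    # Fall back to the sample file's value when the variable is not set.
    max_chars = int(os.getenv("MAX_CHAR_LENGTH", "1000"))
    print(f"MAX_CHAR_LENGTH={max_chars} -> max tokens {max_tokens_for(max_chars)}")
```

With the comment's own example, 500 characters gives 2000 // 3 = 666 tokens; the sample value of 1000 gives 1333.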

README.md (+20 -23)

@@ -121,18 +121,18 @@ If you are only using speech with Openai or Elevenlabs then you don't need this.
 
 [Kokoro TTS](https://github.com/remsky/Kokoro-FastAPI) is an open-source neural text-to-speech system based on the Kokoro-82M model, offering high-quality voice synthesis with various male and female voices.
 
-Install it based on the instructions in the Kokoro repo.
+Install it based on the instructions in the Kokoro repo, like run it in docker, then you can connect to the api endpoints to use it's voices.
 
 To use Kokoro TTS:
 
 1. Configure Voice-Chat-AI to use Kokoro:
-- `KOKORO_BASE_URL=http://localhost:8880/v1` to your `.env` file
-- Set `TTS_PROVIDER=kokoro` in your `.env` file
-- Select a voice with `KOKORO_TTS_VOICE=af_bella` (female) or `KOKORO_TTS_VOICE=am_onyx` (male)
+- `KOKORO_BASE_URL=http://localhost:8880/v1` - set to your url
+- Set `TTS_PROVIDER=kokoro` - use it as the TTS_PROVIDER in .env or select in UI.
+- Select a voice with `KOKORO_TTS_VOICE=af_bella` (female) or `KOKORO_TTS_VOICE=am_onyx` (male) - defaults to use in .env, all voices will show in UI.
 
 2. Start the Voice Chat AI application normally
 
-Kokoro TTS operates locally on your machine, requiring no API key or internet connection once installed. The server supports GPU acceleration for faster processing if you have compatible NVIDIA hardware.
+Kokoro TTS operates locally on your machine or local network, requiring no API key or internet connection once installed. The server supports GPU acceleration for faster processing if you have compatible NVIDIA hardware.
 
 ## Usage
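
To illustrate how the two Kokoro settings in this hunk might fit together, here is a minimal sketch that builds (but does not send) a speech request from `KOKORO_BASE_URL` and `KOKORO_TTS_VOICE`. It assumes the Kokoro server exposes an OpenAI-style `audio/speech` route under the base URL and accepts a `model` value of `kokoro`; both are assumptions about the server, and the helper name `build_kokoro_request` is hypothetical rather than part of Voice-Chat-AI.

```python
import json
import os
import urllib.request


def build_kokoro_request(text: str) -> urllib.request.Request:
    """Build a POST request for a Kokoro server; nothing is sent here."""
    # Defaults mirror the values shown in the README diff.
    base_url = os.getenv("KOKORO_BASE_URL", "http://localhost:8880/v1").rstrip("/")
    voice = os.getenv("KOKORO_TTS_VOICE", "af_bella")
    # Assumed payload shape: OpenAI-style speech request fields.
    payload = {"model": "kokoro", "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",  # assumed route name
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_kokoro_request("Hello from Kokoro")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would perform the call once a server is actually running at the configured URL.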

@@ -352,8 +352,8 @@ docker run -d --gpus all -e "PULSE_SERVER=/mnt/wslg/PulseServer" -v \\wsl$\Ubunt
 
 ```env
 # Conditional API Usage:
+# Depending on the value of MODEL_PROVIDER, the corresponding service will be used when run.
 # You can mix and match; use local Ollama with OpenAI speech or use OpenAI model with local XTTS, etc.
-# If not using certain providers just leave defaults as is and don't select it in the UI.
 
 # Model Provider: openai or ollama or xai or anthropic
 MODEL_PROVIDER=openai
@@ -366,7 +366,7 @@ MODEL_PROVIDER=openai
 CHARACTER_NAME=bigfoot
 
 # Text-to-Speech (TTS) Configuration:
-# TTS Provider - Options: xtts (local uses the custom character .wav) or openai (uses OpenAI TTS voice) or elevenlabs or kokoro
+# TTS Provider - Options: xtts (local uses the custom character .wav) or openai (uses OpenAI TTS voice) or elevenlabs or kokoro (your own selfhosted tts)
 TTS_PROVIDER=openai
 
 # Voice Speed for all TTS providers - 0.7 to 1.2, default is 1.0
@@ -395,12 +395,13 @@ ELEVENLABS_TTS_VOICE=your_voice_id_here
 ELEVENLABS_TTS_MODEL=eleven_multilingual_v2
 
 # Kokoro TTS Configuration:
-# Default voice for Kokoro TTS - examples: af_bella, af_nova, am_onyx, etc.
+# bm_fable, bm_daniel, bm_lewis, af_alloy, af_bella
+# See the kokoro web url ( if you have it installed ) for more voices http://localhost:8880/web/
 KOKORO_TTS_VOICE=af_bella
 
 # AUDIO GENERATION LENGTH
 # Maximum character length for audio generation - set to 2000+ for stories and games, 3000 for assassin story, 4000 for mars encounter interactive
-# MAX_CHAR_LENGTH is used for openai and elevenlabs, is also used for max tokens for chat response, if MAX_CHAR_LENGTH is 500, then 500 * 4 // 3 = 666 max tokens is sent to provider
+# MAX_CHAR_LENGTH is used for openai, elevenlabs and kokoro, is also used for max tokens for chat response, if MAX_CHAR_LENGTH is 500, then 500 * 4 // 3 = 666 max tokens is sent to provider
 MAX_CHAR_LENGTH=1000
 # XTTS Max Number of characters to generate audio, default is 255 but we are overriding that
 XTTS_NUM_CHARS=1000
@@ -449,25 +450,21 @@ KOKORO_BASE_URL=http://localhost:8880/v1
 DEBUG=false
 # Set to true to see audio level readings during recording
 DEBUG_AUDIO_LEVELS=false
-
-# NOTES:
-# List of trigger phrases to have the model view your desktop (desktop, browser, images, etc.).
-# It will describe what it sees, and you can ask questions about it:
-# "what's on my screen", "take a screenshot", "show me my screen", "analyze my screen",
-# "what do you see on my screen", "screen capture", "screenshot"
-# To stop the conversation, say "Quit" or "Exit". ( ctl+c always works also)
 ```
 
 ### Audio Commands
 
 - You have 3 secs to talk, if there is silence then it's the AI's turn to talk
-- Say any of the following to have the AI look at your screen - "what's on my screen",
-"take a screenshot",
-"show me my screen",
-"analyze my screen",
-"what do you see on my screen",
-"screen capture",
-"screenshot" to have the AI explain what it is seeing in detail.
+- Say any of the following to have the AI look at your screen ( uses llava for ollama and openai as fall back )
+
+"what's on my screen",
+"take a screenshot",
+"show me my screen",
+"analyze my screen",
+"what do you see on my screen",
+"screen capture",
+"screenshot" to have the AI explain what it is seeing in detail.
+
 - To stop the conversation, say "Quit" or "Exit". ( ctl+c always works also in terminal )
 
 ### ElevenLabs
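
The audio commands listed in this hunk could be recognized with a simple substring check on the transcribed speech. The commit does not show how the application actually matches these phrases, so the following is a hypothetical sketch: the names `SCREEN_TRIGGER_PHRASES`, `QUIT_WORDS`, and `classify_utterance` are invented for illustration, and only the phrase list itself comes from the README.

```python
# Phrases copied from the README's "Audio Commands" section.
SCREEN_TRIGGER_PHRASES = (
    "what's on my screen",
    "take a screenshot",
    "show me my screen",
    "analyze my screen",
    "what do you see on my screen",
    "screen capture",
    "screenshot",
)
QUIT_WORDS = ("quit", "exit")


def classify_utterance(text: str) -> str:
    """Return 'quit', 'screenshot', or 'chat' for a transcribed utterance."""
    normalized = text.strip().lower()
    # Treat a bare "Quit" / "Exit" (optionally with trailing punctuation) as a stop command.
    if normalized.rstrip(".!") in QUIT_WORDS:
        return "quit"
    # Any trigger phrase appearing anywhere in the utterance requests a screen capture.
    if any(phrase in normalized for phrase in SCREEN_TRIGGER_PHRASES):
        return "screenshot"
    return "chat"


if __name__ == "__main__":
    for sample in ("Hey, what's on my screen?", "Exit", "Tell me a story"):
        print(f"{sample!r} -> {classify_utterance(sample)}")
```

Substring matching keeps the sketch short but would also fire on sentences that merely mention a screenshot; a real implementation may need stricter matching.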