feat: kokoro tts support #643

Open · wants to merge 25 commits into `main`
Conversation

**@MagdalenaKotynia** (Member) commented on Jun 25, 2025

Purpose

  • To support the usage of the Kokoro-TTS model. Kokoro-TTS was selected for its high-quality speech output, small size, and potential to run on edge devices (it is distributed in ONNX format).

Proposed Changes

  • Developed a class implementing the TTSModel interface for the Kokoro-TTS model.
  • Updated the docs with the newly supported model.
  • Updated the TTSAgent example so it can use the newly supported model.

Testing

With TTSAgent

  • Run the TTSAgent example: `python examples/s2s/tts.py`
  • In another terminal, run the following script to send a ROS2HRIMessage to the ROS 2 topic:
```python
from rai.communication.ros2.connectors import ROS2HRIConnector
from rai.communication.ros2.messages import ROS2HRIMessage
import rclpy
import time

rclpy.init()
my_hri_msg = ROS2HRIMessage(
    text="Hello, human! This is a test message. How are you?",
    message_author="ai",
)

hri_connector = ROS2HRIConnector()

hri_connector.send_message(
    message=my_hri_msg,
    target="/to_human",
)

try:
    print("Sending message... Press Ctrl+C to exit")
    time.sleep(10)
except KeyboardInterrupt:
    print("Shutting down...")
finally:
    hri_connector.shutdown()
    rclpy.shutdown()
```

After a while, you should hear speech output from TTSAgent.

With ROS2S2SAgent

Run the following script and converse with the agent:

```python
from rai_s2s.sound_device import SoundDeviceConfig
from rai.communication.ros2 import ROS2Context, ROS2HRIConnector
from rai_s2s.s2s.agents.s2s_agent import SpeechToSpeechAgent
from rai_s2s.s2s.agents.ros2s2s_agent import ROS2S2SAgent
from rai.agents.langchain.react_agent import ReActAgent
from rai_s2s.asr.models import OpenAIWhisper, SileroVAD
from rai_s2s import KokoroTTS

from rai.agents import AgentRunner


@ROS2Context()
def main():
    speaker_config = SoundDeviceConfig(
        stream=True,
        is_output=True,
        # device_name="EPOS PC 8 USB: Audio (hw:1,0)",
        # device_name="Sennheiser USB headset: Audio (hw:1,0)",
        # device_name="Jabra Speak2 40 MS: USB Audio (hw:2,0)",
        device_name="default",
    )

    microphone_config = SoundDeviceConfig(
        stream=True,
        channels=1,
        device_name="default",
        consumer_sampling_rate=16000,
        dtype="int16",
        is_input=True,
    )

    # whisper = LocalWhisper("tiny", 16000)
    whisper = OpenAIWhisper("gpt-4o-mini-transcribe", 16000)
    vad = SileroVAD(16000, 0.5)

    tts = KokoroTTS()

    agent = ROS2S2SAgent(
        from_human_topic="/from_human",
        to_human_topic="/to_human",
        microphone_config=microphone_config,
        speaker_config=speaker_config,
        transcription_model=whisper,
        vad=vad,
        tts=tts,
    )

    hri_connector = ROS2HRIConnector()
    llm = ReActAgent(
        target_connectors={"/to_human": hri_connector},
    )
    llm.subscribe_source("/from_human", hri_connector)
    runner = AgentRunner([agent, llm])
    runner.run_and_wait_for_shutdown()


if __name__ == "__main__":
    main()
```

The KokoroTTS model works well together with the ROS2S2SAgent.
In my experience it sounds nicer than OpenTTS, and I didn't observe any significant difference in inference time between the two models.
The model sometimes did not put a space between sentences. EDIT: this was fixed by setting `trim` to `False` in Kokoro's `create` method.

@MagdalenaKotynia MagdalenaKotynia marked this pull request as ready for review June 26, 2025 13:35
@MagdalenaKotynia MagdalenaKotynia requested review from boczekbartek and removed request for boczekbartek June 26, 2025 17:45
Comment on lines +24 to +26

```toml
# To avoid yanked version 3.0.6
zarr = "!=3.0.6"
```

Does zarr 3.0.6 break rai?

Comment on lines +34 to +36
> [!WARNING]
> It is not recommended to use device_name set to `'default'` in `SoundDeviceConfig` due to potential issues with audio.

Suggested change:

```diff
- > [!WARNING]
- > It is not recommended to use device_name set to `'default'` in `SoundDeviceConfig` due to potential issues with audio.
+ > [!TIP]
+ > If you're experiencing audio issues and device_name is set to 'default', try specifying the exact device name instead, as this often resolves the problem.
```


Also, please add this note to the configurator.

Comment on lines +292 to +318
```python
def _preprocess_text(self, text: str) -> str:
    """
    Preprocesses text by removing formatting characters that would be
    read aloud as words (like 'asterisk' for '*').

    Parameters
    ----------
    text : str
        The input text that may contain formatting characters.

    Returns
    -------
    str
        The cleaned text with formatting characters removed.
    """
    # Remove markdown headers (# symbols at start of line)
    text = re.sub(r"^#+\s*", "", text)

    # Remove bold markdown (** or __)
    text = re.sub(r"\*\*(.*?)\*\*", r"\1", text)
    text = re.sub(r"__(.*?)__", r"\1", text)

    # Remove italic markdown (* or _)
    text = re.sub(r"\*(.*?)\*", r"\1", text)
    text = re.sub(r"_(.*?)_", r"\1", text)

    return text
```

Is kokoro pronouncing markdown elements?
Why does this method remove only a certain subset of markdown symbols?
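For comparison, a fuller strip could also cover inline code, strikethrough, links, and list bullets. This is only a sketch of what a broader pass might look like; the extra patterns are assumptions, not part of the PR:

```python
import re

def strip_markdown(text: str) -> str:
    """Remove common markdown syntax that a TTS engine would otherwise read aloud."""
    text = re.sub(r"^#+\s*", "", text, flags=re.MULTILINE)     # headers on any line
    text = re.sub(r"\*\*(.*?)\*\*", r"\1", text)               # bold **...**
    text = re.sub(r"__(.*?)__", r"\1", text)                   # bold __...__
    text = re.sub(r"\*(.*?)\*", r"\1", text)                   # italic *...*
    text = re.sub(r"_(.*?)_", r"\1", text)                     # italic _..._
    text = re.sub(r"`([^`]*)`", r"\1", text)                   # inline code
    text = re.sub(r"~~(.*?)~~", r"\1", text)                   # strikethrough
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)       # links: keep label only
    text = re.sub(r"^[-*+]\s+", "", text, flags=re.MULTILINE)  # list bullets
    return text
```

Note that the PR's version anchors the header regex without `re.MULTILINE`, so it only strips a header at the very start of the string.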

```python
    )

    if samples.dtype == np.float32:
        samples = (samples * 32768).clip(-32768, 32767).astype(np.int16)
```

Are we expecting values outside of the provided range?
Clipping audio should only be used as a last resort, as it introduces massive quality degradation.
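For context: float32 audio is nominally in [-1.0, 1.0], so scaling by 32767 maps in-range samples without distortion, and the clip then only guards against occasional overshoot (e.g. from resampling filters). A minimal sketch of that conversion, assuming normalized input (not the PR's code):

```python
import numpy as np

def float32_to_int16(samples: np.ndarray) -> np.ndarray:
    """Convert normalized float32 audio ([-1.0, 1.0]) to int16 PCM.

    In-range samples map cleanly; the clip only touches out-of-range
    outliers, so it should be a no-op for well-behaved TTS output.
    """
    scaled = samples * 32767.0
    return np.clip(scaled, -32768, 32767).astype(np.int16)
```

If clipping ever fires on a significant fraction of samples, that points to an upstream gain problem rather than something the conversion should hide.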

For the voices and languages available in the currently used version of the model, use the `get_available_voices()` and `get_supported_languages()` methods of `rai_s2s.tts.models.KokoroTTS`, respectively.

> [!NOTE]
> You may encounter phonemizer warnings like "words count mismatch on x% of the lines". These warnings do not indicate that something is wrong with text to speech processing and can be safely ignored.

Can we configure kokoro's logger to drop these warnings?
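One option, assuming the warnings are emitted through Python's `logging` under a `phonemizer` logger name (worth verifying; they may also be written to stderr directly), is a targeted filter that drops only this message:

```python
import logging

class WordsMismatchFilter(logging.Filter):
    """Drop phonemizer's 'words count mismatch' warnings, keep everything else."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False suppresses the record.
        return "words count mismatch" not in record.getMessage()

# Attach to the phonemizer logger (the logger name is an assumption).
logging.getLogger("phonemizer").addFilter(WordsMismatchFilter())
```

This is narrower than raising the logger's level, which would also hide unrelated warnings.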
