Refactor rai_tts and rai_asr into rai_s2s #559

Closed
@maciejmajek

Description

With the current design, rai_tts and rai_asr share a Sounddevice connector that is implemented in rai_core. rai_core should not depend on sounddevice (or on any other SDConnector dependency) by default, so merging the two packages into one that owns the connector is beneficial in terms of dependency separation.

The tts and asr packages were originally separated because of their differing model dependencies. To minimize the risk of a huge package size, models should be optional. This means that the user has to specify which models to install, e.g.:

pip install "rai_s2s[faster_whisper,kokoro_tts]"
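One way the optional-model idea could look on the Python side is a small import guard that fails with an install hint when a backend is missing. This is only a sketch: `backend_available` and `require_backend` are hypothetical helper names, and the mapping between an extra name and the module it provides is an assumption, not something defined in this issue.

```python
import importlib
import importlib.util


def backend_available(module_name: str) -> bool:
    """Return True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_backend(module_name: str, extra: str):
    """Import an optional backend or raise an error naming the extra to install.

    `extra` is the (assumed) name of the rai_s2s extra that ships `module_name`.
    """
    if not backend_available(module_name):
        raise ImportError(
            f"Optional backend '{module_name}' is not installed. "
            f'Install it with: pip install "rai_s2s[{extra}]"'
        )
    return importlib.import_module(module_name)
```

With a guard like this, importing the package itself stays cheap, and the model dependency is only touched when an agent that needs it is constructed.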

Refactors:

  • refactor rai_asr and rai_tts into one package, rai_s2s, sharing the sounddevice connector
  • rai_s2s should install sounddevice by default
  • move sounddevice connector from rai_core to rai_s2s

Agent implementation:

  • ASRAgent
  • TTSAgent
  • S2SAgent using bidirectional sounddevice stream, compatible with ReActAgent
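A rough sketch of what a bidirectional sounddevice stream could look like for the S2SAgent is shown below. It assumes sounddevice's `RawStream`, whose callback receives plain input and output buffers, so no numpy is needed. `DuplexAudioBridge` and `open_duplex_stream` are hypothetical names, and the sample rate, channel count and dtype are placeholder values rather than settings taken from rai.

```python
import queue


class DuplexAudioBridge:
    """Moves audio between a duplex stream callback and the ASR/TTS sides."""

    def __init__(self):
        self.captured = queue.Queue()  # microphone chunks waiting for ASR
        self.playback = queue.Queue()  # synthesized chunks waiting for the speaker

    def callback(self, indata, outdata, frames, time, status):
        # Copy the input buffer: it is only valid during this call.
        self.captured.put(bytes(indata))
        try:
            chunk = self.playback.get_nowait()
        except queue.Empty:
            chunk = b""
        # Write what fits and pad the rest with silence.
        # Bytes beyond the buffer size are dropped in this simplified sketch.
        n = min(len(chunk), len(outdata))
        outdata[:n] = chunk[:n]
        outdata[n:] = b"\x00" * (len(outdata) - n)


def open_duplex_stream(bridge, samplerate=16000, channels=1):
    # Imported lazily so this module loads without sounddevice installed.
    import sounddevice as sd

    return sd.RawStream(
        samplerate=samplerate,
        channels=channels,
        dtype="int16",
        callback=bridge.callback,
    )
```

A single duplex callback keeps capture and playback on the same device clock, which is one reason to prefer it over two independent streams.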

Docs

  • README.md
  • S2SAgent docs (including compatibility info with ReActAgent and limitations (if it's bound to ros2))
  • ASRAgent docs
  • TTSAgent docs

Misc:

from abc import ABC, abstractmethod

from rai.communication import HRIMessage
from rai.communication.ros2 import ROS2HRIConnector, ROS2HRIMessage


class S2SAgent(ABC):
    def __init__(self, **audio_kwargs):
        # audio stream configuration (sounddevice), unused in this sketch
        pass

    def tts_callback(self, message: HRIMessage):
        # process input
        pass

    @abstractmethod
    def send(self, message: HRIMessage):
        # method implemented by subclass with concrete connectors
        pass


class ROS2S2SAgent(S2SAgent):
    def __init__(self, from_human: str, to_human: str, **audio_kwargs):
        super().__init__(**audio_kwargs)
        self.in_topic = from_human
        self.out_topic = to_human
        self.connector = ROS2HRIConnector()
        self.connector.register_callback(
            callback=self.tts_callback, source=self.in_topic
        )

    def send(self, message: HRIMessage):
        msg = ROS2HRIMessage(
            text=message.text,
            images=message.images,
            audios=message.audios,
            communication_id=message.communication_id,
            seq_no=message.seq_no,
            seq_end=message.seq_end,
        )
        self.connector.send_message(target=self.out_topic, message=msg)
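The split between an abstract base and a connector-specific subclass also makes it possible to exercise the agent logic without ROS 2. The sketch below illustrates that with stand-ins only: `FakeMessage`, `SketchS2SAgent` and `InMemoryS2SAgent` are hypothetical names, not part of rai. It also shows that `@abstractmethod` is only enforced when the base class derives from `ABC`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class FakeMessage:
    """Hypothetical stand-in for HRIMessage, reduced to a text field."""

    text: str


class SketchS2SAgent(ABC):
    """Stand-in for the proposed S2SAgent base class."""

    def tts_callback(self, message: FakeMessage) -> None:
        # A real agent would synthesize speech here; the sketch echoes the message.
        self.send(message)

    @abstractmethod
    def send(self, message: FakeMessage) -> None:
        """Deliver a message through a concrete connector."""


class InMemoryS2SAgent(SketchS2SAgent):
    """Collects outgoing messages in a list instead of publishing them."""

    def __init__(self):
        self.sent = []

    def send(self, message: FakeMessage) -> None:
        self.sent.append(message)


agent = InMemoryS2SAgent()
agent.tts_callback(FakeMessage(text="hello"))
```

Instantiating `SketchS2SAgent` directly raises `TypeError`, because `send` is abstract. Without the `ABC` base the decorator would have no effect.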

Metadata

Labels

enhancement, priority/critical