How feasible would it be to mirror the OpenAI Realtime API with a bot running a pipeline that includes all components from STT, LLM, and TTS (or an all-in-one realtime agent)? The components would be configurable, support multiple local and external providers, and therefore have different configuration options. In other words, instead of communicating with a client in Pipecat frames, these would be converted to OpenAI-like events (e.g. not `OutputAudioRawFrame` but a `response.audio.delta` event, and not `BotStoppedSpeakingFrame` but `response.audio.done`) — a rough serializer sketch follows below. Note: I'm not talking about using the OpenAI Realtime API (I know this is possible with Pipecat) but about mirroring its events and messages for my bot and its communication with a client.
What I assume would need to be done:
- a custom serializer that converts Pipecat frames to OpenAI Realtime events (currently `ProtobufFrameSerializer`), along the lines of the sketch above
- a custom transport that can handle all possible incoming and outgoing events (probably extend/modify `FastAPIWebsocketTransport`; it should still support WebSocket with FastAPI) — see the wiring sketch after this list
- I could probably use a modification of this transport for the client
My concerns/questions regarding this:
- Would I still need Pipecat frames for internal server pipeline communication, or can they be made obsolete by a custom transport?
- Is there a negative performance impact from the overhead of converting frames to events?
- Is it a bit brute-force to map OpenAI Realtime events to Pipecat frames? I suspect that not all service combinations (e.g. local TTS, Azure LLM, OpenAI STT) will allow for a useful translation of OpenAI Realtime events to Pipecat frames.
- Is session creation/updating possible, similar to how OpenAI Realtime does it? (See the sketch after this list.)
- Would this be compatible with Pipecat's `RTVI` framework?
Maybe someone has already looked into this direction and can give me some insights. Any help/information is much appreciated!