Question: Feasibility of mirroring (not using) OpenAI Realtime API for pipecat bot #1462

nikp06 opened this issue Mar 27, 2025 · 0 comments


nikp06 commented Mar 27, 2025

How feasible would it be to mirror the OpenAI Realtime API with a bot running a Pipecat pipeline that includes all components (STT, LLM, TTS, or an all-in-one realtime agent)? The components would be configurable and support multiple local and external providers, and would therefore have different configuration options. In other words, instead of communicating with a client in Pipecat frames, these frames would be converted to OpenAI-like events (e.g. not OutputAudioRawFrame but a response.audio.delta event, and not BotStoppedSpeakingFrame but response.audio.done). Note: I'm not talking about using the OpenAI Realtime API (I know that's possible with Pipecat), but about mirroring its events and messages for my bot's communication with a client.

What I assume would need to be done:

  • a custom serializer converting Pipecat frames to OpenAI Realtime events (currently ProtobufFrameSerializer)
  • a custom transport able to handle all possible incoming and outgoing events (probably extend/modify FastAPIWebsocketTransport, which should still support WebSocket with FastAPI)
  • I could probably use a modified version of this transport for the client
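The serializer would also need the inbound half: turning client events back into frames. A minimal sketch, again with a stub frame class standing in for Pipecat's InputAudioRawFrame, and event names modeled on the Realtime API's:

```python
import base64
import json
from dataclasses import dataclass

# Stand-in for Pipecat's InputAudioRawFrame (pipecat.frames.frames).
@dataclass
class InputAudioRawFrame:
    audio: bytes
    sample_rate: int = 16000

def event_to_frame(data: str):
    """Deserialize an incoming OpenAI-Realtime-style client event into a frame.

    This is the inbound half a custom serializer would need. Only audio
    append is handled here; control events like session.update would be
    routed to the transport/pipeline setup instead.
    """
    event = json.loads(data)
    if event.get("type") == "input_audio_buffer.append":
        return InputAudioRawFrame(audio=base64.b64decode(event["audio"]))
    return None
```

Together with the outbound mapping, this pair is essentially what a ProtobufFrameSerializer replacement would have to implement.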

My concerns/questions regarding this:

  • would I still need Pipecat frames for internal server-side pipeline communication, or can they be made obsolete by a custom transport?
  • is there a negative performance impact from the overhead of converting frames to events?
  • is it a bit brute-force to map OpenAI Realtime events to Pipecat frames? I suspect that not all service combinations (e.g. local TTS, Azure LLM, OpenAI STT) will allow for a useful translation of OpenAI Realtime events to Pipecat frames
  • is session creation/updating possible, similar to how the OpenAI Realtime API does it?
  • would this be compatible with Pipecat's RTVI framework?
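On the session question in particular: one plausible approach is to treat a session.update event as a plain config merge applied before (re)configuring the pipeline services. A minimal sketch, with hypothetical config keys that are illustrative rather than the Realtime API's exact session schema:

```python
import json

# Hypothetical server-side session state; the key names are illustrative
# placeholders for whichever STT/LLM/TTS provider options the bot exposes.
DEFAULT_SESSION = {"voice": "default", "stt": "local-whisper", "llm": "azure-gpt"}

def apply_session_update(session: dict, raw_event: str) -> dict:
    """Merge a session.update event's payload into the current session.

    Returns a new session dict; non-session events leave it unchanged.
    """
    event = json.loads(raw_event)
    if event.get("type") != "session.update":
        return session
    return {**session, **event.get("session", {})}
```

The transport could apply this merge and then push the resulting settings into the pipeline's service configuration.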

Maybe someone has already investigated this direction and can give me some insights. Any help/information is much appreciated!
