This is an example application showing how to use the OpenAI Realtime API with WebRTC. You can view this application at different stages of completeness in the git branches shown below.
This tutorial repository is designed to allow you to check out different branches of the code at different starting points, and work on implementing features as directed.
The `main` branch of this application contains a final working version of the application, which you can try out first to get a sense of how the end product is supposed to work (and to test your OpenAI API credentials). This application is a lightly modified version of the Realtime console application, rebuilt as a frontend web design assistant.
You will be asked to complete a series of four tutorial exercises in this repository that introduce important concepts of working with the Realtime API. Each tutorial's beginning and end states are checked into git branches. You can move to any point along the way with `git checkout tutorial_2_start`, etc. The final iteration of this application can be found on the `main` branch with `git checkout main`.
To get things started, begin by configuring and launching the application so you can play with it in the browser.
Before you do that, however, you'll need an OpenAI API key - create one in the dashboard here. Create a `.env` file from the `.env.example` file in this repository, and set your API key in there:

```bash
cp .env.example .env
```
Running this application locally requires Node.js to be installed. Install dependencies for the application with:

```bash
npm install
```
Start the application server with:

```bash
npm run dev
```

This should start the console application on http://localhost:3000.
This application is a minimal template that uses Express to serve the React frontend contained in the `/client` folder. The server is configured to use Vite to build the React frontend, and it uses Tailwind CSS for styling.
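For orientation, here is a minimal sketch of that server shape - an Express app serving the Vite build output. The paths and port are assumptions for illustration, not a copy of the repo's `server.js`:

```js
// Minimal sketch, not the repo's actual server.js: an Express app that
// serves the Vite-built React client. The dist path and port are assumed.
import express from "express";

const app = express();
app.use(express.static("client/dist")); // Vite build output (assumed path)

app.listen(3000, () => {
  console.log("Console app running on http://localhost:3000");
});
```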
- 🎯 Objective: Configure a working development environment for the Realtime API.
- 🏎️ Starting branch: `git checkout main`
The base application ships with a very basic voice prompt, but we can do better. The initial session configuration happens when an ephemeral token is fetched from the server. Update this configuration with a more detailed voice prompt (see the example below under `Example voice prompt`), and try a different voice from the supported list. It is often desirable to have the model start the conversation after a connection is established - modify the code to send a mostly empty `response.create` client message to kick off this process (see the sketch after the hints below).
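To illustrate, here is a rough sketch of what that server-side session request might look like, assuming the standard ephemeral-session endpoint; the model name, voice, and instructions shown are placeholders, not the repository's actual values:

```js
// Sketch of the ephemeral-token request made from the server, with
// placeholder values. `instructions` carries the voice prompt and
// `voice` selects one of the supported voices.
const response = await fetch("https://api.openai.com/v1/realtime/sessions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o-realtime-preview",
    voice: "verse",
    instructions: "You are a friendly frontend web design assistant...",
  }),
});
const session = await response.json();
```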
- 🎯 Objective:
  - Initialize a Realtime session with a more expressive voice prompt.
  - Use a voice other than the current default.
  - Have the model start the conversation by speaking a greeting aloud.
- 🏎️ Starting branch: `git checkout tutorial_1_start`
- 🏁 Solution branch: `git checkout tutorial_1_solution`
- Diff the starting code and solution
Hints:
- The request to configure the Realtime session is found in `server.js`.
- The chunk of code required to have the model start talking first is in `RealtimeContext.jsx`, in the `startSession` function.
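As a hint at the shape of the solution, here is a minimal sketch of kicking off the conversation once the WebRTC data channel opens. It assumes `dataChannel` is the `RTCDataChannel` created in `startSession`:

```js
// Minimal sketch, assuming `dataChannel` is the RTCDataChannel created in
// startSession. A mostly empty response.create event asks the model to
// generate a response using the session defaults - here, a spoken greeting.
dataChannel.addEventListener("open", () => {
  dataChannel.send(JSON.stringify({ type: "response.create" }));
});
```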
One of the most important techniques to master when building LLM apps (with the Realtime API or otherwise) is extending the capabilities of the model with function calling. In this exercise, you will build and configure a function that will be called when the user asks a question about using a color palette in their web designs.
- 🎯 Objective: Configure a function that will be called whenever the user asks for a suggestion about a color palette. Display their requested color palette and theme in the UI.
- 🏎️ Starting branch: `git checkout tutorial_2_start`
- 🏁 Solution branch: `git checkout tutorial_2_solution`
- Diff the starting code and solution
Hints:
- The function call code lives in `ToolPanel.jsx`.
- You will need to define both a function description (which the model will use to decide when to call your function) and, using JSON Schema, the parameters that the function accepts.
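For a sense of the shape involved, here is a sketch of a tool definition sent in a `session.update` client event. The function name and parameters are illustrative, not necessarily what the solution branch uses:

```js
// Sketch of a session.update event registering a tool. The name
// "display_color_palette" and its parameter schema are illustrative.
const sessionUpdate = {
  type: "session.update",
  session: {
    tools: [
      {
        type: "function",
        name: "display_color_palette",
        description:
          "Call this when the user asks for a color palette suggestion " +
          "for their web design.",
        parameters: {
          type: "object",
          properties: {
            theme: { type: "string", description: "Theme of the palette" },
            colors: {
              type: "array",
              items: { type: "string" },
              description: "Hex color codes that make up the palette",
            },
          },
          required: ["theme", "colors"],
        },
      },
    ],
    tool_choice: "auto",
  },
};
```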
When dealing with model responses, you will often need to implement guardrails to ensure that what the model is saying is accurate and in keeping with your intended tone and behavior. In this exercise, we will listen for audio transcription events and cut off the model if it starts to talk about subjects we don't want it to discuss.
- 🎯 Objective: On the client, listen for realtime audio transcription events, and moderate the content generated by the model. If the model starts to talk about the TypeScript programming language (or another one you choose!), cut off the model's response.
- 🏎️ Starting branch: `git checkout tutorial_3_start`
- 🏁 Solution branch: `git checkout tutorial_3_solution`
- Diff the starting code and solution
Hints:
- The guardrails code lives in `RealtimeContext.jsx`, in the `responseGuardrails` function. `responseGuardrails` will be called continuously as the model streams its transcription response, with the input being a steadily growing string of what the model is speaking aloud.
- This implementation is a little tricky since it uses events that are specific to WebRTC, but basically:
  - If during the audio output you detect content you don't want in the transcription (like the word "TypeScript" in our case), you can elect to cut off the model before it says any more.
  - To do this, you must send two client events, in order (neither requires any additional data in its payload beyond the event type): first `response.cancel`, then `output_audio_buffer.clear`.
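To make the event sequence concrete, here is a minimal sketch of such a guardrail. It assumes a `sendClientEvent` helper that serializes an event onto the session's data channel, and a hard-coded check for illustration:

```js
// Minimal guardrail sketch. `sendClientEvent` is an assumed helper that
// JSON-serializes an event and sends it over the data channel.
function responseGuardrails(transcript) {
  if (transcript.toLowerCase().includes("typescript")) {
    // First, cancel the in-progress model response...
    sendClientEvent({ type: "response.cancel" });
    // ...then clear the audio already buffered for playback, so the
    // user doesn't keep hearing the cancelled response.
    sendClientEvent({ type: "output_audio_buffer.clear" });
  }
}
```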
MIT