This pattern showcases a real-time conversational RAG agent powered by Google Gemini. The agent handles audio, video, and text interactions while leveraging tool calling with a vector DB for grounded responses.
Key components:
- Python Backend (in the `app/` folder): A production-ready server built with FastAPI and google-genai that features:
  - Real-time bidirectional communication via WebSockets between the frontend and the Gemini model
  - Integrated tool calling with vector database support for contextual document retrieval
  - Production-grade reliability with retry logic and automatic reconnection capabilities
  - Deployment flexibility supporting both AI Studio and Vertex AI endpoints
  - A feedback logging endpoint for collecting user interactions (a minimal sketch follows this list)
- React Frontend (in the `frontend/` folder): Extends the Multimodal Live API Web Console with added features such as custom URLs and feedback collection.
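The feedback logging endpoint mentioned above can be illustrated with a minimal sketch. The `/feedback` route, the payload fields, and the use of plain logging as the sink are assumptions for illustration; the actual server in `app/` may name and structure these differently:

```python
import logging

from fastapi import FastAPI
from pydantic import BaseModel

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("feedback")

app = FastAPI()


class Feedback(BaseModel):
    """Hypothetical payload shape for feedback sent by the frontend."""

    run_id: str               # identifies the conversation being rated
    score: int                # e.g. 1 = thumbs up, 0 = thumbs down
    text: str | None = None   # optional free-form comment


@app.post("/feedback")
async def collect_feedback(feedback: Feedback) -> dict:
    # A production deployment would typically forward this to a durable sink
    # (Cloud Logging, BigQuery, ...); plain logging keeps the sketch small.
    logger.info("feedback received: %s", feedback)
    return {"status": "ok"}
```

Served with uvicorn, the frontend can POST JSON to `/feedback` whenever a user rates a response.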
Once both the backend and frontend are running, click the play button in the frontend UI to establish a connection with the backend. You can now interact with the Multimodal Live Agent! Try asking a question such as "Using the tool you have, define Governance in the context of MLOps" to prompt the agent to ground its answer in the documentation it was provided with.
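To make the tool-calling path concrete, here is a minimal sketch of a vector-store-backed retrieval function that the model could invoke to ground an answer like the one above. The document chunks, the `embed_text` placeholder, and the in-memory cosine-similarity search are illustrative assumptions; the actual backend performs retrieval through google-genai tool calling against a real vector database:

```python
import numpy as np

# Hypothetical corpus: chunks of the MLOps documentation given to the agent.
DOC_CHUNKS = [
    "Governance in MLOps covers policies, auditability, and access control ...",
    "Continuous training pipelines retrain models when the data drifts ...",
]


def embed_text(text: str) -> np.ndarray:
    """Placeholder embedding function (illustrative only).

    The real backend would call an embedding model and store the vectors
    in a vector database rather than deriving them from a hash.
    """
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(768)


# Precompute chunk embeddings: an in-memory stand-in for the vector DB.
DOC_EMBEDDINGS = np.stack([embed_text(chunk) for chunk in DOC_CHUNKS])


def retrieve_docs(query: str, top_k: int = 2) -> list[str]:
    """Return the chunks most similar to the query by cosine similarity.

    This is the kind of function the Gemini model invokes via tool calling
    so its answers stay grounded in the provided documentation.
    """
    q = embed_text(query)
    sims = DOC_EMBEDDINGS @ q / (
        np.linalg.norm(DOC_EMBEDDINGS, axis=1) * np.linalg.norm(q)
    )
    top = np.argsort(sims)[::-1][:top_k]
    return [DOC_CHUNKS[i] for i in top]
```

In the pattern itself, an equivalent retrieval function is exposed to the model through tool calling, so the model decides when to call it and receives the retrieved chunks as grounding context for its reply.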
Explore these resources to learn more about the Multimodal Live API and see examples of its usage:
- Project Pastra: A comprehensive developer guide for the Gemini Multimodal Live API.
- Google Cloud Multimodal Live API demos and samples: A collection of code samples and demo applications leveraging the Multimodal Live API in Vertex AI.
- Gemini 2 Cookbook: Practical examples and tutorials for working with Gemini 2.
- Multimodal Live API Web Console: An interactive React-based web interface for testing and experimenting with the Gemini Multimodal Live API.
This pattern is under active development. Key areas planned for future enhancement include:
- Observability: Implementing comprehensive monitoring and tracing features.
- Load Testing: Integrating load testing capabilities.