Podcastify your PDFs, incorporating questions from listeners.
This generative AI pipeline converts PDFs (including text, tables, and images) into natural-sounding podcast episodes. Leveraging a multimodal RAG pipeline, it extracts and summarizes content, generates a two-host dialogue, and synthesizes audio for an immersive listening experience. It also uses RAG to answer simulated listener questions at the end of the podcast.
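The end-to-end flow described above can be sketched in four stages. The sketch below is illustrative only: every function is a stub standing in for a real component, and none of the names belong to the repository's actual API.

```python
# Illustrative pipeline flow; each stub stands in for a real component
# (Unstructured, OpenAI embeddings + Chroma, GPT-4o Mini, podcastfy).

def extract_elements(pdf_path):
    # Real step: Unstructured partitions the PDF into text/table/image chunks.
    return [f"chunk from {pdf_path}"]

def index_elements(elements):
    # Real step: OpenAI embeddings are stored in a Chroma DB collection.
    return {"chunks": elements}

def generate_dialogue(store, questions):
    # Real step: GPT-4o Mini writes a two-host script, then answers the
    # simulated listener questions via RAG over the vector store.
    qa = "\n".join(f"LISTENER: {q}" for q in questions)
    return "HOST A: Welcome!\nHOST B: Let's dive in.\n" + qa

def synthesize_audio(script):
    # Real step: podcastfy turns the finished script into an audio file.
    return "episode.mp3"

def pdf_to_podcast(pdf_path, listener_questions):
    elements = extract_elements(pdf_path)
    store = index_elements(elements)
    script = generate_dialogue(store, listener_questions)
    return synthesize_audio(script)
```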
- Multimodal Content Extraction: Processes text, tables, and images using the Unstructured module with a title-based strategy.
- Semantic Search: Employs Chroma DB and OpenAI embeddings for vectorization and retrieval of contextually relevant information.
- Dialogue Generation: Utilizes GPT-4o Mini to create summaries and generate a two-host podcast script, addressing potential listener questions.
- Text-to-Speech Synthesis: Integrates `podcastfy`, a NotebookLM-inspired TTS module, to produce realistic speech.
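A minimal sketch of what the two-host script prompt for the dialogue-generation step might look like; the host names and wording here are assumptions, not the repository's actual prompt.

```python
# Hypothetical prompt template for the dialogue-generation stage.
DIALOGUE_PROMPT = """You are writing a podcast script for two hosts, Alex and Sam.
Using only the context below, write a natural back-and-forth dialogue that
summarizes the document, then answer the listener question at the end.

Context:
{context}

Listener question:
{question}
"""

def build_prompt(context: str, question: str) -> str:
    # The filled-in prompt is what would be sent to GPT-4o Mini.
    return DIALOGUE_PROMPT.format(context=context, question=question)
```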
- LLM: GPT-4o Mini
- Prompt Framework: LangChain
- Embeddings & Vector Store: OpenAI Embeddings, Chroma DB
- Content Extraction: Unstructured module
- Text-to-Speech: `podcastfy` (NotebookLM-inspired TTS)
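Under the hood, the semantic-search step in this stack reduces to nearest-neighbour lookup over embedding vectors. The toy sketch below shows the underlying math that OpenAI embeddings and Chroma DB handle at scale; the vectors are made up for illustration.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, doc_vecs, k=2):
    # Return the indices of the k document vectors most similar to the query.
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]
```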
- Clone the Repository:

  ```bash
  git clone https://github.com/yourusername/Multimodal-RAG-PDF-to-Podcast.git
  cd Multimodal-RAG-PDF-to-Podcast
  ```
- Create a Virtual Environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```
- Install Dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Set Up Environment Variables:

  Create a `.env` file in the root directory and add your API keys:

  ```
  OPENAI_API_KEY=your_openai_api_key
  ```
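To catch a missing key before the pipeline runs, a small fail-fast check can help; this helper is a sketch, and its name is an assumption rather than part of the repository.

```python
import os

def require_key(name, env=None):
    # Return the variable's value, or raise a clear error if it is missing.
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```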
You can test the application using the sample PDFs provided in the `data/pdfs/` directory. An example of podcast audio generated with this pipeline is provided in the `data/audio/` directory.