Arabic PDF Chat 📚💬

About 📖

Arabic PDF Chat lets users query Arabic PDF documents interactively. It extracts text from an uploaded PDF, indexes it, and uses a language model to answer questions about the content. Users ask questions in Arabic and receive answers in a professional tone, which makes the tool useful for Arabic-language researchers, educators, and professionals.

Live Demo 🚀

Explore the hosted version of this project on Hugging Face Spaces:

👉

Upload an Arabic PDF and start chatting in seconds! 💬📚

Features ✨

Seamless PDF Integration: Upload Arabic PDFs under 10 MB, and start chatting.
Advanced Text Recognition: Combines PyPDF2 text extraction with OCR (pytesseract) to read Arabic text from PDFs.
Conversational Interface: Interact via a chatbot with RTL (Right-To-Left) support for natural Arabic conversation.
Multilingual Embeddings: Embeds text with sentence-transformers/paraphrase-multilingual-mpnet-base-v2 so that questions are matched to relevant passages by meaning.
Text-to-Speech: Outputs audio responses in Arabic for accessibility.
Customizable UI: Designed with Arabic aesthetics and user-friendly components.

Technologies Used 🛠️

Python Libraries:
- Gradio: User-friendly UI for interaction.
- PyPDF2: PDF text extraction.
- pytesseract: OCR for PDFs.
- LangChain: Framework for conversational AI with retrieval-based querying.
- gTTS: Arabic text-to-speech functionality.
Machine Learning Models:
- LLMs: Powered by ChatGroq using the gemma2-9b-it model.
- Embeddings: Utilizes sentence-transformers/paraphrase-multilingual-mpnet-base-v2.
Vector Store: FAISS for efficient similarity search and retrieval.
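
One plausible way to combine PyPDF2 and pytesseract is to read each page's text layer first and fall back to OCR only when that layer is empty. The sketch below illustrates that idea under stated assumptions; it is not taken from app.py. The helper names `needs_ocr` and `extract_page_texts` are hypothetical, the `pdf2image` package is not listed in this README and is assumed here for rendering pages to images, and Tesseract needs its Arabic language data (`ara`) installed.

```python
def needs_ocr(text, min_chars=20):
    """Treat a page as scanned when its text layer is (almost) empty."""
    return len("".join(text.split())) < min_chars


def extract_page_texts(pdf_path, ocr_lang="ara", dpi=300):
    """Return one string per page, using OCR only for pages without a usable text layer."""
    # Imported lazily so needs_ocr() can be used without these packages installed.
    from PyPDF2 import PdfReader
    import pytesseract
    from pdf2image import convert_from_path  # assumption: not listed in this README

    texts = []
    for number, page in enumerate(PdfReader(pdf_path).pages, start=1):
        text = page.extract_text() or ""
        if needs_ocr(text):
            # Render just this page to an image, then run Tesseract with the Arabic model.
            image = convert_from_path(
                pdf_path, dpi=dpi, first_page=number, last_page=number
            )[0]
            text = pytesseract.image_to_string(image, lang=ocr_lang)
        texts.append(text)
    return texts
```

The `min_chars` threshold is an arbitrary illustrative value; a page with only a header or page number in its text layer would still be sent to OCR.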

Getting Started 🚀

Prerequisites 📋

Ensure you have:

Python 3.9+
pip for package management
Access to API keys:
- GROQ_API_KEY
- HF_TOKEN

Installation ⚙️

Clone the Repository:

git clone https://github.com/MohammedNasserAhmed/arabic-pdf-chat.git
cd arabic-pdf-chat

Install Dependencies:
```
pip install -r requirements.txt
```
Set Up Environment Variables:
Create a .env file and add the required API keys:
```
GROQ_API_KEY=your_groq_api_key
HF_TOKEN=your_huggingface_token
```
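
A startup check can catch a missing key before the first request fails. The following is a minimal sketch of such a check, not code from app.py: the helper names `parse_env_text`, `missing_keys`, and `load_env_file` are hypothetical, and it parses the simple `KEY=value` format shown above with the standard library only. The project itself may load the .env file differently.

```python
import os

REQUIRED_KEYS = ("GROQ_API_KEY", "HF_TOKEN")


def parse_env_text(text):
    """Parse simple KEY=value lines; skip blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in env."""
    return [key for key in required if not env.get(key)]


def load_env_file(path=".env"):
    """Copy values from a .env file into os.environ without overriding existing ones."""
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as handle:
        for key, value in parse_env_text(handle.read()).items():
            os.environ.setdefault(key, value)
```

Calling `load_env_file()` followed by `missing_keys(os.environ)` at the top of the app would return an empty list when both keys are set, and the names of the missing ones otherwise.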

Run the Application 🖥️

Launch the app with:

python app.py

The Gradio interface will open in your browser.

ETL Process 🔄

The system follows a structured ETL pipeline:

Extract: Reads Arabic PDFs using OCR (pytesseract) and PyPDF2.
Transform:
- Splits text into manageable chunks with CharacterTextSplitter.
- Converts raw text into vector embeddings using sentence-transformers.
Load: Stores transformed data in a FAISS vector database for efficient retrieval.
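
The Transform and Load steps above can be illustrated with a dependency-free sketch. Everything in it is a stand-in chosen for illustration: `split_text` is a fixed-size character window rather than LangChain's CharacterTextSplitter, `embed` counts character bigrams instead of calling a sentence-transformers model, and `TinyVectorStore` does a brute-force cosine search in place of FAISS. It shows the shape of the pipeline, not the project's actual implementation.

```python
import math
from collections import Counter


def split_text(text, chunk_size=200, overlap=40):
    """Split text into fixed-size character chunks that overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def embed(text):
    """Toy embedding: character-bigram counts (stand-in for a real model)."""
    cleaned = " ".join(text.split())
    return Counter(cleaned[i:i + 2] for i in range(len(cleaned) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Brute-force in-memory index (stand-in for FAISS)."""

    def __init__(self):
        self.items = []

    def add(self, chunks):
        for chunk in chunks:
            self.items.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Return the k stored chunks most similar to the query."""
        query_vector = embed(query)
        ranked = sorted(
            self.items, key=lambda item: cosine(query_vector, item[1]), reverse=True
        )
        return [chunk for chunk, _ in ranked[:k]]
```

In a retrieval-based setup, the chunks returned by `search` would be passed to the LLM as context for answering the user's question.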

Limitations ⚠️

File Size: Limited to PDFs under 10 MB.
Language Support: Optimized only for Arabic text. Non-Arabic content is not supported.
Scanned Documents: OCR may struggle with low-quality scans.
Performance: Response times may vary depending on document size and complexity.
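
The first two limits can be checked before a document is processed. The sketch below is illustrative only and is not taken from app.py: the names `size_ok`, `check_upload`, and `arabic_ratio` are hypothetical, the 10 MB limit is interpreted as 10 × 1024 × 1024 bytes (an assumption), and the Arabic check looks only at the main Arabic Unicode block (U+0600 to U+06FF).

```python
import os

# Assumption: "10 MB" is read as binary megabytes.
MAX_PDF_BYTES = 10 * 1024 * 1024


def size_ok(num_bytes, max_bytes=MAX_PDF_BYTES):
    """True for a non-empty file that is strictly under the size limit."""
    return 0 < num_bytes < max_bytes


def check_upload(path):
    """Apply the size check to a file on disk."""
    return size_ok(os.path.getsize(path))


def arabic_ratio(text):
    """Share of alphabetic characters that fall in the main Arabic Unicode block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic / len(letters)
```

A low `arabic_ratio` on the extracted text could be used to warn the user that the document is mostly non-Arabic; the right threshold would need to be chosen empirically.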

Author 🖋️

👤 M. N. Gaber
🔗 GitHub Profile
🔗 LinkedIn

License 📄

This project is licensed under the Apache License.

Acknowledgments 🙏

Special thanks to the developers of the libraries and frameworks that made this project possible.

Contributing 🤝

Contributions are welcome! Please fork the repository and submit a pull request. For major changes, open an issue first to discuss what you would like to change.

Future Plans 🛠️

Add support for scanned and handwritten documents.
Improve OCR accuracy for complex layouts.
Enhance conversational capabilities with additional LLMs.

Enjoy exploring Arabic text in a whole new way! 🎉
