Arabic Chat with PDF is a user-friendly application that lets you interact with Arabic PDF documents. Powered by advanced language models, OCR, and vector search, it allows you to upload PDFs, ask questions, and receive accurate Arabic responses 🚀

MohammedNasserAhmed/arabic-pdf-chat

Arabic PDF Chat 📚💬

MIT License
Python 3.9+
Hugging Face Spaces

About 📖

Arabic Chat with PDF lets users query Arabic PDF documents interactively. It combines large language models with document processing libraries to extract text from a PDF, index it, and retrieve the passages relevant to each question. Users ask questions in Arabic and the system answers in a professional tone, which makes the tool useful for Arabic language researchers, educators, and professionals.


Live Demo 🚀

Explore the hosted version of this project on Hugging Face Spaces:

👉 HF Space

Arabic Chat with PDF Screenshot

Upload an Arabic PDF and start chatting in seconds! 💬📚


Features ✨

  • Seamless PDF Integration: Upload an Arabic PDF under 10 MB and start chatting.
  • Advanced Text Recognition: Extracts Arabic text from PDFs using OCR (pytesseract) together with PyPDF2.
  • Conversational Interface: Interact via a chatbot with RTL (Right-To-Left) support for natural Arabic conversation.
  • Multilingual Embeddings: Uses multilingual sentence embeddings (sentence-transformers) to match questions with the relevant passages.
  • Text-to-Speech: Outputs audio responses in Arabic for accessibility.
  • Customizable UI: Designed with Arabic aesthetics and user-friendly components.

Technologies Used 🛠️


Getting Started 🚀

Prerequisites 📋

Ensure you have:

  • Python 3.9+
  • pip for package management
  • Access to API keys:
    • GROQ_API_KEY
    • HF_TOKEN

Installation ⚙️

  1. Clone the Repository:
    git clone https://github.com/MohammedNasserAhmed/arabic-pdf-chat.git
    cd arabic-pdf-chat
  2. Install Dependencies:
    pip install -r requirements.txt
  3. Set Up Environment Variables:
    Create a .env file and add the required API keys:
    GROQ_API_KEY=your_groq_api_key
    HF_TOKEN=your_huggingface_token
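
Before launching, it can help to confirm that both keys are actually visible to the Python process. The snippet below is a minimal sketch, not code from this repository: `missing_keys` is a hypothetical helper, and it assumes the values from `.env` have already been loaded into the environment (for example by exporting them in your shell or by a loader such as python-dotenv, which this README does not confirm the project uses).

```python
import os

# The two keys listed under Prerequisites.
REQUIRED_KEYS = ("GROQ_API_KEY", "HF_TOKEN")


def missing_keys(env=None):
    """Return the required keys that are absent or blank in `env`.

    `env` defaults to the real process environment; passing a plain dict
    makes the check easy to try out without touching real settings.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

Calling `missing_keys()` with no arguments checks the current environment; an empty list means both keys are set.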

Run the Application 🖥️

Launch the app with:

python app.py

Gradio prints a local URL in the terminal. Open it in your browser if the interface does not open automatically.


ETL Process 🔄

The system follows a structured ETL pipeline:

  1. Extract: Reads Arabic PDFs using OCR (pytesseract) and PyPDF2.
  2. Transform:
    • Splits text into manageable chunks with CharacterTextSplitter.
    • Converts raw text into vector embeddings using sentence-transformers.
  3. Load: Stores transformed data in a FAISS vector database for efficient retrieval.
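
The Transform and Load steps above can be illustrated with a dependency-free sketch. This is not the project's code: a plain bag-of-words vector stands in for the sentence-transformers embeddings, a linear cosine-similarity scan stands in for FAISS, and the names `split_text`, `embed`, and `ToyIndex` are hypothetical. It only shows the shape of the pipeline: chunk the text, vectorize each chunk, store the vectors, then retrieve the closest chunks for a question.

```python
import math
from collections import Counter


def split_text(text, chunk_size=200, overlap=40):
    """Split text into character chunks of `chunk_size` that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start, step = [], 0, chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def embed(text):
    """Toy embedding: word counts (a stand-in for a real sentence embedding)."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """In-memory store searched by linear scan (a stand-in for a vector database)."""

    def __init__(self):
        self.items = []  # list of (vector, chunk) pairs

    def add(self, chunks):
        for chunk in chunks:
            self.items.append((embed(chunk), chunk))

    def search(self, query, k=2):
        """Return the k chunks most similar to the query, best first."""
        query_vec = embed(query)
        scored = [(cosine(query_vec, vec), chunk) for vec, chunk in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:k]]
```

A real embedding model captures meaning rather than exact word overlap, so this toy version only retrieves chunks that share literal words with the question.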

Limitations ⚠️

  • File Size: Limited to PDFs under 10 MB.
  • Language Support: Optimized only for Arabic text. Non-Arabic content is not supported.
  • Scanned Documents: OCR may struggle with low-quality scans.
  • Performance: Response times may vary depending on document size and complexity.
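
The first two limitations can be checked before a document is processed. The following is a minimal sketch under stated assumptions, not the application's actual validation: `is_under_size_limit` and `arabic_ratio` are hypothetical helpers, "10 MB" is assumed to mean 10 × 1024 × 1024 bytes, and "Arabic" is approximated by the basic Arabic Unicode block (U+0600 to U+06FF).

```python
import os

# Assumption: the 10 MB limit is interpreted as 10 * 1024 * 1024 bytes.
MAX_BYTES = 10 * 1024 * 1024


def is_under_size_limit(path, max_bytes=MAX_BYTES):
    """Return True if the file at `path` is strictly smaller than `max_bytes`."""
    return os.path.getsize(path) < max_bytes


def arabic_ratio(text):
    """Fraction of alphabetic characters that fall in the basic Arabic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic / len(letters)
```

A low `arabic_ratio` on the extracted text suggests the document is mostly non-Arabic, or that extraction failed. The threshold to act on is a judgment call and is not specified by this project.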

Author 🖋️

👤 M. N. Gaber
🔗 GitHub Profile
🔗 LinkedIn


License 📄

This project is licensed under the Apache License.


Acknowledgments 🙏

Special thanks to the developers of the libraries and frameworks that made this project possible.


Contributing 🤝

Contributions are welcome! Please fork the repository and submit a pull request. For major changes, open an issue first to discuss what you would like to change.


Future Plans 🛠️

  • Add support for scanned and handwritten documents.
  • Improve OCR accuracy for complex layouts.
  • Enhance conversational capabilities with additional LLM models.

Enjoy exploring Arabic text in a whole new way! 🎉
