A scalable, enterprise-grade document intelligence platform that combines Large Language Models (LLMs) with AWS infrastructure to provide semantic search through Retrieval-Augmented Generation (RAG).
- PDF-to-Knowledge Conversion Pipeline: Transform documents into queryable knowledge
- Context-Aware Q&A: Get accurate answers with LLM synthesis
- Zero-Downtime AWS Deployment: Enterprise-grade reliability
- Secure Access Control: Role-based authentication
- Vector-based Search: High-performance semantic matching
The system operates in two main stages: an ingestion pipeline that turns PDFs into a searchable index, and a query pipeline that answers questions against it (a minimal Python sketch follows each stage).
Stage 1: Document Ingestion
- PDF Upload: Users submit documents through the admin interface
- Document Chunking: Split PDFs into semantic text segments
- Vector Encoding: Convert chunks to embeddings using the Amazon Titan model
- Vector Storage: Index embeddings in FAISS with metadata pointers
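Below is a minimal sketch of the ingestion stage, assuming the boto3 Bedrock runtime client, pypdf for text extraction, and faiss-cpu; the helper names, chunking strategy, and model ID are illustrative assumptions, not the actual admin.py implementation:

import json
import boto3
import faiss
import numpy as np
from pypdf import PdfReader

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> list[float]:
    # Titan text embeddings via Bedrock (the model ID may vary by account/region)
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def ingest(pdf_path: str, chunk_size: int = 1000):
    # 1. Extract raw text from every page of the PDF
    text = "".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
    # 2. Split into fixed-size chunks (a real pipeline may use overlap or semantic splits)
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    # 3. Encode every chunk with Titan
    vectors = np.array([embed(c) for c in chunks], dtype="float32")
    # 4. Index the vectors in FAISS; here the chunk list serves as the metadata store
    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)
    return index, chunks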
Stage 2: Query Answering
- Query Input: Receive a natural language question from the user
- Query Encoding: Convert the question to a vector using the same Titan model
- Context Retrieval: Find the top-K matching chunks via FAISS similarity search
- Answer Synthesis: Augment the LLM prompt with the retrieved context to generate the final response
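A matching sketch of the query stage, reusing the index, chunks, and embed() helper from the ingestion sketch above; the Claude model ID and prompt format are likewise assumptions:

def answer(question: str, index, chunks, k: int = 4) -> str:
    # Encode the question with the same Titan model used at ingestion time
    qvec = np.array([embed(question)], dtype="float32")
    # Retrieve the top-K most similar chunks from FAISS
    _, ids = index.search(qvec, k)
    context = "\n\n".join(chunks[i] for i in ids[0])
    # Augment the LLM prompt with the retrieved context (Claude on Bedrock shown here)
    resp = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [{
                "role": "user",
                "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
            }],
        }),
    )
    return json.loads(resp["body"].read())["content"][0]["text"]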
- Frontend: Streamlit
- Backend: Python 3.11
- Vector Database: FAISS
- Cloud Platform: AWS (EC2, S3, VPC, ALB)
- Embeddings: Amazon Titan models
- LLM Integration: Support for GPT and Claude
- AWS Account with IAM permissions
- Docker installed
- Python 3.11+
# Clone the repository
git clone https://github.com/Wiran-Larbi/serverless-rag.git
cd serverless-rag
# Set up virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run services locally
streamlit run admin.py
streamlit run client.py
# Build the Docker images
docker build -t rag-admin:latest -f admin.Dockerfile .
docker build -t rag-client:latest -f client.Dockerfile .
# Run the containers
docker run -d \
--name rag-admin \
-p 8083:8083 \
--restart unless-stopped \
rag-admin:latest
docker run -d \
--name rag-client \
-p 8084:8084 \
--restart unless-stopped \
rag-client:latest
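Note: these port mappings assume each Dockerfile starts Streamlit on the matching internal port (e.g. streamlit run admin.py --server.port 8083); adjust the -p flags if your images use different ports.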
The system uses the following security measures:
- HTTPS endpoints
- IAM Role authentication for admin access
- API Key/Bearer authentication for client access
- VPC isolation for internal services
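The reference network layout below keeps the application instances in private subnets, with the ALB as the only public entry point: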
VPC:
  PublicSubnets:
    - CIDR: 10.0.1.0/24
      AZ: us-east-1a
  PrivateSubnets:
    - CIDR: 10.0.2.0/24
      AZ: us-east-1b
    - CIDR: 10.0.3.0/24
      AZ: us-east-1c

SecurityGroups:
  ALB-SG:
    Ingress:
      - Protocol: TCP
        Ports: [80, 443]
  EC2-SG:
    Ingress:
      - Protocol: TCP
        Ports: [8083, 8084]
        Source: ALB-SG
Access the admin interface to upload and manage documents:
https://api.example.com/admin
Access the client interface to query documents:
https://api.example.com/client
The system exposes REST APIs for programmatic access.
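For example, a programmatic client query might look like the following; the /client/query path, request schema, and bearer token are hypothetical, so check the deployed API for the actual contract:

import requests

# Hypothetical endpoint and payload; replace with your deployment's values
resp = requests.post(
    "https://api.example.com/client/query",
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},
    json={"question": "What are the termination clauses in the vendor contract?"},
    timeout=30,
)
print(resp.json())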
Q: How is FAISS storage synchronized between instances?
A: We use an S3-backed synchronization layer that:
- Maintains a primary FAISS index in us-east-1
- Replicates to read-only replicas in other regions
- Uses versioned S3 objects for consistency
Important: Ensure proper VPC peering configuration when accessing from other AWS services!
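As a minimal sketch of how a replica might pull the latest index snapshot from versioned S3 (the bucket, key, and local path are hypothetical):

import boto3
import faiss

s3 = boto3.client("s3", region_name="us-east-1")
# Hypothetical bucket/key; S3 versioning lets every replica load a consistent snapshot
s3.download_file("rag-faiss-store", "indexes/primary.faiss", "/tmp/primary.faiss")
index = faiss.read_index("/tmp/primary.faiss")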
Q: What document formats are supported?
A: Currently we support PDF documents, with plans to add DOCX, TXT, and HTML in future releases.
Q: Can I customize the embedding models?
A: Yes, the system is designed to work with custom embedding models. See the configuration guide.
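As an illustration of what such a swap might look like (the function boundary is hypothetical; the configuration guide documents the real extension point), any callable that maps text to a fixed-width float vector can stand in for Titan:

from sentence_transformers import SentenceTransformer

# Hypothetical drop-in replacement; ingestion and querying must use the same model
model = SentenceTransformer("all-MiniLM-L6-v2")

def embed(text: str) -> list[float]:
    return model.encode(text).tolist()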
The system is designed to handle:
- Up to 10,000 pages of documents
- Response times under 2 seconds for most queries
- Up to 50 concurrent queries
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Wiran Larbi - @WiranLarbi - [email protected]
Website: https://www.wiranlarbi.site
Project Link: https://github.com/Wiran-Larbi/serverless-rag