A document summarization engine using local LLM inference with Ollama. Processes large documents through intelligent chunking and context-preserving summarization, featuring a Streamlit interface with real-time tracking. Built with Python for API and standalone use.


Document Summarizer

A powerful Python application for intelligent document analysis and summarization using state-of-the-art language models. Features include smart document chunking, iterative summarization, and an intuitive web interface.

System Architecture

```mermaid
graph TB
    subgraph "Frontend Layer"
        A[Streamlit UI]
    end

    subgraph "Application Layer"
        D[Document Chunker]
        E[Summarization Engine]
    end

    subgraph "Service Layer"
        G[Ollama Service]
        H[Token Counter]
        I[Logger]
    end

    A --> D
    A --> E
    E --> G
    D --> H
    E --> H
    D --> I
    E --> I
```

Features

1. Document Processing

  • Smart document chunking with configurable parameters
  • Token-based text splitting for optimal LLM processing
  • Context preservation through sliding window approach
  • Real-time token and character statistics
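The sliding-window idea behind the last two bullets can be sketched in a few lines. This is an illustration, not the project's actual chunker: it treats tokens as a plain list, whereas the real implementation counts model tokens.

```python
def sliding_window_chunks(tokens, chunk_size=256, overlap_size=30):
    """Split a token list into chunks that overlap by overlap_size tokens."""
    step = chunk_size - overlap_size  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last window already reaches the end of the document
    return chunks
```

Each chunk repeats the last `overlap_size` tokens of its predecessor, which is what preserves context across chunk boundaries.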

2. Summarization

  • Multiple summarization strategies based on text length
  • Support for various Ollama models
  • Configurable output parameters
  • Progress tracking and error handling
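One plausible way to choose a strategy from the text length is sketched below; the threshold and strategy names are illustrative, not the project's actual values:

```python
def choose_strategy(token_count, context_limit=2048):
    """Pick a summarization strategy from the input length (illustrative)."""
    if token_count <= context_limit:
        return "single_pass"      # the whole text fits in one prompt
    return "chunk_then_merge"     # summarize chunks, then merge the partial summaries
```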

3. Web Interface

  • Intuitive Streamlit-based UI
  • Real-time processing feedback
  • Configuration management
  • Summary history tracking

Installation

Prerequisites

  • Python 3.11 or higher
  • Ollama with at least one model installed
  • UV package manager (recommended)

Setup

1. Clone the repository:

```shell
git clone https://github.com/palash-jain-cw/DocumentSummariser.git
cd DocumentSummariser
```

2. Install dependencies (UV is recommended for dependency management):

```shell
uv pip install -e .
```

Usage

1. Start the Application

```shell
streamlit run src/documentsummariser/app/Home.py
```

2. Using the API

```python
from documentsummariser.summarisation.summarizer import Summarizer
from documentsummariser.summarisation.document_chunker import DocumentChunker

# Initialize components
chunker = DocumentChunker(chunk_size=256, overlap_size=30)
summarizer = Summarizer(model_name="llama3.2:3b", word_limit=250)

# Split a document into overlapping chunks
chunks = chunker.chunk_document(long_text)

# Generate a summary
summary = summarizer.summarize_text(long_text)
```
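For texts longer than one chunk, a chunk-then-merge loop can be sketched as below. `chunk_fn` and `summarize_fn` stand in for the chunker and the model call; they are assumptions for illustration, not the project's actual signatures.

```python
def summarize_long(text, chunk_fn, summarize_fn):
    """Map-reduce style summarization: summarize chunks, then merge the partials."""
    chunks = chunk_fn(text)
    if len(chunks) == 1:
        return summarize_fn(chunks[0])  # short text: a single pass suffices
    partial_summaries = [summarize_fn(chunk) for chunk in chunks]
    return summarize_fn(" ".join(partial_summaries))  # reduce step
```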

Module Structure

```mermaid
graph LR
    A[documentsummariser] --> B[app]
    A --> C[summarisation]
    A --> D[logger]

    B --> E[Home.py]
    B --> F[Document_Chunker.py]
    B --> G[Summarizer.py]

    C --> H[summarizer.py]
    C --> I[document_chunker.py]

    D --> J[logger_setup.py]
```

Configuration

1. Environment Variables

```shell
OLLAMA_HOST=http://localhost:11434
LOG_LEVEL=INFO
```
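In Python, these variables can be read with the documented defaults as fallbacks; a minimal sketch:

```python
import os

# Fall back to the documented defaults when the variables are unset
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
```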

2. Application Settings

```python
# Default configuration
config = {
    "chunk_size": 256,
    "overlap_size": 30,
    "model_name": "llama3.2:3b",
    "word_limit": 250,
}
```

API Documentation

1. Document Chunker

```python
from typing import Dict, List

class DocumentChunker:
    """Handles document splitting with context preservation."""

    def chunk_document(self, text: str) -> List[str]:
        """Split the document into overlapping chunks."""

    def get_chunk_info(self, text: str) -> Dict:
        """Return chunking statistics for the text."""
```

2. Summarizer

```python
from typing import List

class Summarizer:
    """Manages the document summarization process."""

    def summarize_text(self, text: str) -> str:
        """Generate a summary for the text."""

    def process_records(self, texts: List[str]) -> List[str]:
        """Summarize multiple documents."""
```

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For support, please:

  1. Check the documentation
  2. Search existing issues
  3. Create a new issue if needed
