LitOrganizer is a tool for researchers, academics, and students that automatically organizes PDF literature collections. It extracts metadata from academic papers, renames files according to citation standards, sorts them into a logical directory structure, and provides full-text search across the collection.
- Smart Metadata Extraction: Automatically extracts DOIs and retrieves complete metadata from multiple academic APIs
- Citation-based Renaming: Renames PDF files using APA7 format (Author_Year) for easy identification
- Intelligent Categorization: Organizes PDFs into folders by journal, author, year, or subject
- Reference List Generation: Creates a comprehensive bibliography of all processed papers
- Full-text Search: Quickly find information across your entire PDF collection
- Context Display: View search results with surrounding text for better understanding
- Flexible Search Options: Use exact match, case sensitivity, or regular expressions
- Export Results: Save search results to Word and Excel files with highlighted matches
- Performance Metrics: Visual representation of processing speed and efficiency
- Accuracy Analysis: Detailed breakdown of metadata quality and DOI detection rates
- Publication Analytics: Distribution of papers by author, journal, year, and subject
- Error Diagnostics: Identification of problematic files with detailed error analysis
- Modern Design: Clean, intuitive interface with Windows 11 design principles
- Multi-tab Layout: Separate tabs for organization, search, and statistics
- Progress Tracking: Real-time progress indicators and detailed logging
- Customizable Options: Flexible settings to adapt to your workflow
- Python 3.8 or later
- Required Python packages (see requirements.txt)
- For OCR functionality: Tesseract OCR
- Clone or download this repository:
  git clone https://github.com/bcankara/LitOrganizer.git
  cd LitOrganizer
- Install required dependencies:
  pip install -r requirements.txt
- (Optional) For OCR functionality, install Tesseract OCR:
  - Windows: Download and install from Tesseract at UB Mannheim
  - macOS: brew install tesseract
  - Linux: sudo apt install tesseract-ocr
Run the application without arguments to start in GUI mode:
python litorganizer.py
- Select a directory containing PDFs using the "Browse" button
- Configure categorization options (by journal, author, year, subject)
- Click "Start Processing" to begin organizing your files
- Monitor progress in the log window
- Select a directory containing PDFs
- Enter a keyword to search for
- Configure search options:
  - Exact Match: Only match complete words
  - Case Sensitive: Match exact letter case
  - Use Regex: Use regular expressions for pattern matching
- Click "Start Search" to begin
- View results and save to Word/Excel if desired
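The three search options map naturally onto Python's `re` module. The sketch below is only an illustration of that mapping, not LitOrganizer's actual code: `build_pattern` and `find_with_context` are hypothetical helper names, and the 30-character context width is an assumed default.

```python
import re
from typing import Iterator, Tuple


def build_pattern(keyword: str, exact_match: bool = False,
                  case_sensitive: bool = False,
                  use_regex: bool = False) -> "re.Pattern[str]":
    """Compile a search pattern from the three options (hypothetical helper)."""
    # Treat the keyword as literal text unless regex mode is on.
    body = keyword if use_regex else re.escape(keyword)
    if exact_match:
        # Word boundaries so that "net" does not match inside "network".
        body = rf"\b(?:{body})\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(body, flags)


def find_with_context(text: str, pattern: "re.Pattern[str]",
                      width: int = 30) -> Iterator[Tuple[str, str]]:
    """Yield (match, surrounding text) pairs for every hit in `text`."""
    for m in pattern.finditer(text):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        yield m.group(0), text[start:end]
```

For example, `build_pattern("net", exact_match=True)` finds "Net" in "The Net result" but nothing in "the network", because matching is case-insensitive by default and restricted to whole words.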
- General Statistics: Overall performance metrics and accuracy analysis
- Publication Statistics: Detailed breakdown by author, journal, year, and subject
Basic usage:
python litorganizer.py -d /path/to/pdfs
Additional options:
python litorganizer.py --help
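As a rough picture of how an entry point like this can switch between GUI and command-line mode, here is a minimal `argparse` sketch. Only the `-d` flag and `--help` come from the usage shown above; the long name `--directory`, the `choose_mode` helper, and the mode-selection rule are assumptions made for illustration. Run `python litorganizer.py --help` for the real option list.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse command-line arguments (illustrative, not the project's parser)."""
    parser = argparse.ArgumentParser(
        prog="litorganizer.py",
        description="Organize a folder of academic PDFs.",
    )
    # "-d" appears in the README; "--directory" is an assumed long form.
    parser.add_argument("-d", "--directory", default=None,
                        help="directory containing the PDFs to process")
    return parser.parse_args(argv)


def choose_mode(args: argparse.Namespace) -> str:
    """Return "gui" when no directory is given, otherwise "cli"."""
    return "cli" if args.directory else "gui"
```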
API settings for DOI metadata retrieval can be configured in the API Settings tab or by editing config/api_config.json.
LitOrganizer Workflow
- Input: Start with a folder of unorganized PDF files
- Processing: LitOrganizer extracts DOIs and retrieves metadata
- Organization: Files are renamed and categorized
- Output: A well-structured directory with properly named files, plus generated references and statistics
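The processing and organization steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than the project's implementation: the DOI regular expression is a commonly used pattern for modern DOIs, `find_doi` and `target_path` are hypothetical names, and the metadata dictionary keys (`author`, `year`, `journal`) are assumed for the example.

```python
import re
from pathlib import Path
from typing import Optional

# Common pattern for modern DOIs: "10.", a 4-9 digit registrant code, "/", suffix.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)


def find_doi(text: str) -> Optional[str]:
    """Return the first DOI-like string in `text`, or None if there is none."""
    m = DOI_RE.search(text)
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    return m.group(0).rstrip(".,;)") if m else None


def target_path(root: Path, meta: dict, category: str = "year") -> Path:
    """Build root/<category value>/<Author_Year>.pdf from a metadata dict."""
    # Keep only word characters and hyphens in the author part of the name.
    author = re.sub(r"[^\w-]", "", meta["author"])
    name = f"{author}_{meta['year']}.pdf"
    # Replace characters that are invalid in folder names on common systems.
    folder = re.sub(r'[<>:"/\\|?*]', "_", str(meta[category]))
    return root / folder / name
```

With `meta = {"author": "Smith", "year": 2020, "journal": "Nature: Methods"}`, calling `target_path(Path("lib"), meta)` gives `lib/2020/Smith_2020.pdf`. Stripping a trailing `)` is a heuristic and would truncate the rare DOI that legitimately ends with one.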
LitOrganizer is built with:
- PyQt5: For the graphical user interface
- PyMuPDF & pdfplumber: For PDF text extraction
- Requests: For API communication with academic databases
- pandas & python-docx: For exporting search results
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with PyQt5 for the user interface
- Uses pdfplumber and PyMuPDF for PDF text extraction
- Integrated with multiple academic APIs for metadata retrieval
For questions, suggestions, or issues, please open an issue on GitHub or contact the maintainer.