Issue 332 | ruff refactoring and requirements.txt update #343


Merged
merged 3 commits into srbhr:veridis-quo on Apr 19, 2025

Conversation

rahulsingh237

@rahulsingh237 rahulsingh237 commented Apr 19, 2025

Pull Request Title

Related Issue

#332

Description

Type

  • Bug Fix
  • Feature Enhancement
  • Documentation Update
  • Code Refactoring
  • Other (please specify):

Proposed Changes

Screenshots / Code Snippets (if applicable)

How to Test

Checklist

  • The code compiles successfully without any errors or warnings
  • The changes have been tested and verified
  • The documentation has been updated (if applicable)
  • The changes follow the project's coding guidelines and best practices
  • The commit messages are descriptive and follow the project's guidelines
  • All tests (if applicable) pass successfully
  • This pull request has been linked to the related issue (if applicable)

Additional Information

Summary by CodeRabbit

  • New Features

    • Introduced a new API with endpoints for health checks and resume uploads (PDF/DOCX), including resume conversion and storage.
    • Added robust database schema and models for users, jobs, and resumes with relationships.
    • Implemented middleware for request tracking and custom error handling.
  • Bug Fixes

    • Improved error handling for file uploads and database connectivity in API endpoints.
  • Chores

    • Major cleanup: removed legacy scripts, Streamlit apps, and old data processing utilities.
    • Updated and modernized dependencies, focusing on FastAPI and related libraries.
    • Simplified project documentation and configuration files.


coderabbitai bot commented Apr 19, 2025

Walkthrough

This update represents a major refactor and re-architecture of the project. The previous codebase, which was primarily structured around a Streamlit-based resume matcher application with numerous utility scripts and NLP processing modules, has been almost entirely removed. In its place, a new backend API is introduced, built using FastAPI, SQLAlchemy ORM models, and modern Python packaging practices. The new backend includes endpoints for health checks and resume uploads, database models for users, jobs, and resumes, and a service layer for file conversion and storage. All legacy scripts, Streamlit apps, and related utilities have been deleted.
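For orientation, here is a minimal sketch of the app-factory wiring this walkthrough describes, assuming the router and middleware names used in the review comments below (exception handlers and static-file mounting omitted); it is an inference from the file summary, not the PR's verbatim code:

```python
# Hypothetical sketch of app/base.py's create_app() wiring, inferred from the
# file summary and review comments in this thread — not verified PR code.
from fastapi import FastAPI

from app.api.middleware import RequestIDMiddleware
from app.api.router.health import health_check
from app.api.router.v1.resume import resume_router


def create_app() -> FastAPI:
    app = FastAPI()
    app.add_middleware(RequestIDMiddleware)  # per-request ID tracking
    app.include_router(health_check)         # exposes GET /ping
    app.include_router(resume_router, prefix="/api/v1/resume")  # POST /upload
    return app
```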

Changes

| Files/Paths | Change Summary |
| --- | --- |
| .gitignore | Updated ignored files and directories; removed some paths and added new ones for Python and Cursor. |
| .pre-commit-config.yaml, build.dockerfile, docker-compose.yml | Removed pre-commit hooks and all Docker-related files/configuration. |
| requirements.txt | Overhauled dependency list: removed data science/NLP/Streamlit packages, added modern web/API stack. |
| README.md | Replaced full documentation with a single line: "Veridis Quo". |
| Demo/DemoData.py | Deleted static data source containing sample jobs and resumes. |
| run_first.py, resume_matcher/, scripts/, streamlit_app.py, streamlit_interactive.py, streamlit_second.py | Deleted all legacy scripts, data extractors, processors, parsers, utilities, and all Streamlit-based applications. |
| app/main.py, app/base.py | Added new FastAPI application entry point and app factory with middleware, exception handlers, and static file serving. |
| app/api/__init__.py, app/api/middleware.py, app/api/router/health.py, app/api/router/v1/__init__.py, app/api/router/v1/resume.py | Introduced new API modules: health check, versioned routing, resume upload endpoint, and request ID middleware. |
| app/core/__init__.py, app/core/config.py, app/core/database.py, app/core/exceptions.py | Added application core: configuration, singleton database connection, session provider, and exception handlers. |
| app/models/__init__.py, app/models/base.py, app/models/association.py, app/models/job.py, app/models/resume.py, app/models/user.py | Introduced new SQLAlchemy ORM models for users, jobs, resumes, and their associations. |
| app/services/resume_service.py | Added service for converting uploaded resume files to Markdown and storing them in the database. |
| app/utils/utils.py | Added utility for database session management. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant FastAPIApp
    participant RequestIDMiddleware
    participant ResumeService
    participant Database

    Client->>FastAPIApp: POST /api/v1/resume/upload (file)
    FastAPIApp->>RequestIDMiddleware: Process request, generate request_id
    RequestIDMiddleware->>FastAPIApp: Pass request with request_id
    FastAPIApp->>ResumeService: convert_and_store_resume(file_bytes, file_type, filename)
    ResumeService->>Database: Store resume content and metadata
    Database-->>ResumeService: Confirm storage, return resume_id
    ResumeService-->>FastAPIApp: Return resume_id
    FastAPIApp-->>Client: Respond with status and resume_id
sequenceDiagram
    participant Client
    participant FastAPIApp
    participant Database

    Client->>FastAPIApp: GET /ping
    FastAPIApp->>Database: Execute SELECT 1
    Database-->>FastAPIApp: Return result
    FastAPIApp-->>Client: Respond with {"message": "pong", "db_status": "..."}
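For reference, a hedged smoke test of the two flows diagrammed above, assuming the server runs locally on port 8000 (the uvicorn call in app/main.py) and the routes shown in the diagrams:

```python
# Hypothetical manual smoke test for the two flows diagrammed above.
# Assumes a local server at http://localhost:8000 and the `requests` package.
import requests

# Health check: expects a "pong" message plus a database status field
print(requests.get("http://localhost:8000/ping").json())

# Resume upload: multipart POST of a local PDF to the v1 route
with open("resume.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/api/v1/resume/upload",
        files={"file": ("resume.pdf", f, "application/pdf")},
    )
print(resp.json())  # expects message, request_id, and resume_id keys
```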

Poem

A new warren built, the old one swept,
Where Streamlit fields and scripts once slept.
Now FastAPI burrows deep,
With models, routes, and middleware to keep.
Old carrots gone, new seeds are sown,
The backend garden stands alone.
Veridis Quo, the journey’s begun—let’s hop along, there’s work to be done!


@rahulsingh237 rahulsingh237 changed the base branch from main to veridis-quo April 19, 2025 16:35

@coderabbitai coderabbitai bot left a comment


Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently, it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 24

🔭 Outside diff range comments (1)
.gitignore (1)

147-149: 🛠️ Refactor suggestion

Remove duplicate ignore patterns and fix path format.

The entries for __pycache__/ and __pypackages__ are already declared earlier in this file (lines 2 and 96). Duplicate patterns can clutter the ignore list and cause confusion. Please remove them, and if you do need to ignore the __pypackages__ directory, ensure you include the trailing slash.

Apply this diff to clean up:

- __pycache__/
- __pypackages__
🧹 Nitpick comments (23)
.gitignore (1)

140-142: Ensure correct directory ignore and comment consistency.

The new “Cursor's Files” section is useful, but consider refining it for consistency with other sections:

  • Align the comment style with other headings (e.g., # Cursor Files instead of # Cursor's Files).
  • If .cursorrules is a directory, append a slash (.cursorrules/) to explicitly ignore its contents.
requirements.txt (2)

26-27: Assess heavy ML runtime dependencies
onnxruntime==1.21.1 and ollama==0.4.7 will dramatically increase install size. If model inference is optional, think about moving these into an extras_require (e.g., ml) rather than the core requirements.

Also applies to: 29-30


1-47: Consider splitting core, dev, and optional dependencies
For better maintainability, you could adopt separate requirement files (e.g., requirements.txt for core, requirements-dev.txt for linting/tests) and define optional features (ml, html) via extras_require in setup.py, as sketched below. This keeps installs lean and purpose-driven.
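As a hedged sketch of what that split could look like (the package name and extras groups are illustrative assumptions, not taken from this PR):

```python
# setup.py — illustrative sketch of core vs. optional dependency groups.
# The package name and the "ml"/"dev" extras are assumptions for this example.
from setuptools import find_packages, setup

setup(
    name="resume-matcher-backend",
    packages=find_packages(),
    install_requires=[
        # lean core: just the web/API stack
        "fastapi",
        "sqlalchemy",
        "uvicorn",
    ],
    extras_require={
        "ml": ["onnxruntime", "ollama"],  # heavy inference deps, opt-in
        "dev": ["ruff", "pytest"],        # linting and test tooling
    },
)
```

Consumers would then opt in with `pip install .[ml]`, keeping the default install lean.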

app/models/base.py (1)

1-5: Add context and naming conventions for the declarative base.

Right now, Base is a bare subclass of DeclarativeBase without any docstring or metadata. Consider adding a module-level docstring to explain its purpose, and, if you need consistent naming conventions for tables, indexes, and constraints, configure metadata on Base:

from sqlalchemy import MetaData
from sqlalchemy.orm import DeclarativeBase


class Base(DeclarativeBase):
    """
    Declarative base for all ORM models.
    """

    metadata = MetaData(
        naming_convention={
            "ix": "ix_%(column_0_label)s",
            "uq": "uq_%(table_name)s_%(column_0_name)s",
            "ck": "ck_%(table_name)s_%(constraint_name)s",
            "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
            "pk": "pk_%(table_name)s",
        }
    )
app/api/middleware.py (2)

6-17: Good implementation of request ID middleware, consider adding the ID to response headers

This middleware correctly generates and stores unique request IDs, but consider also returning the request ID in response headers to help with client-side debugging.

 class RequestIDMiddleware(BaseHTTPMiddleware):
     async def dispatch(self, request: Request, call_next):
         path_parts = request.url.path.strip("/").split("/")
 
         # Safely grab the 3rd part: /api/v1/<service>
         service_tag = f"{path_parts[2]}:" if len(path_parts) > 2 else ""
 
         request_id = f"{service_tag}{uuid4()}"
         request.state.request_id = request_id
 
         response = await call_next(request)
+        response.headers["X-Request-ID"] = request_id
         return response

10-11: Add a class-level docstring to clarify URL structure assumption

The comment explains the URL path assumption, but this would be better documented in a class-level docstring.

 class RequestIDMiddleware(BaseHTTPMiddleware):
+    """
+    Middleware that adds a unique request ID to each request.
+    
+    The request ID includes a service tag extracted from the URL path
+    if the path follows the pattern /api/v1/<service>. This helps with
+    request tracing and debugging.
+    """
     async def dispatch(self, request: Request, call_next):
         path_parts = request.url.path.strip("/").split("/")
 
         # Safely grab the 3rd part: /api/v1/<service>
         service_tag = f"{path_parts[2]}:" if len(path_parts) > 2 else ""
app/models/user.py (1)

6-15: Add created_at and updated_at timestamps for better audit trails

Timestamp fields are useful for tracking when records were created or last modified, which is especially important for user data.

 from .base import Base
-from sqlalchemy import Column, String, Integer
+from sqlalchemy import Column, String, Integer, DateTime
 from sqlalchemy.orm import relationship
+from datetime import datetime
 
 
 class User(Base):
     __tablename__ = "users"
 
     id = Column(Integer, primary_key=True, index=True)
 
     email = Column(String, unique=True, index=True, nullable=False)
     name = Column(String, nullable=False)
+    created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
+    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow, nullable=False)
 
     resumes = relationship("ProcessedResume", back_populates="owner")
     jobs = relationship("Job", back_populates="owner")
app/base.py (1)

41-41: Consider moving database initialization to a more appropriate location

Database initialization happens during app creation, which could be problematic for migrations or tests where you might want to control the timing of schema creation.

Consider moving this to a separate function that can be called explicitly during application startup, rather than during app creation. This would allow more flexibility for testing and migrations.

def init_db():
    Base.metadata.create_all(bind=db_singleton.engine)

Then in your application startup code (like main.py):

app = create_app()
if settings.AUTO_INIT_DB:
    init_db()
app/api/router/health.py (2)

11-13: Improve docstring for better API documentation

The current docstring is very brief. Adding more details will improve the API documentation.

 """
-health check endpoint
+Health check endpoint that verifies the API is running and the database is reachable.
+
+Returns:
+    dict: Contains a "pong" message and the database connection status.
 """

14-19: Consider adding timeouts and health checks for other critical services

The health check only verifies database connectivity without a timeout, which could lead to slow responses if the database is having issues.

+from sqlalchemy.exc import SQLAlchemyError
+
 @health_check.get("/ping", tags=["health_check"], status_code=status.HTTP_200_OK)
 def ping(db: Session = Depends(get_db_session)):
     """
     health check endpoint
     """
+    health_status = {"message": "pong", "services": {}}
+    
     try:
-        result = db.execute(text("SELECT 1")).fetchone()
-        db_status = "reachable" if result is not None else "not reachable"
-    except Exception as e:
-        db_status = f"error: {str(e)}"
-    return {"message": "pong", "database": db_status}
+        # Set a timeout for the database query
+        result = db.execute(text("SELECT 1")).fetchone()
+        health_status["services"]["database"] = {
+            "status": "up" if result is not None else "down",
+            "details": None
+        }
+    except SQLAlchemyError as e:
+        health_status["services"]["database"] = {
+            "status": "down",
+            "details": str(e)
+        }
+    
+    # Add checks for other critical services here if needed
+    
+    return health_status
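As a brief sketch of how this endpoint could be exercised with FastAPI's TestClient (assuming the create_app factory referenced elsewhere in this review and a reachable test database):

```python
# Hypothetical TestClient check for /ping; create_app and the response shape
# are taken from the review comments in this thread, not verified code.
from fastapi.testclient import TestClient

from app.base import create_app

client = TestClient(create_app())


def test_ping_reports_database_status():
    response = client.get("/ping")
    assert response.status_code == 200
    body = response.json()
    assert body["message"] == "pong"
    # the original handler returns a "database" key; the refactor above
    # nests per-service statuses under "services"
    assert "database" in body or "services" in body
```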
app/core/__init__.py (1)

10-11: Minor PEP 8 style concern with consecutive blank lines.

There are consecutive blank lines at these locations. Per PEP 8, top-level definitions are separated by two blank lines (methods inside a class by one); the extra blank lines beyond that should be removed.

-
 def get_db_session():

-
 __all__ = [

Also applies to: 27-28

app/core/database.py (1)

25-26: Consider adding type hints and docstrings to improve code clarity.

Adding return type hints and docstrings would make the code more self-documenting and help with IDE autocompletion.

-    def get_session(self):
+    def get_session(self) -> "Session":
+        """
+        Returns a new SQLAlchemy session from the session factory.
+        
+        Returns:
+            Session: A new database session instance.
+        """
         return self.session()
app/core/exceptions.py (1)

7-28: Add docstrings to exception handlers for better code documentation.

All three exception handlers are missing docstrings. Adding them would improve code documentation and make the purpose and behavior of each handler clearer.

For example:

+async def custom_http_exception_handler(request: Request, exc: HTTPException):
+    """
+    Handle HTTPException by returning a JSON response with the exception details
+    and request ID for traceability.
+    
+    Args:
+        request: The FastAPI request object
+        exc: The HTTPException that was raised
+        
+    Returns:
+        JSONResponse: A formatted JSON response with status code and error details
+    """
app/models/resume.py (2)

8-11: Define length constraints for String columns.

String columns like resume_id should have length constraints to prevent storing excessively long strings and to align with database best practices.

-    resume_id = Column(String, primary_key=True, index=True)  # uuid field
+    resume_id = Column(String(36), primary_key=True, index=True)  # uuid field

class Resume(Base):
    __tablename__ = "resumes"

    id = Column(Integer, primary_key=True, index=True)
-    resume_id = Column(String, unique=True, nullable=False)
+    resume_id = Column(String(36), unique=True, nullable=False)
-    content_type = Column(String, nullable=False)
+    content_type = Column(String(100), nullable=False)

Also applies to: 29-33


8-36: Add docstrings to models for better code documentation.

Both the ProcessedResume and Resume models lack docstrings. Adding them would improve code documentation and make the purpose of each model and its fields clearer to developers.

class ProcessedResume(Base):
+    """
+    Represents a processed resume with structured data extracted from a raw resume.
+    
+    This model stores the parsed and structured data from a resume, including personal
+    information, experiences, skills, education, etc. It maintains relationships with
+    the user who owns the resume and the jobs to which the resume is related.
+    """
    # ... existing code ...

class Resume(Base):
+    """
+    Represents a raw resume document stored in the system.
+    
+    This model stores the original content of uploaded resumes along with metadata
+    like content type and a unique identifier linking it to a processed version.
+    """
    # ... existing code ...
app/api/router/v1/resume.py (4)

19-24: Docstring does not match implementation

The docstring states the function converts to HTML/Markdown, but the implementation only supports Markdown output (hardcoded as "md" on line 48).

Update the docstring to reflect the actual implementation:

-    Accepts a PDF or DOCX file, converts it to HTML/Markdown, and stores it in the database.
+    Accepts a PDF or DOCX file, converts it to Markdown, and stores it in the database.

27-35: Consider externalizing MIME type constants

The allowed content types are hardcoded in the route handler. Consider moving these to a constants file for reuse.

Extract the MIME types to a constants file:

-    allowed_content_types = [
-        "application/pdf",
-        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
-    ]
+    allowed_content_types = ALLOWED_RESUME_MIME_TYPES

And create a constants file like app/core/constants.py:

ALLOWED_RESUME_MIME_TYPES = [
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
]

43-49: Create and use service with context manager or dependency injection

The service is instantiated directly in the handler, which is not ideal for testing or dependency management.

Consider implementing a context manager pattern for the service or using dependency injection:

-        resume_id = ResumeService(db).convert_and_store_resume(
+        service = ResumeService(db)
+        resume_id = service.convert_and_store_resume(

Or better, create a dependency:

def get_resume_service(db: Session = Depends(get_db_session)):
    return ResumeService(db)

@resume_router.post(...)
async def upload_resume(
    request: Request,
    file: UploadFile = File(...),
    service: ResumeService = Depends(get_resume_service),
):
    # Use service directly

53-57: Add standardized response model

The endpoint returns an ad-hoc dictionary. Consider defining a response model for better type safety and API documentation.

Define a Pydantic response model:

from pydantic import BaseModel

class ResumeUploadResponse(BaseModel):
    message: str
    request_id: str
    resume_id: str

@resume_router.post("/upload", response_model=ResumeUploadResponse)

Then return a properly typed response:

-    return {
-        "message": f"File {file.filename} successfully processed as MD and stored in the DB",
-        "request_id": request_id,
-        "resume_id": resume_id,
-    }
+    return ResumeUploadResponse(
+        message=f"File {file.filename} successfully processed as MD and stored in the DB",
+        request_id=request_id,
+        resume_id=resume_id
+    )
app/models/job.py (3)

11-12: Consider using UUID type instead of String for job_id

Using String for UUID fields is functional but not ideal for type safety and query optimization.

Consider using SQLAlchemy's UUID type:

-from sqlalchemy import Column, String, Text, Date, Integer, ForeignKey
+from sqlalchemy import Column, String, Text, Date, Integer, ForeignKey
+from sqlalchemy.dialects.postgresql import UUID
+import uuid

...

-    job_id = Column(String, primary_key=True, index=True)  # uuid field
+    job_id = Column(UUID(as_uuid=True), primary_key=True, index=True, default=uuid.uuid4)

If you're not using PostgreSQL, you can use a String with a default UUID generator:

-    job_id = Column(String, primary_key=True, index=True)  # uuid field
+    job_id = Column(String(36), primary_key=True, index=True, default=lambda: str(uuid.uuid4()))

18-22: Consider adding schema validation for JSON columns

The model uses several JSON columns without schema validation, which could lead to inconsistent data.

Consider implementing Pydantic models for JSON schema validation:

from typing import List, Optional
from pydantic import BaseModel, validator

class KeyResponsibilities(BaseModel):
    items: List[str]
    
    @validator('items')
    def validate_items(cls, v):
        if not v or len(v) == 0:
            raise ValueError('At least one responsibility is required')
        return v

# Then use it for validation before saving

Or use SQLAlchemy validation:

from sqlalchemy.ext.mutable import MutableDict, MutableList
from sqlalchemy.orm import validates

class Job(Base):
    # ...
    key_responsibilities = Column(MutableList.as_mutable(JSON), nullable=True)
    
    @validates('key_responsibilities')
    def validate_key_responsibilities(self, key, value):
        # Validate here
        return value

24-26: Add ON DELETE behavior specification to foreign key

The foreign key doesn't specify ON DELETE behavior, which defaults to NO ACTION in most databases.

Consider specifying the ON DELETE behavior explicitly:

-    owner_id = Column(Integer, ForeignKey("users.id"), nullable=False)
+    owner_id = Column(Integer, ForeignKey("users.id", ondelete="CASCADE"), nullable=False)

Or if you want to prevent deletion of users with jobs:

-    owner_id = Column(Integer, ForeignKey("users.id"), nullable=False)
+    owner_id = Column(Integer, ForeignKey("users.id", ondelete="RESTRICT"), nullable=False)
app/services/resume_service.py (1)

47-56: Refactor MIME type handling

The _get_file_extension method has a complex conditional structure and doesn't handle unsupported MIME types clearly.

Use a dictionary mapping for cleaner code:

     def _get_file_extension(self, file_type: str) -> str:
         """Returns the appropriate file extension based on MIME type"""
-        if file_type == "application/pdf":
-            return ".pdf"
-        elif (
-            file_type
-            == "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
-        ):
-            return ".docx"
-        return ""
+        mime_to_extension = {
+            "application/pdf": ".pdf",
+            "application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx"
+        }
+        
+        extension = mime_to_extension.get(file_type, "")
+        if not extension:
+            raise ValueError(f"Unsupported MIME type: {file_type}")
+        return extension
🛑 Comments failed to post (24)
requirements.txt (1)

10-11: ⚠️ Potential issue

Duplicate environment variable loader dependencies
The file includes both dotenv==0.9.9 and python-dotenv==1.1.0. These libraries overlap in functionality and may conflict at runtime. Please confirm which one is required and remove the other to prevent ambiguity.
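For context, a hedged sketch of the entry point both pins compete over (python-dotenv's documented API; verify which package the codebase actually imports before dropping a pin):

```python
# The two pins overlap, per the note above; python-dotenv's canonical usage
# is the load_dotenv() entry point shown here.
import os

from dotenv import load_dotenv

load_dotenv()  # copy key=value pairs from a local .env file into os.environ
database_url = os.getenv("DATABASE_URL")  # e.g., consumed by app.core.config
```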

app/core/config.py (1)

10-11: 🛠️ Refactor suggestion

Make truly optional fields default to None.

Fields typed as Optional[str] without defaults will be required by Pydantic and cause validation errors if missing. To correct this, assign None defaults:

-    DATABASE_URL: Optional[str]
+    DATABASE_URL: Optional[str] = None
-    SESSION_SECRET_KEY: Optional[str]
+    SESSION_SECRET_KEY: Optional[str] = None
📝 Committable suggestion


    DATABASE_URL: Optional[str] = None
    SESSION_SECRET_KEY: Optional[str] = None
README.md (1)

1-1: 🛠️ Refactor suggestion

⚠️ Potential issue

Critical: README stripped of all project documentation.

The README now only contains a placeholder title (# Veridis Quo) with no usage instructions, installation steps, or contribution guidelines. Please restore or replace it with meaningful content so new developers can onboard quickly.

app/utils/utils.py (1)

5-11: ⚠️ Potential issue

Duplicate functionality with misused singleton pattern.

This get_db_session function duplicates functionality already present in app/core/__init__.py while incorrectly using the singleton pattern. Creating a new DatabaseConnectionSingleton instance on each call defeats the purpose of a singleton and could lead to multiple database connections.

Replace with this implementation that properly utilizes the existing singleton:

-def get_db_session():
-    db_singleton = DatabaseConnectionSingleton(settings.DATABASE_URL)
-    db = db_singleton.get_session()
-    try:
-        yield db
-    finally:
-        db.close()
+from app.core import get_db_session

Or, if you need to keep this function in utils.py, implement it to use a single shared instance:

+_db_singleton = None
+
 def get_db_session():
-    db_singleton = DatabaseConnectionSingleton(settings.DATABASE_URL)
+    global _db_singleton
+    if _db_singleton is None:
+        _db_singleton = DatabaseConnectionSingleton(settings.DATABASE_URL)
+    db_singleton = _db_singleton
     db = db_singleton.get_session()
     try:
         yield db
     finally:
         db.close()
📝 Committable suggestion


_db_singleton = None

def get_db_session():
    global _db_singleton
    if _db_singleton is None:
        _db_singleton = DatabaseConnectionSingleton(settings.DATABASE_URL)
    db_singleton = _db_singleton
    db = db_singleton.get_session()
    try:
        yield db
    finally:
        db.close()
app/main.py (1)

1-7: 🛠️ Refactor suggestion

Ensure reload=True is only enabled in development environments.

The code is well-structured as a FastAPI application entry point. However, the reload=True parameter in the uvicorn run command is appropriate for development but should be disabled in production for performance and stability reasons.

Consider making the reload parameter environment-dependent:

-    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, reload=True)
+    is_development = settings.ENVIRONMENT.lower() == "development"
+    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, reload=is_development)

This would require importing settings from app.core.config at the top of the file.

Committable suggestion skipped: line range outside the PR's diff.

app/models/association.py (1)

5-10: ⚠️ Potential issue

Fix foreign key reference to processed_resumes table.

There's a schema design inconsistency in the association table. According to the relationships defined in the models, this table should link jobs to processed_resumes, but the foreign key is incorrectly pointing to resumes.resume_id.

From the model definitions, ProcessedResume has resume_id as its primary key, and both Job and ProcessedResume models establish relationships through this association table. However, Resume is a different model with id as its primary key.

Fix the foreign key reference:

 job_resume_association = Table(
     "job_resume",
     Base.metadata,
     Column("job_id", String, ForeignKey("jobs.job_id"), primary_key=True),
-    Column("resume_id", String, ForeignKey("resumes.resume_id"), primary_key=True),
+    Column("resume_id", String, ForeignKey("processed_resumes.resume_id"), primary_key=True),
 )
📝 Committable suggestion


job_resume_association = Table(
    "job_resume",
    Base.metadata,
    Column("job_id", String, ForeignKey("jobs.job_id"), primary_key=True),
    Column("resume_id", String, ForeignKey("processed_resumes.resume_id"), primary_key=True),
)
app/models/user.py (1)

6-15: 🛠️ Refactor suggestion

Add field length constraints for better database efficiency

The email column is defined without a length constraint, which may result in inefficient database storage. Most databases will allocate large amounts of space for unconstrained strings.

 class User(Base):
     __tablename__ = "users"
 
     id = Column(Integer, primary_key=True, index=True)
 
-    email = Column(String, unique=True, index=True, nullable=False)
-    name = Column(String, nullable=False)
+    email = Column(String(255), unique=True, index=True, nullable=False)
+    name = Column(String(255), nullable=False)
 
     resumes = relationship("ProcessedResume", back_populates="owner")
     jobs = relationship("Job", back_populates="owner")
📝 Committable suggestion


class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True, index=True)

    email = Column(String(255), unique=True, index=True, nullable=False)
    name = Column(String(255), nullable=False)

    resumes = relationship("ProcessedResume", back_populates="owner")
    jobs = relationship("Job", back_populates="owner")
app/base.py (2)

28-34: 🛠️ Refactor suggestion

Consider more restrictive CORS settings for production environments

The current CORS middleware configuration allows all methods and headers, which might be too permissive for a production environment.

 app.add_middleware(
     CORSMiddleware,
     allow_origins=settings.ALLOWED_ORIGINS,
     allow_credentials=True,
-    allow_methods=["*"],
-    allow_headers=["*"],
+    allow_methods=["GET", "POST", "PUT", "DELETE", "OPTIONS"],
+    allow_headers=["Content-Type", "Authorization", "X-Request-ID"],
 )
📝 Committable suggestion


    app.add_middleware(
        CORSMiddleware,
        allow_origins=settings.ALLOWED_ORIGINS,
        allow_credentials=True,
        allow_methods=["GET", "POST", "PUT", "DELETE", "OPTIONS"],
        allow_headers=["Content-Type", "Authorization", "X-Request-ID"],
    )

25-27: 🛠️ Refactor suggestion

Add secure cookie setting for production environments

The SessionMiddleware is using same_site="lax", which is good for security, but there's no explicit secure flag for HTTPS.

 app.add_middleware(
-    SessionMiddleware, secret_key=settings.SESSION_SECRET_KEY, same_site="lax"
+    SessionMiddleware, 
+    secret_key=settings.SESSION_SECRET_KEY, 
+    same_site="lax",
+    https_only=settings.USE_HTTPS
 )

This requires adding a USE_HTTPS setting in your configuration that is True in production environments.

📝 Committable suggestion


    app.add_middleware(
        SessionMiddleware,
        secret_key=settings.SESSION_SECRET_KEY,
        same_site="lax",
        https_only=settings.USE_HTTPS
    )
app/api/router/health.py (1)

10-10: 🛠️ Refactor suggestion

Fix dependency injection to comply with B008 rule

The static analysis tool flagged using Depends directly in function parameters, which is not recommended.

+db_dependency = Depends(get_db_session)
+
 @health_check.get("/ping", tags=["health_check"], status_code=status.HTTP_200_OK)
-def ping(db: Session = Depends(get_db_session)):
+def ping(db: Session = db_dependency):
📝 Committable suggestion


# module-level dependency variable, so we don't call Depends() in the function signature
db_dependency = Depends(get_db_session)

@health_check.get("/ping", tags=["health_check"], status_code=status.HTTP_200_OK)
def ping(db: Session = db_dependency):
    ...  # function body unchanged
🧰 Tools
🪛 Ruff (0.8.2)

10-10: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

app/core/database.py (2)

18-24: 🛠️ Refactor suggestion

Missing error handling for database connection failures.

There's no error handling for potential database connection failures. This could lead to unhandled exceptions during application startup or runtime.

def __init__(self, db_url: str):
    if not hasattr(self, "engine"):
-        self.engine = create_engine(
-            db_url, connect_args={"check_same_thread": False}
-        )
-        self.session = sessionmaker(autoflush=False, bind=self.engine)
+        try:
+            self.engine = create_engine(
+                db_url, connect_args={"check_same_thread": False}
+            )
+            self.session = sessionmaker(autoflush=False, bind=self.engine)
+        except Exception as e:
+            # Log the error
+            raise RuntimeError(f"Failed to initialize database connection: {str(e)}") from e
📝 Committable suggestion


    def __init__(self, db_url: str):
        if not hasattr(self, "engine"):
            try:
                self.engine = create_engine(
                    db_url, connect_args={"check_same_thread": False}
                )
                self.session = sessionmaker(autoflush=False, bind=self.engine)
            except Exception as e:
                # Log the error
                raise RuntimeError(f"Failed to initialize database connection: {str(e)}") from e

18-24: 🛠️ Refactor suggestion

SQLite-specific connection arguments may limit database flexibility.

The connect_args={"check_same_thread": False} parameter is SQLite-specific. This could cause issues if migrating to a different database engine in the future.

def __init__(self, db_url: str):
    if not hasattr(self, "engine"):
+       connect_args = {}
+       if db_url.startswith('sqlite'):
+           connect_args["check_same_thread"] = False
        self.engine = create_engine(
-           db_url, connect_args={"check_same_thread": False}
+           db_url, connect_args=connect_args
        )
        self.session = sessionmaker(autoflush=False, bind=self.engine)
📝 Committable suggestion


    def __init__(self, db_url: str):
        if not hasattr(self, "engine"):
            connect_args = {}
            if db_url.startswith("sqlite"):
                connect_args["check_same_thread"] = False
            self.engine = create_engine(
                db_url, connect_args=connect_args
            )
            self.session = sessionmaker(autoflush=False, bind=self.engine)
app/core/exceptions.py (1)

23-28: 🛠️ Refactor suggestion

Add logging to the unhandled exception handler.

The unhandled exception handler masks all exceptions with a generic message without logging them. This could make debugging server-side issues difficult.

async def unhandled_exception_handler(request: Request, exc: Exception):
    request_id = getattr(request.state, "request_id", "")
+    # Log the exception with traceback for debugging
+    import logging
+    logging.exception(f"Unhandled exception occurred for request {request_id}: {str(exc)}")
    return JSONResponse(
        status_code=HTTP_500_INTERNAL_SERVER_ERROR,
        content={"detail": "Internal Server Error", "request_id": request_id},
    )
📝 Committable suggestion


async def unhandled_exception_handler(request: Request, exc: Exception):
    request_id = getattr(request.state, "request_id", "")
    # Log the exception with traceback for debugging
    import logging
    logging.exception(f"Unhandled exception occurred for request {request_id}: {str(exc)}")
    return JSONResponse(
        status_code=HTTP_500_INTERNAL_SERVER_ERROR,
        content={"detail": "Internal Server Error", "request_id": request_id},
    )
app/models/resume.py (2)

8-27: 🛠️ Refactor suggestion

Add created_at and updated_at timestamps to both models.

Both models lack timestamps for tracking when records were created or last updated. This is important for auditing, debugging, and implementing data retention policies.

+from sqlalchemy import Column, String, Integer, ForeignKey, Text, DateTime
+from sqlalchemy.sql import func

class ProcessedResume(Base):
    # ... existing code ...
+    created_at = Column(DateTime, server_default=func.now(), nullable=False)
+    updated_at = Column(DateTime, server_default=func.now(), onupdate=func.now(), nullable=False)

class Resume(Base):
    # ... existing code ...
+    created_at = Column(DateTime, server_default=func.now(), nullable=False)
+    updated_at = Column(DateTime, server_default=func.now(), onupdate=func.now(), nullable=False)

Also applies to: 29-36


8-27: 🛠️ Refactor suggestion

Add relationship between ProcessedResume and Resume models.

There's currently no relationship defined between the ProcessedResume and Resume models, even though they seem related by their resume_id fields. This makes it difficult to query the raw resume content for a processed resume.

Consider adding a relationship:

class ProcessedResume(Base):
    # ... existing code ...
    
    owner_id = Column(Integer, ForeignKey("users.id"), nullable=False)
    owner = relationship("User", back_populates="processed_resumes")
+    
+    # Relationship to the original resume
+    raw_resume = relationship("Resume", foreign_keys=[resume_id], 
+                              primaryjoin="ProcessedResume.resume_id == Resume.resume_id", 
+                              uselist=False)

    jobs = relationship(
        "Job", secondary=job_resume_association, back_populates="processed_resumes"
    )
📝 Committable suggestion


class ProcessedResume(Base):
    __tablename__ = "processed_resumes"

    resume_id = Column(String, primary_key=True, index=True)  # uuid field
    personal_data = Column(JSON, nullable=False)
    experiences = Column(JSON, nullable=True)
    projects = Column(JSON, nullable=True)
    skills = Column(JSON, nullable=True)
    research_work = Column(JSON, nullable=True)
    achievements = Column(JSON, nullable=True)
    education = Column(JSON, nullable=True)
    extracted_keywords = Column(JSON, nullable=True)

    owner_id = Column(Integer, ForeignKey("users.id"), nullable=False)
    owner = relationship("User", back_populates="processed_resumes")

    # Relationship to the original resume
    raw_resume = relationship(
        "Resume",
        foreign_keys=[resume_id],
        primaryjoin="ProcessedResume.resume_id == Resume.resume_id",
        uselist=False,
    )

    jobs = relationship(
        "Job", secondary=job_resume_association, back_populates="processed_resumes"
    )
app/api/router/v1/resume.py (2)

16-17: ⚠️ Potential issue

Fix function calls in default arguments

The static analysis tool flagged the File(...) and Depends(get_db_session) calls in parameter defaults (Ruff B008): function calls in defaults are evaluated once at definition time, so the linter recommends reading them from module-level variables instead.

Apply this diff to resolve the issue:

-async def upload_resume(
-    request: Request,
-    file: UploadFile = File(...),
-    db: Session = Depends(get_db_session),
-):
+file_upload = File(...)
+db_dependency = Depends(get_db_session)
+
+
+async def upload_resume(
+    request: Request,
+    file: UploadFile = file_upload,
+    db: Session = db_dependency,
+):

This isn't actually a functional issue in FastAPI's case, as it handles these properly behind the scenes, but it's still good practice to follow the linter's recommendation for consistency.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.8.2)

16-16: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


17-17: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


50-51: ⚠️ Potential issue

Improve exception handling

The exception is caught and re-raised without preserving the original traceback context.

Apply this fix to preserve the exception context:

-    except Exception as e:
-        raise HTTPException(status_code=500, detail=f"Error processing file: {str(e)}")
+    except Exception as e:
+        raise HTTPException(
+            status_code=500, 
+            detail=f"Error processing file: {str(e)}"
+        ) from e
📝 Committable suggestion


    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Error processing file: {str(e)}"
        ) from e
🧰 Tools
🪛 Ruff (0.8.2)

51-51: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

app/models/job.py (2)

20-20: ⚠️ Potential issue

Fix typo in column name

There is a typo in the column name "compensation_and_benfits" - missing an "e" in "benefits".

Apply this fix:

-    compensation_and_benfits = Column(JSON, nullable=True)
+    compensation_and_benefits = Column(JSON, nullable=True)
📝 Committable suggestion


    compensation_and_benefits = Column(JSON, nullable=True)

1-31: 🛠️ Refactor suggestion

Add timestamp columns for auditing

The model lacks timestamp columns for tracking creation and modification times.

Add timestamp columns:

 from sqlalchemy import Column, String, Text, Date, Integer, ForeignKey
+from sqlalchemy import DateTime
+from sqlalchemy.sql import func
 from sqlalchemy.types import JSON
 from sqlalchemy.orm import relationship
 from .association import job_resume_association

 class Job(Base):
     __tablename__ = "jobs"
     
     job_id = Column(String, primary_key=True, index=True)  # uuid field
     # ... other columns ...
+    created_at = Column(DateTime, server_default=func.now(), nullable=False)
+    updated_at = Column(DateTime, server_default=func.now(), onupdate=func.now(), nullable=False)
     
     # relationships ...
app/services/resume_service.py (5)

64-66: ⚠️ Potential issue

Store original filename in database

The filename parameter is passed to _store_resume_in_db but not used to store the original filename.

Update the Resume model to include the original filename:

         resume = Resume(
-            resume_id=resume_id, content=text_content, content_type=content_type
+            resume_id=resume_id, 
+            content=text_content, 
+            content_type=content_type,
+            original_filename=filename
         )

Make sure the Resume model has this field. If not, add it to the model:

class Resume(Base):
    __tablename__ = "resumes"
    
    id = Column(Integer, primary_key=True, index=True)
    resume_id = Column(String, unique=True, nullable=False)
    content = Column(Text, nullable=False)
    content_type = Column(String, nullable=False)
    original_filename = Column(String, nullable=True)  # Add this field

17-28: ⚠️ Potential issue

Fix incorrect return type in docstring

The docstring incorrectly states the function returns None, but it returns a resume_id string.

Fix the docstring to match the implementation:

         Returns:
-            None
+            resume_id (str): UUID string of the stored resume record.
📝 Committable suggestion


        """
        Converts resume file (PDF/DOCX) to text using MarkItDown and stores it in the database.

        Args:
            file_bytes: Raw bytes of the uploaded file
            file_type: MIME type of the file ("application/pdf" or "application/vnd.openxmlformats-officedocument.wordprocessingml.document")
            filename: Original filename
            content_type: Output format ("md" for markdown or "html")

        Returns:
            resume_id (str): UUID string of the stored resume record.
        """

58-71: 🛠️ Refactor suggestion

Improve database transaction handling

The database operation doesn't handle transaction failures or implement retries.

Add error handling and transaction management:

     def _store_resume_in_db(
         self, resume_id: str, filename: str, text_content: str, content_type: str
     ):
         """
         Stores the parsed resume content in the database.
         """
         resume = Resume(
             resume_id=resume_id, content=text_content, content_type=content_type
         )
 
-        self.db.add(resume)
-        self.db.commit()
+        try:
+            self.db.add(resume)
+            self.db.commit()
+        except Exception as db_error:
+            self.db.rollback()
+            raise ValueError(f"Database operation failed: {str(db_error)}") from db_error
 
         return resume

29-34: 🛠️ Refactor suggestion

Ensure temp file is secure and limit size

The code creates a temporary file but doesn't enforce size limits, which could lead to disk space issues.

Add file size validation before writing to disk:

+        # Check file size (e.g., limit to 10MB)
+        MAX_FILE_SIZE = 10 * 1024 * 1024  # 10MB
+        if len(file_bytes) > MAX_FILE_SIZE:
+            raise ValueError(f"File size exceeds the maximum limit of {MAX_FILE_SIZE} bytes")
+
         with tempfile.NamedTemporaryFile(
             delete=False, suffix=self._get_file_extension(file_type)
         ) as temp_file:
📝 Committable suggestion


        # Check file size (e.g., limit to 10MB)
        MAX_FILE_SIZE = 10 * 1024 * 1024  # 10MB
        if len(file_bytes) > MAX_FILE_SIZE:
            raise ValueError(f"File size exceeds the maximum limit of {MAX_FILE_SIZE} bytes")

        with tempfile.NamedTemporaryFile(
            delete=False, suffix=self._get_file_extension(file_type)
        ) as temp_file:
            temp_file.write(file_bytes)
            temp_path = temp_file.name

35-41: 🛠️ Refactor suggestion

Add more robust error handling for conversion

The conversion process doesn't handle specific exceptions from MarkItDown, which could make debugging harder.

Improve error handling to catch specific exceptions:

         try:
-            result = self.md.convert(temp_path)
-            text_content = result.text_content
+            try:
+                result = self.md.convert(temp_path)
+                text_content = result.text_content
+                if not text_content or text_content.strip() == "":
+                    raise ValueError("Conversion resulted in empty content")
+            except Exception as conversion_error:
+                raise ValueError(f"Failed to convert document: {str(conversion_error)}") from conversion_error
+            
             resume_id = str(uuid.uuid4())
             self._store_resume_in_db(resume_id, filename, text_content, content_type)

             return resume_id
📝 Committable suggestion


        try:
            try:
                result = self.md.convert(temp_path)
                text_content = result.text_content
                if not text_content or text_content.strip() == "":
                    raise ValueError("Conversion resulted in empty content")
            except Exception as conversion_error:
                raise ValueError(f"Failed to convert document: {str(conversion_error)}")
            
            resume_id = str(uuid.uuid4())
            self._store_resume_in_db(resume_id, filename, text_content, content_type)

            return resume_id

@srbhr srbhr merged commit 6816ac2 into srbhr:veridis-quo Apr 19, 2025
1 check passed