Issue 332 | ruff refactoring and requirements.txt update #343
Conversation
Walkthrough

This update represents a major refactor and re-architecture of the project. The previous codebase, which was primarily structured around a Streamlit-based resume matcher application with numerous utility scripts and NLP processing modules, has been almost entirely removed. In its place, a new backend API is introduced, built using FastAPI, SQLAlchemy ORM models, and modern Python packaging practices. The new backend includes endpoints for health checks and resume uploads, database models for users, jobs, and resumes, and a service layer for file conversion and storage. All legacy scripts, Streamlit apps, and related utilities have been deleted.
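To make the described shape of the new backend concrete, here is a minimal, self-contained sketch of a FastAPI app factory in the style the walkthrough outlines. The router names and layout are assumptions based on the review below, not the PR's actual code:

```python
# Illustrative sketch only -- the router split mirrors the files discussed in
# this review (a health check router and a v1 resume router); names are
# assumptions, not the PR's code.
from fastapi import APIRouter, FastAPI

health_check = APIRouter()


@health_check.get("/ping")
def ping():
    # The real endpoint (reviewed below) also checks database connectivity.
    return {"message": "pong"}


def create_app() -> FastAPI:
    app = FastAPI(title="Resume Matcher API")
    app.include_router(health_check)
    return app


app = create_app()
```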
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant FastAPIApp
    participant RequestIDMiddleware
    participant ResumeService
    participant Database
    Client->>FastAPIApp: POST /api/v1/resume/upload (file)
    FastAPIApp->>RequestIDMiddleware: Process request, generate request_id
    RequestIDMiddleware->>FastAPIApp: Pass request with request_id
    FastAPIApp->>ResumeService: convert_and_store_resume(file_bytes, file_type, filename)
    ResumeService->>Database: Store resume content and metadata
    Database-->>ResumeService: Confirm storage, return resume_id
    ResumeService-->>FastAPIApp: Return resume_id
    FastAPIApp-->>Client: Respond with status and resume_id
```

```mermaid
sequenceDiagram
    participant Client
    participant FastAPIApp
    participant Database
    Client->>FastAPIApp: GET /ping
    FastAPIApp->>Database: Execute SELECT 1
    Database-->>FastAPIApp: Return result
    FastAPIApp-->>Client: Respond with {"message": "pong", "db_status": "..."}
```
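For orientation, the upload flow in the first diagram can be exercised with a short client script. This is a usage sketch assuming the endpoint shape shown above (path and a multipart "file" field), not code from the PR:

```python
# Usage sketch, assuming the /api/v1/resume/upload endpoint shown in the
# diagram accepts a multipart "file" field. Requires `pip install httpx`.
import httpx

with open("resume.pdf", "rb") as f:
    response = httpx.post(
        "http://localhost:8000/api/v1/resume/upload",
        files={"file": ("resume.pdf", f, "application/pdf")},
    )

print(response.status_code)
print(response.json())  # expected keys per the diagram: status and resume_id
```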
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 24
🔭 Outside diff range comments (1)
.gitignore (1)
147-149: 🛠️ Refactor suggestion: Remove duplicate ignore patterns and fix path format.

The entries for `__pycache__/` and `__pypackages__` are already declared earlier in this file (lines 2 and 96). Duplicate patterns can clutter the ignore list and cause confusion. Please remove them, and if you do need to ignore the `__pypackages__` directory, ensure you include the trailing slash. Apply this diff to clean up:

```diff
- __pycache__/
- __pypackages__
```
🧹 Nitpick comments (23)
.gitignore (1)
140-142: Ensure correct directory ignore and comment consistency.

The new "Cursor's Files" section is useful, but consider refining it for consistency with other sections:

- Align the comment style with other headings (e.g., `# Cursor Files` instead of `# Cursor's Files`).
- If `.cursorrules` is a directory, append a slash (`.cursorrules/`) to explicitly ignore its contents.

requirements.txt (2)
26-27: Assess heavy ML runtime dependencies.

`onnxruntime==1.21.1` and `ollama==0.4.7` will dramatically increase install size. If model inference is optional, think about moving these into an `extras_require` group (e.g., `ml`) rather than the core requirements.

Also applies to: 29-30
1-47: Consider splitting core, dev, and optional dependencies.

For better maintainability, you could adopt separate requirement files (e.g., `requirements.txt` for core, `requirements-dev.txt` for linting/tests) and define optional features (`ml`, `html`) via `extras_require` in `setup.py`. This keeps installs lean and purpose-driven; a sketch of this layout follows.
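As a minimal sketch of the split, assuming a conventional `setup.py`; the `ml` and `dev` group names follow the suggestions above, and the package list is illustrative rather than the project's real dependency set:

```python
# setup.py -- illustrative only; versions mirror the review's examples, and
# the extras grouping is an assumption about how the project could be split.
from setuptools import find_packages, setup

setup(
    name="resume-matcher-backend",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[
        "fastapi",
        "sqlalchemy",
        "python-dotenv==1.1.0",
    ],
    extras_require={
        # Heavy inference stack, installed only on demand.
        "ml": ["onnxruntime==1.21.1", "ollama==0.4.7"],
        # Linting and test tooling, kept out of production installs.
        "dev": ["ruff", "pytest"],
    },
)
```

With this split, `pip install .` stays lean, while `pip install ".[ml]"` pulls in the inference stack only where it is needed.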
app/models/base.py (1)

1-5: Add context and naming conventions for the declarative base.

Right now, `Base` is a bare subclass of `DeclarativeBase` without any docstring or metadata. Consider adding a module-level docstring to explain its purpose, and, if you need consistent naming conventions for tables, indexes, and constraints, configure `metadata` on `Base`:

```python
class Base(DeclarativeBase):
    """Declarative base for all ORM models."""

    metadata = MetaData(
        naming_convention={
            "ix": "ix_%(column_0_label)s",
            "uq": "uq_%(table_name)s_%(column_0_name)s",
            "ck": "ck_%(table_name)s_%(constraint_name)s",
            "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
            "pk": "pk_%(table_name)s",
        }
    )
```

app/api/middleware.py (2)
6-17: Good implementation of request ID middleware; consider adding the ID to response headers.

This middleware correctly generates and stores unique request IDs, but consider also returning the request ID in response headers to help with client-side debugging.

```diff
 class RequestIDMiddleware(BaseHTTPMiddleware):
     async def dispatch(self, request: Request, call_next):
         path_parts = request.url.path.strip("/").split("/")
         # Safely grab the 3rd part: /api/v1/<service>
         service_tag = f"{path_parts[2]}:" if len(path_parts) > 2 else ""
         request_id = f"{service_tag}{uuid4()}"
         request.state.request_id = request_id
         response = await call_next(request)
+        response.headers["X-Request-ID"] = request_id
         return response
```
10-11: Add a class-level docstring to clarify the URL structure assumption.

The comment explains the URL path assumption, but this would be better documented in a class-level docstring.

```diff
 class RequestIDMiddleware(BaseHTTPMiddleware):
+    """
+    Middleware that adds a unique request ID to each request.
+
+    The request ID includes a service tag extracted from the URL path
+    if the path follows the pattern /api/v1/<service>. This helps with
+    request tracing and debugging.
+    """
     async def dispatch(self, request: Request, call_next):
         path_parts = request.url.path.strip("/").split("/")
         # Safely grab the 3rd part: /api/v1/<service>
         service_tag = f"{path_parts[2]}:" if len(path_parts) > 2 else ""
```

app/models/user.py (1)
6-15: Add created_at and updated_at timestamps for better audit trails.

Timestamp fields are useful for tracking when records were created or last modified, which is especially important for user data.

```diff
 from .base import Base
-from sqlalchemy import Column, String, Integer
+from sqlalchemy import Column, String, Integer, DateTime
 from sqlalchemy.orm import relationship
+from datetime import datetime

 class User(Base):
     __tablename__ = "users"

     id = Column(Integer, primary_key=True, index=True)
     email = Column(String, unique=True, index=True, nullable=False)
     name = Column(String, nullable=False)
+    created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
+    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow, nullable=False)

     resumes = relationship("ProcessedResume", back_populates="owner")
     jobs = relationship("Job", back_populates="owner")
```

app/base.py (1)
41-41: Consider moving database initialization to a more appropriate location.

Database initialization happens during app creation, which could be problematic for migrations or tests where you might want to control the timing of schema creation. Consider moving this to a separate function that can be called explicitly during application startup, rather than during app creation. This would allow more flexibility for testing and migrations.

```python
def init_db():
    Base.metadata.create_all(bind=db_singleton.engine)
```

Then in your application startup code (like main.py):

```python
app = create_app()

if settings.AUTO_INIT_DB:
    init_db()
```

app/api/router/health.py (2)
11-13: Improve the docstring for better API documentation.

The current docstring is very brief. Adding more details will improve the API documentation.

```diff
 """
-health check endpoint
+Health check endpoint that verifies the API is running and the database is reachable.
+
+Returns:
+    dict: Contains a "pong" message and the database connection status.
 """
```
14-19: Consider adding timeouts and health checks for other critical services.

The health check only verifies database connectivity without a timeout, which could lead to slow responses if the database is having issues.

```diff
+from sqlalchemy.exc import SQLAlchemyError
+
 @health_check.get("/ping", tags=["health_check"], status_code=status.HTTP_200_OK)
 def ping(db: Session = Depends(get_db_session)):
     """
     health check endpoint
     """
+    health_status = {"message": "pong", "services": {}}
+
     try:
-        result = db.execute(text("SELECT 1")).fetchone()
-        db_status = "reachable" if result is not None else "not reachable"
-    except Exception as e:
-        db_status = f"error: {str(e)}"
-    return {"message": "pong", "database": db_status}
+        # Set a timeout for the database query
+        result = db.execute(text("SELECT 1")).fetchone()
+        health_status["services"]["database"] = {
+            "status": "up" if result is not None else "down",
+            "details": None,
+        }
+    except SQLAlchemyError as e:
+        health_status["services"]["database"] = {
+            "status": "down",
+            "details": str(e),
+        }
+
+    # Add checks for other critical services here if needed
+
+    return health_status
```

app/core/__init__.py (1)
10-11: Minor PEP 8 style concern with consecutive blank lines.

There are consecutive blank lines at these locations (before `def get_db_session():` and before `__all__ = [`). According to PEP 8, you should have only one blank line between functions and class methods.

Also applies to: 27-28
app/core/database.py (1)
25-26: Consider adding type hints and docstrings to improve code clarity.

Adding return type hints and docstrings would make the code more self-documenting and help with IDE autocompletion.

```diff
-    def get_session(self):
+    def get_session(self) -> "Session":
+        """
+        Returns a new SQLAlchemy session from the session factory.
+
+        Returns:
+            Session: A new database session instance.
+        """
         return self.session()
```

app/core/exceptions.py (1)
7-28: Add docstrings to exception handlers for better code documentation.

All three exception handlers are missing docstrings. Adding them would improve code documentation and make the purpose and behavior of each handler clearer. For example:

```diff
+async def custom_http_exception_handler(request: Request, exc: HTTPException):
+    """
+    Handle HTTPException by returning a JSON response with the exception details
+    and request ID for traceability.
+
+    Args:
+        request: The FastAPI request object
+        exc: The HTTPException that was raised
+
+    Returns:
+        JSONResponse: A formatted JSON response with status code and error details
+    """
```

app/models/resume.py (2)
8-11: Define length constraints for String columns.

String columns like `resume_id` should have length constraints to prevent storing excessively long strings and to align with database best practices.

```diff
-    resume_id = Column(String, primary_key=True, index=True)  # uuid field
+    resume_id = Column(String(36), primary_key=True, index=True)  # uuid field

 class Resume(Base):
     __tablename__ = "resumes"

     id = Column(Integer, primary_key=True, index=True)
-    resume_id = Column(String, unique=True, nullable=False)
+    resume_id = Column(String(36), unique=True, nullable=False)
-    content_type = Column(String, nullable=False)
+    content_type = Column(String(100), nullable=False)
```

Also applies to: 29-33
8-36: Add docstrings to models for better code documentation.

Both the `ProcessedResume` and `Resume` models lack docstrings. Adding them would improve code documentation and make the purpose of each model and its fields clearer to developers.

```diff
 class ProcessedResume(Base):
+    """
+    Represents a processed resume with structured data extracted from a raw resume.
+
+    This model stores the parsed and structured data from a resume, including personal
+    information, experiences, skills, education, etc. It maintains relationships with
+    the user who owns the resume and the jobs to which the resume is related.
+    """
     # ... existing code ...

 class Resume(Base):
+    """
+    Represents a raw resume document stored in the system.
+
+    This model stores the original content of uploaded resumes along with metadata
+    like content type and a unique identifier linking it to a processed version.
+    """
     # ... existing code ...
```

app/api/router/v1/resume.py (4)
19-24: Docstring does not match implementation.

The docstring states the function converts to HTML/Markdown, but the implementation only supports Markdown output (hardcoded as "md" on line 48). Update the docstring to reflect the actual implementation:

```diff
-    Accepts a PDF or DOCX file, converts it to HTML/Markdown, and stores it in the database.
+    Accepts a PDF or DOCX file, converts it to Markdown, and stores it in the database.
```
27-35: Consider externalizing MIME type constants.

The allowed content types are hardcoded in the route handler. Consider moving these to a constants file for reuse. Extract the MIME types:

```diff
-    allowed_content_types = [
-        "application/pdf",
-        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
-    ]
+    allowed_content_types = ALLOWED_RESUME_MIME_TYPES
```

And create a constants file like `app/core/constants.py`:

```python
ALLOWED_RESUME_MIME_TYPES = [
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
]
```
43-49: Create and use the service with a context manager or dependency injection.

The service is instantiated directly in the handler, which is not ideal for testing or dependency management. Consider implementing a context manager pattern for the service or using dependency injection:

```diff
-        resume_id = ResumeService(db).convert_and_store_resume(
+        service = ResumeService(db)
+        resume_id = service.convert_and_store_resume(
```

Or better, create a dependency:

```python
def get_resume_service(db: Session = Depends(get_db_session)):
    return ResumeService(db)

@resume_router.post(...)
async def upload_resume(
    request: Request,
    file: UploadFile = File(...),
    service: ResumeService = Depends(get_resume_service),
):
    # Use service directly
```
53-57: Add a standardized response model.

The endpoint returns an ad-hoc dictionary. Consider defining a response model for better type safety and API documentation. Define a Pydantic response model:

```python
from pydantic import BaseModel

class ResumeUploadResponse(BaseModel):
    message: str
    request_id: str
    resume_id: str

@resume_router.post("/upload", response_model=ResumeUploadResponse)
```

Then return a properly typed response:

```diff
-    return {
-        "message": f"File {file.filename} successfully processed as MD and stored in the DB",
-        "request_id": request_id,
-        "resume_id": resume_id,
-    }
+    return ResumeUploadResponse(
+        message=f"File {file.filename} successfully processed as MD and stored in the DB",
+        request_id=request_id,
+        resume_id=resume_id,
+    )
```

app/models/job.py (3)
11-12: Consider using a UUID type instead of String for job_id.

Using String for UUID fields is functional but not ideal for type safety and query optimization. Consider SQLAlchemy's UUID type:

```diff
 from sqlalchemy import Column, String, Text, Date, Integer, ForeignKey
+from sqlalchemy.dialects.postgresql import UUID
+import uuid
 ...
-    job_id = Column(String, primary_key=True, index=True)  # uuid field
+    job_id = Column(UUID(as_uuid=True), primary_key=True, index=True, default=uuid.uuid4)
```

If you're not using PostgreSQL, you can use a String with a default UUID generator:

```diff
-    job_id = Column(String, primary_key=True, index=True)  # uuid field
+    job_id = Column(String(36), primary_key=True, index=True, default=lambda: str(uuid.uuid4()))
```
18-22: Consider adding schema validation for JSON columns.

The model uses several JSON columns without schema validation, which could lead to inconsistent data. Consider implementing Pydantic models for JSON schema validation:

```python
from typing import List
from pydantic import BaseModel, validator

class KeyResponsibilities(BaseModel):
    items: List[str]

    @validator('items')
    def validate_items(cls, v):
        if not v or len(v) == 0:
            raise ValueError('At least one responsibility is required')
        return v

# Then use it for validation before saving
```

Or use SQLAlchemy validation:

```python
from sqlalchemy.ext.mutable import MutableList
from sqlalchemy.orm import validates

class Job(Base):
    # ...
    key_responsibilities = Column(MutableList.as_mutable(JSON), nullable=True)

    @validates('key_responsibilities')
    def validate_key_responsibilities(self, key, value):
        # Validate here
        return value
```
24-26: Add an ON DELETE behavior specification to the foreign key.

The foreign key doesn't specify ON DELETE behavior, which defaults to NO ACTION in most databases. Consider specifying the ON DELETE behavior explicitly:

```diff
-    owner_id = Column(Integer, ForeignKey("users.id"), nullable=False)
+    owner_id = Column(Integer, ForeignKey("users.id", ondelete="CASCADE"), nullable=False)
```

Or, if you want to prevent deletion of users with jobs:

```diff
-    owner_id = Column(Integer, ForeignKey("users.id"), nullable=False)
+    owner_id = Column(Integer, ForeignKey("users.id", ondelete="RESTRICT"), nullable=False)
```

app/services/resume_service.py (1)
47-56: Refactor MIME type handling.

The `_get_file_extension` method has a complex conditional structure and doesn't handle unsupported MIME types clearly. Use a dictionary mapping for cleaner code:

```diff
 def _get_file_extension(self, file_type: str) -> str:
     """Returns the appropriate file extension based on MIME type"""
-    if file_type == "application/pdf":
-        return ".pdf"
-    elif (
-        file_type
-        == "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
-    ):
-        return ".docx"
-    return ""
+    mime_to_extension = {
+        "application/pdf": ".pdf",
+        "application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx",
+    }
+
+    extension = mime_to_extension.get(file_type, "")
+    if not extension:
+        raise ValueError(f"Unsupported MIME type: {file_type}")
+    return extension
```
🛑 Comments failed to post (24)
requirements.txt (1)
10-11: ⚠️ Potential issue: Duplicate environment variable loader dependencies.

The file includes both `dotenv==0.9.9` and `python-dotenv==1.1.0`. These libraries overlap in functionality and may conflict at runtime. Please confirm which one is required and remove the other to prevent ambiguity.
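As a quick way to decide which to keep: the `dotenv` import name used in typical application code is provided by the `python-dotenv` package, so if the codebase contains imports like the following, `python-dotenv` is the dependency to retain. This is a sketch, not code from this PR:

```python
# This import is provided by the python-dotenv package (not the separate
# "dotenv" distribution), so code like this implies python-dotenv==1.1.0
# is the one to keep.
from dotenv import load_dotenv

load_dotenv()  # reads variables from a local .env file into os.environ
```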
app/core/config.py (1)

10-11: 🛠️ Refactor suggestion: Make truly optional fields default to `None`.

Fields typed as `Optional[str]` without defaults will be required by Pydantic and cause validation errors if missing. To correct this, assign `None` defaults:

```diff
-    DATABASE_URL: Optional[str]
+    DATABASE_URL: Optional[str] = None
-    SESSION_SECRET_KEY: Optional[str]
+    SESSION_SECRET_KEY: Optional[str] = None
```

📝 Committable suggestion:

```python
DATABASE_URL: Optional[str] = None
SESSION_SECRET_KEY: Optional[str] = None
```
README.md (1)
1-1: 🛠️ Refactor suggestion / ⚠️ Potential issue: Critical: README stripped of all project documentation.

The README now only contains a placeholder title (`# Veridis Quo`) with no usage instructions, installation steps, or contribution guidelines. Please restore or replace it with meaningful content so new developers can onboard quickly.

app/utils/utils.py (1)
5-11: ⚠️ Potential issue: Duplicate functionality with a misused singleton pattern.

This `get_db_session` function duplicates functionality already present in `app/core/__init__.py` while incorrectly using the singleton pattern. Creating a new `DatabaseConnectionSingleton` instance on each call defeats the purpose of a singleton and could lead to multiple database connections. Replace with this implementation that properly utilizes the existing singleton:

```diff
-def get_db_session():
-    db_singleton = DatabaseConnectionSingleton(settings.DATABASE_URL)
-    db = db_singleton.get_session()
-    try:
-        yield db
-    finally:
-        db.close()
+from app.core import get_db_session
```

Or, if you need to keep this function in utils.py, implement it to use a single shared instance:

```diff
+_db_singleton = None
+
 def get_db_session():
-    db_singleton = DatabaseConnectionSingleton(settings.DATABASE_URL)
+    global _db_singleton
+    if _db_singleton is None:
+        _db_singleton = DatabaseConnectionSingleton(settings.DATABASE_URL)
+    db_singleton = _db_singleton
     db = db_singleton.get_session()
     try:
         yield db
     finally:
         db.close()
```
app/main.py (1)
1-7: 🛠️ Refactor suggestion: Ensure reload=True is only enabled in development environments.

The code is well-structured as a FastAPI application entry point. However, the `reload=True` parameter in the uvicorn run command is appropriate for development but should be disabled in production for performance and stability reasons. Consider making the reload parameter environment-dependent:

```diff
-    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, reload=True)
+    is_development = settings.ENVIRONMENT.lower() == "development"
+    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, reload=is_development)
```

This would require importing settings from app.core.config at the top of the file.

Committable suggestion skipped: line range outside the PR's diff.
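Since the committable suggestion was skipped, here is a self-contained sketch of what the adjusted entry point could look like. The `ENVIRONMENT` field and the `create_app` import path follow the diff above and this review's other comments; both are assumptions about the project's config, not the PR's code:

```python
# app/main.py -- illustrative sketch; assumes app.core.config exposes a
# `settings` object with an ENVIRONMENT field, per the suggestion above.
import uvicorn

from app.base import create_app
from app.core.config import settings

app = create_app()

if __name__ == "__main__":
    # Enable auto-reload only when explicitly running in development.
    is_development = settings.ENVIRONMENT.lower() == "development"
    uvicorn.run("app.main:app", host="0.0.0.0", port=8000, reload=is_development)
```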
app/models/association.py (1)
5-10: ⚠️ Potential issue: Fix foreign key reference to the processed_resumes table.

There's a schema design inconsistency in the association table. According to the relationships defined in the models, this table should link `jobs` to `processed_resumes`, but the foreign key is incorrectly pointing to `resumes.resume_id`. From the model definitions, `ProcessedResume` has `resume_id` as its primary key, and both `Job` and `ProcessedResume` models establish relationships through this association table. However, `Resume` is a different model with `id` as its primary key. Fix the foreign key reference:

```diff
 job_resume_association = Table(
     "job_resume",
     Base.metadata,
     Column("job_id", String, ForeignKey("jobs.job_id"), primary_key=True),
-    Column("resume_id", String, ForeignKey("resumes.resume_id"), primary_key=True),
+    Column("resume_id", String, ForeignKey("processed_resumes.resume_id"), primary_key=True),
 )
```
app/models/user.py (1)
6-15: 🛠️ Refactor suggestion: Add field length constraints for better database efficiency.

The email column is defined without a length constraint, which may result in inefficient database storage. Most databases will allocate large amounts of space for unconstrained strings.

```diff
 class User(Base):
     __tablename__ = "users"

     id = Column(Integer, primary_key=True, index=True)
-    email = Column(String, unique=True, index=True, nullable=False)
-    name = Column(String, nullable=False)
+    email = Column(String(255), unique=True, index=True, nullable=False)
+    name = Column(String(255), nullable=False)

     resumes = relationship("ProcessedResume", back_populates="owner")
     jobs = relationship("Job", back_populates="owner")
```
app/base.py (2)
28-34: 🛠️ Refactor suggestion: Consider more restrictive CORS settings for production environments.

The current CORS middleware configuration allows all methods and headers, which might be too permissive for a production environment.

```diff
 app.add_middleware(
     CORSMiddleware,
     allow_origins=settings.ALLOWED_ORIGINS,
     allow_credentials=True,
-    allow_methods=["*"],
-    allow_headers=["*"],
+    allow_methods=["GET", "POST", "PUT", "DELETE", "OPTIONS"],
+    allow_headers=["Content-Type", "Authorization", "X-Request-ID"],
 )
```
25-27: 🛠️ Refactor suggestion: Add a secure cookie setting for production environments.

The SessionMiddleware is using `same_site="lax"`, which is good for security, but there's no explicit secure flag for HTTPS.

```diff
 app.add_middleware(
-    SessionMiddleware, secret_key=settings.SESSION_SECRET_KEY, same_site="lax"
+    SessionMiddleware,
+    secret_key=settings.SESSION_SECRET_KEY,
+    same_site="lax",
+    https_only=settings.USE_HTTPS,
 )
```

This requires adding a `USE_HTTPS` setting in your configuration that is True in production environments.
app/api/router/health.py (1)
10-10: 🛠️ Refactor suggestion: Fix dependency injection to comply with the B008 rule.

The static analysis tool flagged using `Depends` directly in function parameter defaults, which is not recommended.

```diff
+db_dependency = Depends(get_db_session)
+
 @health_check.get("/ping", tags=["health_check"], status_code=status.HTTP_200_OK)
-def ping(db: Session = Depends(get_db_session)):
+def ping(db: Session = db_dependency):
```
🧰 Tools
🪛 Ruff (0.8.2)

10-10: Do not perform function call `Depends` in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable (B008)
app/core/database.py (2)
18-24: 🛠️ Refactor suggestion: Missing error handling for database connection failures.

There's no error handling for potential database connection failures. This could lead to unhandled exceptions during application startup or runtime.

```diff
 def __init__(self, db_url: str):
     if not hasattr(self, "engine"):
-        self.engine = create_engine(
-            db_url, connect_args={"check_same_thread": False}
-        )
-        self.session = sessionmaker(autoflush=False, bind=self.engine)
+        try:
+            self.engine = create_engine(
+                db_url, connect_args={"check_same_thread": False}
+            )
+            self.session = sessionmaker(autoflush=False, bind=self.engine)
+        except Exception as e:
+            # Log the error
+            raise RuntimeError(f"Failed to initialize database connection: {str(e)}") from e
```
18-24: 🛠️ Refactor suggestion: SQLite-specific connection arguments may limit database flexibility.

The `connect_args={"check_same_thread": False}` parameter is SQLite-specific. This could cause issues if migrating to a different database engine in the future.

```diff
 def __init__(self, db_url: str):
     if not hasattr(self, "engine"):
+        connect_args = {}
+        if db_url.startswith("sqlite"):
+            connect_args["check_same_thread"] = False
         self.engine = create_engine(
-            db_url, connect_args={"check_same_thread": False}
+            db_url, connect_args=connect_args
         )
         self.session = sessionmaker(autoflush=False, bind=self.engine)
```
app/core/exceptions.py (1)
23-28: 🛠️ Refactor suggestion: Add logging to the unhandled exception handler.

The unhandled exception handler masks all exceptions with a generic message without logging them. This could make debugging server-side issues difficult.

```diff
 async def unhandled_exception_handler(request: Request, exc: Exception):
     request_id = getattr(request.state, "request_id", "")
+    # Log the exception with traceback for debugging
+    import logging
+    logging.exception(f"Unhandled exception occurred for request {request_id}: {str(exc)}")
     return JSONResponse(
         status_code=HTTP_500_INTERNAL_SERVER_ERROR,
         content={"detail": "Internal Server Error", "request_id": request_id},
     )
```
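As a small refinement on the suggestion above: importing `logging` inside the handler works, but a module-level logger is the more idiomatic pattern. A sketch of the same handler with that change (otherwise matching the suggestion, not the PR's code; the import paths are assumptions):

```python
import logging

from fastapi import Request
from starlette.responses import JSONResponse
from starlette.status import HTTP_500_INTERNAL_SERVER_ERROR

# Module-level logger, configured once per module rather than per request.
logger = logging.getLogger(__name__)


async def unhandled_exception_handler(request: Request, exc: Exception):
    request_id = getattr(request.state, "request_id", "")
    # logger.exception records the full traceback of the active exception.
    logger.exception("Unhandled exception for request %s", request_id)
    return JSONResponse(
        status_code=HTTP_500_INTERNAL_SERVER_ERROR,
        content={"detail": "Internal Server Error", "request_id": request_id},
    )
```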
app/models/resume.py (2)
8-27: 🛠️ Refactor suggestion: Add created_at and updated_at timestamps to both models.

Both models lack timestamps for tracking when records were created or last updated. This is important for auditing, debugging, and implementing data retention policies.

```diff
-from sqlalchemy import Column, String, Integer, ForeignKey, Text
+from sqlalchemy import Column, String, Integer, ForeignKey, Text, DateTime
+from sqlalchemy.sql import func

 class ProcessedResume(Base):
     # ... existing code ...
+    created_at = Column(DateTime, server_default=func.now(), nullable=False)
+    updated_at = Column(DateTime, server_default=func.now(), onupdate=func.now(), nullable=False)

 class Resume(Base):
     # ... existing code ...
+    created_at = Column(DateTime, server_default=func.now(), nullable=False)
+    updated_at = Column(DateTime, server_default=func.now(), onupdate=func.now(), nullable=False)
```

Also applies to: 29-36
8-27: 🛠️ Refactor suggestion: Add a relationship between the ProcessedResume and Resume models.

There's currently no relationship defined between the `ProcessedResume` and `Resume` models, even though they seem related by their `resume_id` fields. This makes it difficult to query the raw resume content for a processed resume. Consider adding a relationship:

```diff
 class ProcessedResume(Base):
     # ... existing code ...

     owner_id = Column(Integer, ForeignKey("users.id"), nullable=False)
     owner = relationship("User", back_populates="processed_resumes")
+
+    # Relationship to the original resume
+    raw_resume = relationship("Resume", foreign_keys=[resume_id],
+                              primaryjoin="ProcessedResume.resume_id == Resume.resume_id",
+                              uselist=False)

     jobs = relationship(
         "Job", secondary=job_resume_association, back_populates="processed_resumes"
     )
```

📝 Committable suggestion:

```python
class ProcessedResume(Base):
    __tablename__ = "processed_resumes"

    resume_id = Column(String, primary_key=True, index=True)  # uuid field
    personal_data = Column(JSON, nullable=False)
    experiences = Column(JSON, nullable=True)
    projects = Column(JSON, nullable=True)
    skills = Column(JSON, nullable=True)
    research_work = Column(JSON, nullable=True)
    achievements = Column(JSON, nullable=True)
    education = Column(JSON, nullable=True)
    extracted_keywords = Column(JSON, nullable=True)

    owner_id = Column(Integer, ForeignKey("users.id"), nullable=False)
    owner = relationship("User", back_populates="processed_resumes")

    # Relationship to the original resume
    raw_resume = relationship(
        "Resume",
        foreign_keys=[resume_id],
        primaryjoin="ProcessedResume.resume_id == Resume.resume_id",
        uselist=False,
    )

    jobs = relationship(
        "Job", secondary=job_resume_association, back_populates="processed_resumes"
    )
```
app/api/router/v1/resume.py (2)
16-17: ⚠️ Potential issue: Fix function calls in default arguments.

The static analysis tool correctly flagged that using `File(...)` and `Depends(get_db_session)` directly in parameter defaults can cause issues with reusing the same instances across requests. This isn't actually a functional issue in FastAPI's case, as it handles these properly behind the scenes, but it's still good practice to follow the linter's recommendation for consistency.

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.8.2)

16-16: Do not perform function call `File` in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable (B008)

17-17: Do not perform function call `Depends` in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable (B008)
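Since the committable suggestion was skipped, a minimal sketch of the module-level-singleton pattern Ruff recommends, applied to this handler's signature. The names and import path follow this review's other comments and are illustrative, not the PR's code:

```python
from fastapi import APIRouter, Depends, File, Request, UploadFile
from sqlalchemy.orm import Session

from app.core import get_db_session  # path per this review

resume_router = APIRouter()

# Evaluate the default-argument factories once at module import time,
# satisfying Ruff's B008 while keeping FastAPI behavior unchanged.
file_param = File(...)
db_dependency = Depends(get_db_session)


@resume_router.post("/upload")
async def upload_resume(
    request: Request,
    file: UploadFile = file_param,
    db: Session = db_dependency,
):
    ...
```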
50-51: ⚠️ Potential issue: Improve exception handling.

The exception is caught and re-raised without preserving the original traceback context. Apply this fix to preserve the exception context:

```diff
-    except Exception as e:
-        raise HTTPException(status_code=500, detail=f"Error processing file: {str(e)}")
+    except Exception as e:
+        raise HTTPException(
+            status_code=500,
+            detail=f"Error processing file: {str(e)}",
+        ) from e
```
🧰 Tools
🪛 Ruff (0.8.2)

51-51: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
app/models/job.py (2)
20-20: ⚠️ Potential issue: Fix typo in column name.

There is a typo in the column name "compensation_and_benfits": it is missing an "e" in "benefits". Apply this fix:

```diff
-    compensation_and_benfits = Column(JSON, nullable=True)
+    compensation_and_benefits = Column(JSON, nullable=True)
```
1-31: 🛠️ Refactor suggestion: Add timestamp columns for auditing.

The model lacks timestamp columns for tracking creation and modification times. Add timestamp columns:

```diff
 from sqlalchemy import Column, String, Text, Date, Integer, ForeignKey
+from sqlalchemy.sql import func
 from sqlalchemy.types import JSON
 from sqlalchemy.orm import relationship
 from .association import job_resume_association

 class Job(Base):
     __tablename__ = "jobs"

     job_id = Column(String, primary_key=True, index=True)  # uuid field
     # ... other columns ...
+    created_at = Column(Date, server_default=func.now(), nullable=False)
+    updated_at = Column(Date, server_default=func.now(), onupdate=func.now(), nullable=False)

     # relationships ...
```

app/services/resume_service.py (5)
64-66: ⚠️ Potential issue: Store the original filename in the database.

The `filename` parameter is passed to `_store_resume_in_db` but not used to store the original filename. Update the Resume construction to include it:

```diff
 resume = Resume(
-    resume_id=resume_id, content=text_content, content_type=content_type
+    resume_id=resume_id,
+    content=text_content,
+    content_type=content_type,
+    original_filename=filename,
 )
```

Make sure the Resume model has this field. If not, add it to the model:

```python
class Resume(Base):
    __tablename__ = "resumes"

    id = Column(Integer, primary_key=True, index=True)
    resume_id = Column(String, unique=True, nullable=False)
    content = Column(Text, nullable=False)
    content_type = Column(String, nullable=False)
    original_filename = Column(String, nullable=True)  # Add this field
```
17-28: ⚠️ Potential issue: Fix incorrect return type in docstring.

The docstring incorrectly states the function returns None, but it returns a resume_id string. Fix the docstring to match the implementation:

```diff
 Returns:
-    None
+    resume_id (str): UUID string of the stored resume record.
```

📝 Committable suggestion:

```python
"""
Converts resume file (PDF/DOCX) to text using MarkItDown and stores it in the database.

Args:
    file_bytes: Raw bytes of the uploaded file
    file_type: MIME type of the file ("application/pdf" or
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document")
    filename: Original filename
    content_type: Output format ("md" for markdown or "html")

Returns:
    resume_id (str): UUID string of the stored resume record.
"""
```
58-71: 🛠️ Refactor suggestion: Improve database transaction handling.

The database operation doesn't handle transaction failures or implement retries. Add error handling and transaction management:

```diff
 def _store_resume_in_db(
     self, resume_id: str, filename: str, text_content: str, content_type: str
 ):
     """
     Stores the parsed resume content in the database.
     """
     resume = Resume(
         resume_id=resume_id, content=text_content, content_type=content_type
     )
-    self.db.add(resume)
-    self.db.commit()
+    try:
+        self.db.add(resume)
+        self.db.commit()
+    except Exception as db_error:
+        self.db.rollback()
+        raise ValueError(f"Database operation failed: {str(db_error)}")
     return resume
```
29-34: 🛠️ Refactor suggestion: Ensure the temp file is secure and limit its size.

The code creates a temporary file but doesn't enforce size limits, which could lead to disk space issues. Add file size validation before writing to disk:

```diff
+    # Check file size (e.g., limit to 10MB)
+    MAX_FILE_SIZE = 10 * 1024 * 1024  # 10MB
+    if len(file_bytes) > MAX_FILE_SIZE:
+        raise ValueError(f"File size exceeds the maximum limit of {MAX_FILE_SIZE} bytes")
+
     with tempfile.NamedTemporaryFile(
         delete=False, suffix=self._get_file_extension(file_type)
     ) as temp_file:
         temp_file.write(file_bytes)
         temp_path = temp_file.name
```
35-41: 🛠️ Refactor suggestion: Add more robust error handling for conversion.

The conversion process doesn't handle specific exceptions from MarkItDown, which could make debugging harder. Improve error handling to catch conversion failures and empty output:

```diff
 try:
-    result = self.md.convert(temp_path)
-    text_content = result.text_content
+    try:
+        result = self.md.convert(temp_path)
+        text_content = result.text_content
+        if not text_content or text_content.strip() == "":
+            raise ValueError("Conversion resulted in empty content")
+    except Exception as conversion_error:
+        raise ValueError(f"Failed to convert document: {str(conversion_error)}")
+
     resume_id = str(uuid.uuid4())
     self._store_resume_in_db(resume_id, filename, text_content, content_type)
     return resume_id
```
Related Issue: #332