SQL-LLM Comparison Tool

A React application for comparing the SQL generation capabilities of different LLMs, with comprehensive metrics visualization.

Live Demo: https://llm-comparison-sql-metrics.onrender.com/

Model Fine-tuning

A key aspect of this project is the fine-tuning of language models for SQL generation. Both GPT-3.5 Turbo and GPT-4o Mini were fine-tuned with the Low-Rank Adaptation (LoRA) technique to improve how they translate natural language into SQL.

Fine-tuning Process

  • Dataset: Models were fine-tuned using the Gretel AI Synthetic Text-to-SQL dataset
  • Technique: Low-Rank Adaptation (LoRA), a memory-efficient approach that trains only a small set of low-rank adapter weights while the pre-trained model parameters stay frozen (a sketch of the fine-tuning workflow follows this list)
  • Training Focus: The models were specifically optimized for:
    • Improved schema understanding
    • Better translation of natural language to SQL syntax
    • Enhanced handling of complex queries
    • More efficient and optimized SQL generation
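
For reference, here is a minimal sketch of what launching such a fine-tuning job can look like with the OpenAI Node SDK. The JSONL file name and the omitted hyperparameters are illustrative assumptions, not the project's actual training setup; for hosted OpenAI fine-tunes, the parameter-efficient adaptation itself runs on OpenAI's side.

    // Sketch: starting a hosted fine-tuning job with the OpenAI Node SDK.
    // The training file (assumed to be exported from the Gretel dataset
    // into chat-format JSONL) and its name are illustrative.
    import fs from "node:fs";
    import OpenAI from "openai";

    const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

    async function fineTune(): Promise<void> {
      // Upload the JSONL training data (one {"messages": [...]} record per line).
      const file = await client.files.create({
        file: fs.createReadStream("gretel_text_to_sql.jsonl"),
        purpose: "fine-tune",
      });

      // Kick off the fine-tuning job against the base model.
      const job = await client.fineTuning.jobs.create({
        model: "gpt-3.5-turbo",
        training_file: file.id,
      });

      console.log(`Fine-tuning job started: ${job.id}`);
    }

    fineTune().catch(console.error);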

Performance Improvements

Fine-tuning resulted in significant improvements across multiple metrics:

  • Higher SQL quality scores
  • Better execution accuracy
  • Improved mathematical accuracy in aggregations
  • Enhanced query efficiency
  • Reduced response times

The comparison tool allows you to directly compare the base models with their fine-tuned versions to observe these improvements.

Features

  • Compare four models side-by-side: GPT-3.5 Turbo, GPT-3.5 Turbo (Fine-tuned), GPT-4o Mini, and GPT-4o Mini (Fine-tuned); a hypothetical configuration sketch follows this list
  • Interactive performance metrics bar chart for visual comparison
  • Detailed metrics table with highlighted best performers
  • Support for database schema input to improve context for SQL generation
  • Real-time SQL quality evaluation with multiple performance metrics
  • Modern, responsive UI with color-coded model indicators
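
The four models could be wired up with a small configuration module along these lines. This is a hypothetical sketch: the ft: identifiers are placeholders in OpenAI's fine-tuned-model naming format, not the project's real model IDs, and the field names are assumptions.

    // Sketch: one possible shape for the four compared models.
    // The ft: IDs below are placeholders, not the project's real model IDs.
    export interface ModelConfig {
      label: string; // name shown in the UI
      model: string; // OpenAI model identifier
      color: string; // color-coded indicator used in charts and tables
    }

    export const MODELS: ModelConfig[] = [
      { label: "GPT-3.5 Turbo", model: "gpt-3.5-turbo", color: "#4e79a7" },
      { label: "GPT-3.5 Turbo (Fine-tuned)", model: "ft:gpt-3.5-turbo:org::placeholder", color: "#f28e2b" },
      { label: "GPT-4o Mini", model: "gpt-4o-mini", color: "#59a14f" },
      { label: "GPT-4o Mini (Fine-tuned)", model: "ft:gpt-4o-mini:org::placeholder", color: "#e15759" },
    ];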

Performance Metrics

The application evaluates SQL generation on multiple dimensions, including the following (a sketch of one possible metrics shape follows the list):

  • SQL Quality Score: Overall quality assessment of generated SQL
  • Execution Accuracy: Whether the generated query executes successfully and returns the expected results
  • Math Accuracy: Precision in calculations and numerical operations
  • Efficiency Score: Evaluation of query optimization and performance
  • Response Time: Time taken to generate the SQL response
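
One plausible TypeScript shape for these per-model results is sketched below. The field names and the composite ranking are illustrative assumptions, not the repo's actual types.

    // Sketch: an illustrative shape for per-model evaluation results.
    // Field names are assumptions, not the repo's actual types.
    export interface SqlMetrics {
      sqlQualityScore: number;   // 0-100 overall quality assessment
      executionAccuracy: number; // 0-100: does the query run and return expected rows?
      mathAccuracy: number;      // 0-100 precision of aggregations and calculations
      efficiencyScore: number;   // 0-100 query optimization rating
      responseTimeMs: number;    // generation latency in milliseconds
    }

    // One way to rank models: average the four quality dimensions.
    // Response time is reported separately, since lower is better there.
    export function compositeScore(m: SqlMetrics): number {
      return (
        (m.sqlQualityScore +
          m.executionAccuracy +
          m.mathAccuracy +
          m.efficiencyScore) / 4
      );
    }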

Setup

Prerequisites

  • Node.js 16+
  • npm or yarn
  • OpenAI API key with access to GPT models

Installation

  1. Clone the repository:
    git clone https://github.com/hemanthrayuduu/llm-comparison-sql-metrics.git
    
  2. Install dependencies:
    npm install
    
  3. Create a .env file with your OpenAI API key:
    VITE_OPENAI_API_KEY=your_openai_api_key
    
  4. Start the development server:
    npm run dev
    

Environment Variables

Required environment variables:

VITE_OPENAI_API_KEY=your_openai_api_key
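
Because the variable is prefixed with VITE_, Vite statically replaces references to it at build time and exposes it to client code via import.meta.env:

    // Client-side access to the key (resolved at build time by Vite)
    const apiKey = import.meta.env.VITE_OPENAI_API_KEY;

Keep in mind that build-time variables like this end up embedded in the client bundle, so the key is visible to anyone who can load the app.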

Deployment

The application can be deployed using Docker:

  1. Build the Docker image:

    docker build -t sql-llm-comparison-tool .
    
  2. Run the container with your API key:

    docker run -p 8080:80 -e VITE_OPENAI_API_KEY=your_openai_api_key sql-llm-comparison-tool
    

Usage

  1. Enter a natural language query in the input field or select a sample query
  2. Optionally provide a database schema to improve context
  3. Click "Compare Models" to generate SQL from all four models
  4. View the detailed responses and the SQL generated by each model
  5. Compare performance metrics in the visualization section
  6. Identify the best performing model for your specific query needs

Project Structure

  • src/components: React components for the UI
  • src/services/api.ts: API integration with OpenAI (a simplified sketch of this call follows the list)
  • src/data/sampleQueries.ts: Pre-defined sample queries
  • src/utils: Utility functions
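
For orientation, a simplified sketch of the kind of call src/services/api.ts makes is shown below. The function name, prompt wording, and overall structure are illustrative, not the repo's actual code.

    // Sketch of the OpenAI call behind SQL generation; names and prompt
    // wording are illustrative, not the repo's actual implementation.
    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: import.meta.env.VITE_OPENAI_API_KEY,
      dangerouslyAllowBrowser: true, // required when calling from the browser
    });

    export async function generateSql(
      model: string,
      query: string,
      schema?: string,
    ): Promise<string> {
      const completion = await client.chat.completions.create({
        model,
        messages: [
          {
            role: "system",
            content:
              "You translate natural language questions into SQL." +
              (schema ? `\nDatabase schema:\n${schema}` : ""),
          },
          { role: "user", content: query },
        ],
      });
      return completion.choices[0].message.content ?? "";
    }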

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT
