A React application for comparing the SQL generation capabilities of different LLMs, with comprehensive metrics visualization.
Live Demo: https://llm-comparison-sql-metrics.onrender.com/
A key aspect of this project is fine-tuning language models for SQL generation. Both the GPT-3.5 Turbo and GPT-4o Mini models were fine-tuned with the Low-Rank Adaptation (LoRA) technique to enhance their SQL generation capabilities.
- Dataset: Models were fine-tuned using the Gretel AI Synthetic Text-to-SQL dataset
- Technique: Low-Rank Adaptation (LoRA) was employed; it is memory-efficient because it trains only a small set of additional weights while the pre-trained model parameters remain frozen (see the sketch after this list)
- Training Focus: The models were specifically optimized for:
- Improved schema understanding
- Better translation of natural language to SQL syntax
- Enhanced handling of complex queries
- More efficient and optimized SQL generation
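Since the fine-tuned variants are OpenAI-hosted models, a fine-tuning job is typically launched through the OpenAI API rather than trained locally. The sketch below is illustrative only, assuming the Gretel dataset has already been converted to chat-formatted JSONL: the file name and script are hypothetical, and the parameter-efficient (LoRA-style) training itself is handled on the provider's side.

```typescript
import fs from "fs";
import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function launchFineTune(): Promise<void> {
  // Upload the training data (hypothetical file derived from the Gretel
  // Synthetic Text-to-SQL dataset, converted to chat-formatted JSONL).
  const file = await client.files.create({
    file: fs.createReadStream("gretel_text_to_sql.jsonl"),
    purpose: "fine-tune",
  });

  // Create the fine-tuning job; the same call works for gpt-4o-mini.
  const job = await client.fineTuning.jobs.create({
    training_file: file.id,
    model: "gpt-3.5-turbo",
  });

  console.log("Fine-tuning job started:", job.id);
}

launchFineTune().catch(console.error);
```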
Fine-tuning resulted in significant improvements across multiple metrics:
- Higher SQL quality scores
- Better execution accuracy
- Improved mathematical accuracy in aggregations
- Enhanced query efficiency
- Reduced response times
The comparison tool allows you to directly compare the base models with their fine-tuned versions to observe these improvements.
- Compare four LLMs side-by-side: GPT-3.5 Turbo, GPT-3.5 Turbo (Fine-tuned), GPT-4o Mini, and GPT-4o Mini (Fine-tuned)
- Interactive performance metrics bar chart for visual comparison
- Detailed metrics table with highlighted best performers
- Support for database schema input to improve context for SQL generation (see the prompt sketch after this list)
- Real-time SQL quality evaluation with multiple performance metrics
- Modern, responsive UI with color-coded model indicators
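Under the hood, the schema-input and SQL-generation features listed above amount to one chat-completion request per model, with any user-supplied schema folded into the prompt. The following is a minimal sketch, assuming the OpenAI Node SDK is called directly from the browser; the helper name and prompt wording are illustrative, and the real integration lives in src/services/api.ts.

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: import.meta.env.VITE_OPENAI_API_KEY,
  dangerouslyAllowBrowser: true, // needed when calling the API from client-side code
});

// Hypothetical helper: generate SQL for one model, optionally grounded in a schema.
export async function generateSql(
  model: string,
  question: string,
  schema?: string
): Promise<string> {
  const completion = await client.chat.completions.create({
    model,
    temperature: 0,
    messages: [
      {
        role: "system",
        content:
          "You are an expert SQL generator. Return only a valid SQL query." +
          (schema ? `\n\nDatabase schema:\n${schema}` : ""),
      },
      { role: "user", content: question },
    ],
  });

  return completion.choices[0]?.message?.content ?? "";
}
```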
The application evaluates SQL generation along multiple dimensions (a sketch of the result shape follows this list), including:
- SQL Quality Score: Overall quality assessment of generated SQL
- Execution Accuracy: Measures how accurately the SQL query can be executed
- Math Accuracy: Precision in calculations and numerical operations
- Efficiency Score: Evaluation of query optimization and performance
- Response Time: Time taken to generate the SQL response
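A per-model evaluation result might be represented with a shape like the following; field names and scales are illustrative rather than the repository's actual types, which are defined alongside src/services/api.ts and src/utils.

```typescript
// Illustrative result shape for one model (hypothetical field names).
export interface ModelResult {
  model: string;             // e.g. "GPT-4o Mini (Fine-tuned)"
  sql: string;               // the generated SQL query
  sqlQualityScore: number;   // overall quality assessment
  executionAccuracy: number; // how accurately the query executes
  mathAccuracy: number;      // correctness of calculations and aggregations
  efficiencyScore: number;   // query optimization / performance score
  responseTimeMs: number;    // generation latency in milliseconds
}
```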
- Node.js 16+
- npm or yarn
- OpenAI API key with access to GPT models
- Clone the repository
- Install dependencies:
npm install
- Create a .env file with your OpenAI API key:
VITE_OPENAI_API_KEY=your_openai_api_key
- Start the development server:
npm run dev
Required environment variables:
VITE_OPENAI_API_KEY=your_openai_api_key
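Because the project is built with Vite, only variables prefixed with VITE_ are exposed to client code; the key is read at build time via import.meta.env:

```typescript
// Vite inlines VITE_-prefixed variables at build time and exposes them here:
const apiKey = import.meta.env.VITE_OPENAI_API_KEY;
```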
The application can be deployed using Docker:
- Build the Docker image:
docker build -t sql-llm-comparison-tool .
- Run the container with your API key:
docker run -p 8080:80 -e VITE_OPENAI_API_KEY=your_openai_api_key sql-llm-comparison-tool
- Enter a natural language query in the input field or select a sample query
- Optionally provide a database schema to improve context
- Click "Compare Models" to generate SQL from all four models
- View the detailed responses and the SQL generated by each model
- Compare performance metrics in the visualization section
- Identify the best performing model for your specific query needs
- src/components: React components for the UI
- src/services/api.ts: API integration with OpenAI
- src/data/sampleQueries.ts: Pre-defined sample queries
- src/utils: Utility functions
Contributions are welcome! Please feel free to submit a Pull Request.
MIT