Additional Experiments Classifying the Sentiment of 50k IMDB Movie Reviews

Overview

This folder contains additional experiments that compare the (decoder-style) GPT-2 (2019) model from chapter 6 to encoder-style LLMs like BERT (2018), RoBERTa (2019), and ModernBERT (2024). Instead of the small spam dataset from chapter 6, these experiments use the 50k movie review dataset from IMDb (dataset source) with a binary classification objective: predicting whether a reviewer liked the movie or not. Since the dataset is balanced, a random prediction should yield 50% accuracy.

Model                              Test accuracy
1) 124M GPT-2 Baseline             91.88%
2) 340M BERT                       90.89%
3) 66M DistilBERT                  91.40%
4) 355M RoBERTa                    92.95%
5) 149M ModernBERT Base            93.79%
6) 395M ModernBERT Large           95.07%
7) Logistic Regression Baseline    88.85%

Step 1: Install Dependencies

Install the extra dependencies via

pip install -r requirements-extra.txt

Step 2: Download Dataset

The code uses the 50k movie review dataset from IMDb (dataset source) to predict whether a movie review is positive or negative.

Run the following code to create the train.csv, validation.csv, and test.csv datasets:

python download_prepare_dataset.py
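
To sanity-check the result, you can inspect the generated CSV files, for example with pandas. The following is a quick sketch; it assumes the CSVs store the review in a "text" column and the class in a "label" column, which may differ from the actual script output:

import pandas as pd

# Each split should be roughly balanced between positive and negative reviews
for name in ("train.csv", "validation.csv", "test.csv"):
    df = pd.read_csv(name)
    print(name, df.shape, df["label"].value_counts().to_dict())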

Step 3: Run Models

1) 124M GPT-2 Baseline

The 124M GPT-2 model used in chapter 6, starting from pretrained weights and finetuning all weights:

python train_gpt.py --trainable_layers "all" --num_epochs 1
Ep 1 (Step 000000): Train loss 3.706, Val loss 3.853
Ep 1 (Step 000050): Train loss 0.682, Val loss 0.706
...
Ep 1 (Step 004300): Train loss 0.199, Val loss 0.285
Ep 1 (Step 004350): Train loss 0.188, Val loss 0.208
Training accuracy: 95.62% | Validation accuracy: 95.00%
Training completed in 9.48 minutes.

Evaluating on the full datasets ...

Training accuracy: 95.64%
Validation accuracy: 92.32%
Test accuracy: 91.88%
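
For context, the chapter 6 recipe turns GPT-2 into a classifier by swapping the vocabulary-sized output head for a 2-class linear layer and reading off the logits of the last token. A minimal PyTorch sketch of that idea (the `model.out_head` attribute name follows the chapter 6 code; treat the details as illustrative):

import torch
import torch.nn as nn

# Assumes `model` is the pretrained 124M GPT-2 model from chapter 6;
# its output head `model.out_head` originally maps emb_dim -> vocab_size.
num_classes = 2
model.out_head = nn.Linear(in_features=768, out_features=num_classes)  # 768 = emb. dim of the 124M model

def classify_review(model, token_ids):
    # token_ids has shape (batch_size, num_tokens); due to causal attention,
    # only the last token attends to the full review, so its logits are used
    with torch.no_grad():
        logits = model(token_ids)[:, -1, :]  # (batch_size, num_classes)
    return torch.argmax(logits, dim=-1)      # 0 = negative, 1 = positive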

2) 340M BERT

A 340M parameter encoder-style BERT model, starting from pretrained weights and finetuning all weights:

python train_bert_hf.py --trainable_layers "all" --num_epochs 1 --model "bert"
Ep 1 (Step 000000): Train loss 0.848, Val loss 0.775
Ep 1 (Step 000050): Train loss 0.655, Val loss 0.682
...
Ep 1 (Step 004300): Train loss 0.146, Val loss 0.318
Ep 1 (Step 004350): Train loss 0.204, Val loss 0.217
Training accuracy: 92.50% | Validation accuracy: 88.75%
Training completed in 7.65 minutes.

Evaluating on the full datasets ...

Training accuracy: 94.35%
Validation accuracy: 90.74%
Test accuracy: 90.89%
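
As the "_hf" suffix suggests, train_bert_hf.py builds on the Hugging Face transformers API. A minimal sketch of loading such a model for binary classification (the exact checkpoint the script uses is not shown here; bert-large-uncased is assumed since it matches the 340M parameter count):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-large-uncased", num_labels=2)

# The 2-class classification head is freshly initialized, hence the finetuning step
inputs = tokenizer("A gripping film with a stellar cast.", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits       # shape: (1, 2)
print(logits.argmax(dim=-1).item())       # 0 = negative, 1 = positive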

3) 66M DistilBERT

A 66M parameter encoder-style DistilBERT model (distilled down from the 110M parameter BERT base model), starting from pretrained weights and finetuning all weights:

python train_bert_hf.py --trainable_layers "all" --num_epochs 1 --model "distilbert"
Ep 1 (Step 000000): Train loss 0.693, Val loss 0.688
Ep 1 (Step 000050): Train loss 0.452, Val loss 0.460
...
Ep 1 (Step 004300): Train loss 0.179, Val loss 0.272
Ep 1 (Step 004350): Train loss 0.199, Val loss 0.182
Training accuracy: 95.62% | Validation accuracy: 91.25%
Training completed in 4.26 minutes.

Evaluating on the full datasets ...

Training accuracy: 95.30%
Validation accuracy: 91.12%
Test accuracy: 91.40%

4) 355M RoBERTa

A 355M parameter encoder-style RoBERTa model, starting from pretrained weights and only training the last transformer block plus the output layers:

python train_bert_hf.py --trainable_layers "last_block" --num_epochs 1 --model "roberta" 
Ep 1 (Step 000000): Train loss 0.695, Val loss 0.698
Ep 1 (Step 000050): Train loss 0.670, Val loss 0.690
...
Ep 1 (Step 004300): Train loss 0.126, Val loss 0.149
Ep 1 (Step 004350): Train loss 0.211, Val loss 0.138
Training accuracy: 92.50% | Validation accuracy: 94.38%
Training completed in 7.20 minutes.

Evaluating on the full datasets ...

Training accuracy: 93.44%
Validation accuracy: 93.02%
Test accuracy: 92.95%
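
The "last_block" option means only a small fraction of the 355M parameters is updated. A sketch of how such selective finetuning can be set up (module names follow Hugging Face's RobertaForSequenceClassification; whether this matches train_bert_hf.py exactly is an assumption):

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("roberta-large", num_labels=2)

# Freeze all parameters ...
for param in model.parameters():
    param.requires_grad = False

# ... then unfreeze the last transformer block and the classification head
for param in model.roberta.encoder.layer[-1].parameters():
    param.requires_grad = True
for param in model.classifier.parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")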

5) 149M ModernBERT Base

ModernBERT (2024) is an optimized reimplementation of BERT that incorporates architectural improvements like parallel residual connections and gated linear units (GLUs) to boost efficiency and performance. It maintains BERT's original pretraining objectives while achieving faster inference and better scalability on modern hardware.

python train_bert_hf.py --trainable_layers "all" --num_epochs 1 --model "modernbert-base"
Ep 1 (Step 000000): Train loss 0.699, Val loss 0.698
Ep 1 (Step 000050): Train loss 0.564, Val loss 0.606
...
Ep 1 (Step 004300): Train loss 0.086, Val loss 0.168
Ep 1 (Step 004350): Train loss 0.160, Val loss 0.131
Training accuracy: 95.62% | Validation accuracy: 93.75%
Training completed in 10.27 minutes.

Evaluating on the full datasets ...

Training accuracy: 95.72%
Validation accuracy: 94.00%
Test accuracy: 93.79%
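
ModernBERT checkpoints are available on the Hugging Face Hub (a recent transformers version is required). A short loading sketch, assuming the official answerdotai/ModernBERT-base checkpoint:

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Sanity check: the base variant has roughly 149M parameters
print(sum(p.numel() for p in model.parameters()))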

6) 395M ModernBERT Large

Same as above but using the larger ModernBERT variant:

python train_bert_hf.py --trainable_layers "all" --num_epochs 1 --model "modernbert-large"
Ep 1 (Step 000000): Train loss 0.666, Val loss 0.662
Ep 1 (Step 000050): Train loss 0.548, Val loss 0.556
...
Ep 1 (Step 004300): Train loss 0.083, Val loss 0.115
Ep 1 (Step 004350): Train loss 0.154, Val loss 0.116
Training accuracy: 96.88% | Validation accuracy: 95.62%
Training completed in 27.69 minutes.

Evaluating on the full datasets ...

Training accuracy: 97.04%
Validation accuracy: 95.30%
Test accuracy: 95.07%

7) Logistic Regression Baseline

A scikit-learn logistic regression classifier as a baseline:

python train_sklearn_logreg.py
Dummy classifier:
Training Accuracy: 50.01%
Validation Accuracy: 50.14%
Test Accuracy: 49.91%


Logistic regression classifier:
Training Accuracy: 99.80%
Validation Accuracy: 88.62%
Test Accuracy: 88.85%
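
For context, a typical version of such a baseline pairs a bag-of-words representation with scikit-learn's LogisticRegression, plus a DummyClassifier as the chance-level reference. A sketch (the exact vectorizer settings and CSV column names used by train_sklearn_logreg.py are assumptions):

import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")

# Bag-of-words features, fit on the training split only
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_df["text"])
X_test = vectorizer.transform(test_df["text"])

# Majority-class baseline: ~50% on a balanced dataset
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(X_train, train_df["label"])
print(f"Dummy test accuracy: {dummy.score(X_test, test_df['label']):.2%}")

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train_df["label"])
print(f"Logistic regression test accuracy: {clf.score(X_test, test_df['label']):.2%}")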