MSR sentence completion challenge implementation (RNNLM, LSA total similarity, word2vec total similarity) in Python

PR-Iyyer/sentence-completion

 
 

sentence-completion

These are Python 3/TensorFlow implementations of three models for the MSR Sentence Completion Challenge.

The RNN language model code borrows heavily from word-rnn-tensorflow.

Requirements

  • Python 3
  • NumPy
  • TensorFlow 1.0
  • NLTK 3.2.1
  • SciPy
  • scikit-learn
  • gensim

Data

The training and test data sets can be downloaded from the following link:
https://drive.google.com/open?id=0B5eGOMdyHn2mWDYtQzlQeGNKa2s
Google's pretrained word vectors can be downloaded here: GoogleNews-vectors-negative300.bin.gz

Please extract the training data and store it inside the ./data directory.

Overview of Models

Recurrent Neural Network Language Model (RNNLM)

  • utils.py
  • model.py
  • train.py
  • inference.py
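As a rough illustration of how a language model can answer a sentence completion question, the sketch below fills the blank with each candidate word, scores every resulting sentence by its log-probability, and keeps the highest-scoring candidate. This is an assumption about the general approach, not a description of what inference.py actually does. A tiny add-one-smoothed bigram model stands in for the RNN so that the example runs without TensorFlow; the `_____` blank marker, the toy corpus, and all function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences (toy stand-in for the RNN)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded[:-1])
        bigrams.update(zip(padded[:-1], padded[1:]))
    return unigrams, bigrams


def sentence_log_prob(tokens, unigrams, bigrams, vocab_size):
    """Sum of add-one-smoothed log P(word | previous word) over the sentence."""
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded[:-1], padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def choose_candidate(template, candidates, score):
    """Fill the blank ('_____') with each candidate and return the best-scoring word."""
    def filled(word):
        return [word if tok == "_____" else tok for tok in template]
    return max(candidates, key=lambda w: score(filled(w)))


# Hypothetical toy corpus; the real models are trained on the challenge training data.
corpus = [
    "the dog barked at the stranger".split(),
    "the dog chased the cat".split(),
    "the cat slept on the sofa".split(),
]
unigrams, bigrams = train_bigram(corpus)
vocab = {w for s in corpus for w in s} | {"</s>"}


def score(tokens):
    return sentence_log_prob(tokens, unigrams, bigrams, len(vocab))


template = "the dog _____ at the stranger".split()
best = choose_candidate(template, ["slept", "barked", "sofa"], score)
print(best)
```

Swapping `score` for a function that returns the RNN's sentence log-probability would leave the candidate selection logic unchanged.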

Total Word Similarity with Latent Semantic Analysis (LSA Total Similarity)

  • lsa_similariy.py

Total Word Similarity with Google's pretrained word vectors (Word2vec Total Similarity)

  • word2vec_similarity.py

For background on these models, I recommend reading Platt's paper, Computational Approaches to Sentence Completion.
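The total similarity idea shared by the LSA and word2vec models can be sketched as follows: each candidate word is scored by the sum of its cosine similarities to the other words in the sentence, and the candidate with the highest total is chosen. This is a minimal sketch under stated assumptions, not the code in this repository. The three-dimensional vectors below are invented for illustration; in practice they would come from an LSA decomposition or from the pretrained GoogleNews vectors. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def total_similarity(candidate, context, vectors):
    """Sum of cosine similarities between the candidate and every known context word."""
    if candidate not in vectors:
        return float("-inf")  # a candidate without a vector can never win
    return sum(cosine(vectors[candidate], vectors[w]) for w in context if w in vectors)


def answer(context, candidates, vectors):
    """Return the candidate with the highest total similarity to the context."""
    return max(candidates, key=lambda c: total_similarity(c, context, vectors))


# Invented toy vectors, for illustration only.
toy_vectors = {
    "ship":   [0.9, 0.1, 0.0],
    "sailed": [0.8, 0.2, 0.1],
    "harbor": [0.7, 0.3, 0.0],
    "piano":  [0.0, 0.9, 0.2],
    "banana": [0.1, 0.1, 0.9],
}

# Words surrounding the blank; words without a vector are skipped.
context = ["the", "ship", "sailed", "out", "of", "the"]
print(answer(context, ["harbor", "piano", "banana"], toy_vectors))
```

Skipping out-of-vocabulary context words is one design choice among several; stop-word filtering or similarity thresholds are other options this sketch leaves out.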

Basic Usage

To train the RNNLM model with default parameters, run:

python3 train.py

To generate a CSV file of predictions from the latest saved checkpoint:

python3 inference.py

Train and output predictions using the LSA Total Similarity model:

python3 lsa_simlarity.py

Train and output predictions using the Word2vec Total Similarity model:

python3 word2vec_similarity.py

Calculate the average precision of predictions:

python3 acc.py -i [path_to_prediction_file]
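Since each question has a single correct answer, the score reduces to the fraction of questions answered correctly. The sketch below shows one way that computation might look. The file layout is an assumption: the format acc.py expects is not documented here, so the `id` and `answer` column names and the in-memory CSV contents are hypothetical.

```python
import csv
import io


def read_answers(handle):
    """Map question id to chosen answer, assuming 'id' and 'answer' columns."""
    reader = csv.DictReader(handle)
    return {row["id"]: row["answer"] for row in reader}


def accuracy(predicted, gold):
    """Fraction of gold questions whose predicted answer matches."""
    correct = sum(1 for qid, ans in gold.items() if predicted.get(qid) == ans)
    return correct / len(gold)


# Hypothetical file contents, held in memory so the example is self-contained.
pred_csv = io.StringIO("id,answer\n1,a\n2,c\n3,e\n4,b\n")
gold_csv = io.StringIO("id,answer\n1,a\n2,d\n3,e\n4,b\n")

result = accuracy(read_answers(pred_csv), read_answers(gold_csv))
print(result)  # 0.75
```

A question missing from the prediction file counts as wrong here, because `predicted.get` returns `None` for it.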

Pretrained Model

Generate predictions for the test set using the pretrained RNN model:

bash ./run.sh

Performance

Method                       Test
RNNLM                        0.475
LSA Total Similarity         0.449
Word2vec Total Similarity    0.363
