Commit 92eaafd

Release V2 code
- Implement new Model API for defining model architectures.
- Refactor code, update style, and revise unit tests.
- Update documentation with references to PLOS Computational Biology (https://doi.org/10.1371/journal.pcbi.1006106).
1 parent c3dd4ce commit 92eaafd

36 files changed (+3555, -3089 lines)

.gitignore

+2
@@ -4,6 +4,8 @@ _secret/
 *out*/
 *.png
 venv/
+*TODO*
+*.pkl

 # Batch scripts / output
 *.sbatch

LICENSE

+1-1
@@ -186,7 +186,7 @@
 same "printed page" as the copyright notice for easier
 identification within third-party archives.

-Copyright {yyyy} {name of copyright owner}
+Copyright 2018 Ji-Sung Kim

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.

NOTICE

+8-8
@@ -1,19 +1,19 @@
 ###----------------------------------------------------------------###

 If this notice is included in a distribution, then the distribution
-is lawfully acknowledging that it uses, changes, or adapts code
+is lawfully acknowledging that it uses, changes, or adapts code
 from RIDDLE.

-RIDDLE is an open-source Python library for using deep learning to
+RIDDLE is an open-source Python library for using deep learning to
 impute race and ethnicity information in anonymized electronic medical
-records (EMRs). RIDDLE was developed as part of research funded by the
-DARPA Big Mechanism program and a gift from Liz and Kent Dauten.
+records (EMRs). RIDDLE was developed as part of research funded by the
+DARPA Big Mechanism program and a gift from Liz and Kent Dauten.

-RIDDLE implements the methods introduced in "RIDDLE: Race and
-ethnicity Imputation from Disease history with Deep LEarning"
-by Ji-Sung Kim and Andrey Rzhetsly.
+RIDDLE implements methods from "RIDDLE: Race and ethnicity Imputation
+from Disease history with Deep LEarning" by Ji-Sung Kim, Xin Gao,
+and Andrey Rzhetsky (https://doi.org/10.1371/journal.pcbi.1006106).

-More information about RIDDLE can be found at https://riddle.ai
+More information about RIDDLE can be found at https://riddle.ai.

 Original Authors: Ji-Sung Kim, Andrey Rzhetsky
 Date of release: July 2017

README.md

+48-27
@@ -1,28 +1,28 @@
-![RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning (RIDDLE)](https://user-images.githubusercontent.com/9053987/27894953-4aff74e6-61c4-11e7-901a-8a459026b4ee.png)
-[![Build Status](https://travis-ci.org/jisungk/RIDDLE.svg?branch=master)](https://travis-ci.org/jisungk/RIDDLE)
+![RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning (RIDDLE)](https://user-images.githubusercontent.com/9053987/27894953-4aff74e6-61c4-11e7-901a-8a459026b4ee.png)
+[![Build Status](https://travis-ci.org/jisungk/RIDDLE.svg?branch=master)](https://travis-ci.org/jisungk/RIDDLE)
 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/jisungk/riddle/blob/master/LICENSE)

-**RIDDLE** (**R**ace and ethnicity **I**mputation from **D**isease history with **D**eep **LE**arning) is an open-source Python2 library for using deep learning to impute race and ethnicity information in anonymized electronic medical records (EMRs). RIDDLE provides the ability to (1) build models for estimating race and ethnicity from clinical features, and (2) interpret trained models to describe how specific features contribute to predictions. The RIDDLE library implements the methods introduced in ["RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning"](https://arxiv.org/abs/1707.01623) (arXiv preprint, 2017).
+**RIDDLE** (**R**ace and ethnicity **I**mputation from **D**isease history with **D**eep **LE**arning) is an open-source Python2 library for using deep learning to impute race and ethnicity information in anonymized electronic medical records (EMRs). RIDDLE provides the ability to (1) build models for estimating race and ethnicity from clinical features, and (2) interpret trained models to describe how specific features contribute to predictions. The RIDDLE library implements the methods introduced in ["RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning"](https://doi.org/10.1371/journal.pcbi.1006106) (PLOS Computational Biology, 2018).

 Compared to alternative methods (e.g., scikit-learn/Python, glm/R), RIDDLE is designed to handle large and high-dimensional datasets in a performant fashion. RIDDLE trains models efficiently by running on a parallelized TensorFlow/Theano backend, and avoids memory overflow by preprocessing data in conjunction with batch-wise training.

-RIDDLE uses [Keras](https://keras.io) to specify and train the underlying deep neural networks, and [DeepLIFT](https://github.com/kundajelab/deeplift) to compute feature-to-class contribution scores. The current RIDDLE Python module works with both TensorFlow and Theano as the backend to Keras. The default architecture is a deep multi-layer perceptron (deep MLP) that takes binary-encoded features and targets. However, you can specify any neural network architecture (e.g., LSTM, CNN) and data format by writing your own `model_module` files!
+RIDDLE uses [Keras](https://keras.io) to specify and train the underlying deep neural networks, and [DeepLIFT](https://github.com/kundajelab/deeplift) to compute feature-to-class contribution scores. The current RIDDLE Python module works with both TensorFlow and Theano as the backend to Keras. The default architecture is a deep multi-layer perceptron (deep MLP) that takes binary-encoded features and targets. However, you can specify any neural network architecture (e.g., LSTM, CNN) and data format by writing your own `model_module` files!

 ### Documentation
 Please visit [riddle.ai](https://riddle.ai).

-### Dependencies
-Python Libraries:
-* Keras (`keras`)
+### Dependencies
+Python Libraries:
+* Keras (`keras`)
 * DeepLIFT (`deeplift`, install from GitHub)
-* TensorFlow (`tensorflow`) or Theano (`theano`)
-* scikit-learn (`sklearn`)
-* NumPy (`numpy`)
-* SciPy (`scipy`)
-* Matplotlib (`matplotlib`)
-* h5py (`h5py`)
-
-General:
+* TensorFlow (`tensorflow`) or Theano (`theano`)
+* scikit-learn (`sklearn`)
+* NumPy (`numpy`)
+* SciPy (`scipy`)
+* Matplotlib (`matplotlib`)
+* h5py (`h5py`)
+
+General:
 * HDF5

 ### Unit testing
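The README states that RIDDLE avoids memory overflow by preprocessing data in conjunction with batch-wise training, and that the default MLP takes binary-encoded features. The sketch below illustrates that idea in plain Python under stated assumptions: each record is a list of feature tokens, a token-to-column vocabulary already exists, and unknown tokens are dropped. `encode_batch` and `iter_batches` are hypothetical helpers, not RIDDLE's API. Only one batch of dense 0/1 rows is built at a time, and the code avoids Python 3-only syntax because the README describes RIDDLE as a Python2 library.

```python
def encode_batch(records, vocab):
    """Binary-encode a batch of records into dense 0/1 rows.

    Hypothetical helper: `records` is a sequence of feature-token lists and
    `vocab` maps each known token to a column index. Tokens missing from
    `vocab` are skipped.
    """
    rows = []
    for record in records:
        row = [0] * len(vocab)
        for token in record:
            col = vocab.get(token)
            if col is not None:
                row[col] = 1
        rows.append(row)
    return rows


def iter_batches(records, labels, vocab, batch_size):
    """Yield (X, y) batches, binary-encoding only `batch_size` records at a time.

    Hypothetical helper: the dense 0/1 matrix for the whole dataset is never
    built; each batch is encoded just before it is yielded.
    """
    if batch_size < 1:
        raise ValueError('batch_size must be positive')
    pending_x, pending_y = [], []
    for record, label in zip(records, labels):
        pending_x.append(record)
        pending_y.append(label)
        if len(pending_x) == batch_size:
            yield encode_batch(pending_x, vocab), pending_y
            pending_x, pending_y = [], []
    if pending_x:  # flush the final, possibly smaller batch
        yield encode_batch(pending_x, vocab), pending_y
```

A generator of this shape is the kind of input that generator-based training in Keras consumes; real training code would also convert rows to arrays and cycle over the data once per epoch.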
@@ -46,29 +46,50 @@ Alternatively, you can install RIDDLE and DeepLIFT from GitHub using `pip`:
 % pip install git+git://github.com/jisungk/riddle.git # RIDDLE
 ```

+#### How can I run the RIDDLE pipeline?
+
+Execute the following scripts.
+```
+% python parameter_search.py # run parameter tuning
+% python riddle.py # train and evaluate the model
+% python interpret_riddle.py # interpret the trained model
+```
+
 #### What is the default format for data files?

 Please refer to the example data file `dummy.txt` and the accompanying `README` in the [`_data` directory](https://github.com/jisungk/riddle/tree/master/_data).

 ### Authors

-[Ji-Sung Kim](http://jisungkim.com)
-Princeton University
+[Ji-Sung Kim](http://jisungkim.com)
+Princeton University
 *hello (at) jisungkim.com*

-[Andrey Rzhetsky](https://scholar.google.com/citations?user=HXCMYLsAAAAJ&hl=en), Edna K. Papazian Professor
-University of Chicago
+[Xin Gao](https://scholar.google.com/citations?user=wqdK8ugAAAAJ&hl=en), Associate Professor
+King Abdullah University of Science and Technology
+*xin.gao (at) kaust.edu.sa*
+
+[Andrey Rzhetsky](https://scholar.google.com/citations?user=HXCMYLsAAAAJ&hl=en), Edna K. Papazian Professor
+University of Chicago
 *andrey.rzhetsky (at) uchicago.edu*

 ### License & Attribution
-All media (including but not limited to designs, images and logos) are copyrighted by Ji-Sung Kim (2017).
+All media (including but not limited to designs, images and logos) are copyrighted by Ji-Sung Kim (2017).
+
+Project code (explicitly excluding media) is licensed under the Apache License 2.0. If you would like to use or modify this project or any code presented here, please include the [notice](https://github.com/jisungk/riddle/NOTICE) and [license](https://github.com/jisungk/riddle/LICENSE) files, and cite:

-Project code (explicitly excluding media) is licensed under the Apache License 2.0. If you would like to use or modify this project or any code presented here, please include the [notice](https://github.com/jisungk/riddle/NOTICE) and [license](https://github.com/jisungk/riddle/LICENSE) files, and cite:
 ```
-@article{KimJS2017RIDDLE,
-title={RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning},
-author={Kim, Ji-Sung and Rzhetsky, Andrey},
-journal={arXiv preprint arXiv:1707.01623},
-year={2017}
+@article{10.1371/journal.pcbi.1006106,
+author = {Kim, Ji-Sung AND Gao, Xin AND Rzhetsky, Andrey},
+journal = {PLOS Computational Biology},
+publisher = {Public Library of Science},
+title = {RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning},
+year = {2018},
+month = {04},
+volume = {14},
+url = {https://doi.org/10.1371/journal.pcbi.1006106},
+pages = {1-15},
+number = {4},
+doi = {10.1371/journal.pcbi.1006106}
 }
-```
+```
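The three commands in the new README FAQ entry run in order: parameter tuning, then training and evaluation, then interpretation. A minimal driver sketch, assuming the scripts sit in the working directory and need no arguments (the `run_pipeline` helper is hypothetical; only the script names come from the README). It runs each stage with the current interpreter and stops at the first non-zero exit code, so a failed stage does not feed later ones. It sticks to calls available in both Python 2.7 and 3.

```python
import subprocess
import sys

# Script names and order as given in the README.
PIPELINE = ['parameter_search.py', 'riddle.py', 'interpret_riddle.py']


def run_pipeline(scripts=PIPELINE, runner=subprocess.call, python=sys.executable):
    """Run each script with `python`, stopping at the first failure.

    Hypothetical helper. `runner` takes an argument list and returns an exit
    code (as subprocess.call does). Returns the scripts that succeeded.
    """
    completed = []
    for script in scripts:
        code = runner([python, script])
        if code != 0:
            sys.stderr.write('%s failed with exit code %d\n' % (script, code))
            break
        completed.append(script)
    return completed
```

Because `runner` defaults to `subprocess.call` but can be swapped for any callable, the stage ordering and early exit can be checked without running the real scripts.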
