- Implement new Model API for defining model architectures.
- Refactor code, update style, and revise unit tests.
- Update documentation with references to PLOS Computational Biology (https://doi.org/10.1371/journal.pcbi.1006106).


**RIDDLE** (**R**ace and ethnicity **I**mputation from **D**isease history with **D**eep **LE**arning) is an open-source Python 2 library that uses deep learning to impute race and ethnicity information in anonymized electronic medical records (EMRs). RIDDLE lets you (1) build models that estimate race and ethnicity from clinical features, and (2) interpret trained models to describe how specific features contribute to predictions. The library implements the methods introduced in ["RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning"](https://doi.org/10.1371/journal.pcbi.1006106) (PLOS Computational Biology, 2018), first released as an [arXiv preprint](https://arxiv.org/abs/1707.01623) in 2017.

Compared to alternative methods (e.g., scikit-learn/Python, glm/R), RIDDLE is designed to handle large, high-dimensional datasets efficiently. RIDDLE trains models on a parallelized TensorFlow/Theano backend, and avoids memory overflow by preprocessing data batch by batch as it trains.

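To make the batch-wise idea concrete, here is a minimal sketch in plain Python. It is not RIDDLE's actual implementation. It assumes each record is a list of integer feature indices and expands one batch at a time into binary (multi-hot) rows, so only a single batch is ever held in dense form. The names `encode_batch` and `iter_batches` and the toy records are hypothetical.

```python
def encode_batch(records, num_features):
    """Expand sparse records (lists of feature indices) into dense 0/1 rows."""
    batch = []
    for indices in records:
        row = [0] * num_features
        for i in indices:
            row[i] = 1
        batch.append(row)
    return batch


def iter_batches(records, num_features, batch_size):
    """Yield dense batches lazily so only one batch is in memory at a time."""
    for start in range(0, len(records), batch_size):
        yield encode_batch(records[start:start + batch_size], num_features)


# Toy data (hypothetical): three patients, five possible clinical features.
records = [[0, 3], [1], [2, 3, 4]]
batches = list(iter_batches(records, num_features=5, batch_size=2))
```

In a training loop, each yielded batch would be handed to the model's batch-training step and then discarded, rather than collected into a list as done here for inspection.
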
RIDDLE uses [Keras](https://keras.io) to specify and train the underlying deep neural networks, and [DeepLIFT](https://github.com/kundajelab/deeplift) to compute feature-to-class contribution scores. The current RIDDLE Python module works with either TensorFlow or Theano as the Keras backend. The default architecture is a deep multi-layer perceptron (deep MLP) that takes binary-encoded features and targets. However, you can specify any neural network architecture (e.g., LSTM, CNN) and data format by writing your own `model_module` files!

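As a rough intuition for feature-to-class contribution scores, the sketch below covers only the simplest case: a single linear layer, where a feature's contribution to a class score is its weight times the feature's difference from a reference input. DeepLIFT generalizes this difference-from-reference idea to deep nonlinear networks. The snippet does not use the DeepLIFT library, and the function name, weights, and inputs are made up for illustration.

```python
def linear_contributions(weights, x, reference):
    """Return scores[c][i]: contribution of feature i to the score of class c.

    weights[c][i] is the (hypothetical) weight from feature i to class c.
    """
    return [
        [w * (xi - ri) for w, xi, ri in zip(class_weights, x, reference)]
        for class_weights in weights
    ]


# Made-up example: 2 classes, 3 binary features, all-zeros reference input.
weights = [[0.5, -1.0, 2.0],
           [1.5, 0.25, -0.5]]
x = [1, 0, 1]
reference = [0, 0, 0]

scores = linear_contributions(weights, x, reference)
# For a linear layer, each class's contributions sum to the change in its score.
totals = [sum(row) for row in scores]
```

Features that match the reference (here, the absent second feature) receive a contribution of zero, which is why the choice of reference input matters when reading such scores.
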
### Documentation
Please visit [riddle.ai](https://riddle.ai).

### Dependencies
Python Libraries:
* Keras (`keras`)
* DeepLIFT (`deeplift`, install from GitHub)
* TensorFlow (`tensorflow`) or Theano (`theano`)
* scikit-learn (`sklearn`)
* NumPy (`numpy`)
* SciPy (`scipy`)
* Matplotlib (`matplotlib`)
* h5py (`h5py`)

General:
* HDF5

### Unit testing
@@ -46,29 +46,50 @@ Alternatively, you can install RIDDLE and DeepLIFT from GitHub using `pip`:
% python parameter_search.py  # run parameter tuning
% python riddle.py  # train and evaluate the model
% python interpret_riddle.py  # interpret the trained model
```

#### What is the default format for data files?

Please refer to the example data file `dummy.txt` and the accompanying `README` in the [`_data` directory](https://github.com/jisungk/riddle/tree/master/_data).

### Authors

[Ji-Sung Kim](http://jisungkim.com)
Princeton University
*hello (at) jisungkim.com*

[Xin Gao](https://scholar.google.com/citations?user=wqdK8ugAAAAJ&hl=en), Associate Professor
King Abdullah University of Science and Technology
*xin.gao (at) kaust.edu.sa*

[Andrey Rzhetsky](https://scholar.google.com/citations?user=HXCMYLsAAAAJ&hl=en), Edna K. Papazian Professor
University of Chicago
*andrey.rzhetsky (at) uchicago.edu*

### License & Attribution
All media (including but not limited to designs, images and logos) are copyrighted by Ji-Sung Kim (2017).

Project code (explicitly excluding media) is licensed under the Apache License 2.0. If you would like to use or modify this project or any code presented here, please include the [notice](https://github.com/jisungk/riddle/NOTICE) and [license](https://github.com/jisungk/riddle/LICENSE) files, and cite:

```
@article{10.1371/journal.pcbi.1006106,
    author = {Kim, Ji-Sung AND Gao, Xin AND Rzhetsky, Andrey},
    journal = {PLOS Computational Biology},
    publisher = {Public Library of Science},
    title = {RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning},