**`data/wbm/readme.md`**
Since repeated substitutions should, on average, increase chemical dissimilarity, the 5 iterations of this data-generation process are a unique and compelling feature: they allow out-of-distribution testing. We can check how model performance degrades when predicting on structures increasingly dissimilar from the training set, which is restricted to the MP 2022 database release (or earlier) for all models in this benchmark.
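As a hedged sketch of such an out-of-distribution analysis (the column and error values below are illustrative, not from the benchmark), per-step errors can be computed by grouping on the substitution step encoded in each material ID:

```python
import pandas as pd

# illustrative predictions; real WBM IDs encode the substitution step as wbm-<step>-<counter>
df = pd.DataFrame(
    {"abs_error": [0.25, 0.75, 0.5, 1.0]},
    index=["wbm-1-10", "wbm-1-11", "wbm-3-7", "wbm-5-2"],
)
# extract the substitution step (1st integer after the 'wbm-' prefix)
df["step"] = [int(mat_id.split("-")[1]) for mat_id in df.index]
# mean absolute error per substitution step
per_step_mae = df.groupby("step").abs_error.mean()
print(per_step_mae.to_dict())  # {1: 0.5, 3: 0.5, 5: 1.0}
```

A model that generalizes poorly out of distribution would show this per-step error growing with the step number.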
## 🆔  About the IDs
The first integer in each material ID, ranging from 1 to 5 and coming right after the `wbm-` prefix, indicates the substitution step, i.e. in which iteration of the substitution process the material was generated. Each iteration has a varying number of materials, counted by the 2nd integer. Note this 2nd number is not always consecutive: a small fraction of materials (~0.2%) were removed by the data processing steps detailed below, so don't be surprised to find an ID like `wbm-3-70804` followed by `wbm-3-70807`.
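The ID format described above can be parsed with a few lines of Python (the helper below is a hypothetical sketch, not part of the package):

```python
def parse_wbm_id(material_id: str) -> tuple[int, int]:
    """Split a WBM material ID like 'wbm-3-70804' into its
    substitution step (1-5) and running counter."""
    prefix, step, counter = material_id.split("-")
    assert prefix == "wbm", f"unexpected prefix in {material_id!r}"
    return int(step), int(counter)

step, counter = parse_wbm_id("wbm-3-70804")
print(step, counter)  # 3 70804
```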
## 🪓  Data processing steps
The full set of processing steps used to curate the WBM test set from the raw data files (downloaded from URLs listed below) can be found in [`data/wbm/fetch_process_wbm_dataset.py`](https://github.com/janosh/matbench-discovery/blob/site/data/wbm/fetch_process_wbm_dataset.py). Processing involved
**`site/src/routes/how-to-contribute/+page.md`**
## 🔨  Installation
The recommended way to acquire the train and test data for this benchmark is through its Python package, [available on PyPI](https://pypi.org/project/matbench-discovery):
```zsh
pip install matbench-discovery
```

## 📙  Usage
Here's an example script showing how to download the training and test set files for training a new model, record the results and submit them via pull request to this benchmark:
1. `e_above_hull_mp2020_corrected_ppd_mp`: Energy above hull distances in eV/atom after applying the MP2020 correction scheme, measured with respect to the Materials Project convex hull. Matbench Discovery takes these as the ground truth for material stability. Any value above 0 is assumed to be an unstable/metastable material.
<!-- TODO document remaining columns, or maybe drop them from df -->
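As a minimal sketch (the tiny DataFrame below is illustrative; only the column name and the 0 eV/atom threshold come from the docs above), stability labels can be derived from this column with pandas:

```python
import pandas as pd

# toy example; in practice this column comes from the WBM summary dataframe
df = pd.DataFrame(
    {"e_above_hull_mp2020_corrected_ppd_mp": [-0.02, 0.0, 0.15]},
    index=["wbm-1-1", "wbm-1-2", "wbm-1-3"],
)
# values above 0 eV/atom are treated as unstable/metastable
df["is_stable"] = df.e_above_hull_mp2020_corrected_ppd_mp <= 0
print(df.is_stable.tolist())  # [True, True, False]
```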
## 📥  Direct Download
You can also download the data files directly from GitHub:
To deploy a new model on this benchmark and add it to our leaderboard, please create a pull request to the `main` branch of <https://github.com/janosh/matbench-discovery> that includes at least these 3 required files:
1. `<yyyy-mm-dd>-<model_name>-preds.(json|csv).gz`: Your model's energy predictions for all ~250k WBM compounds as compressed JSON or CSV. The recommended way to create this file is with `pandas.DataFrame.to_{json|csv}('<yyyy-mm-dd>-<model_name>-preds.(json|csv).gz')`. JSON is preferred over CSV if your model predicts not only energies (floats) but also Python objects such as pseudo-relaxed structures (see the M3GNet and BOWSR test scripts).
1. `test_<model_name>.(py|ipynb)`: The Python script or Jupyter notebook used to generate the energy predictions. Ideally, this file should have comments explaining at a high level what the code is doing and how the model works, so others can understand and reproduce your results. If the model deployed on this benchmark was trained specifically for this purpose (i.e. if you wrote any training/fine-tuning code while preparing your PR), please also include it as `train_<model_name>.(py|ipynb)`.
1. `metadata.yml`: A file recording all relevant metadata about your algorithm, like model name and version, authors, package requirements, relevant citations/links to publications, notes, etc. Here's a template:
```yml
# metadata.yml template
model_name: My cool foundational model # required
model_version: 1.0.0 # required
matbench_discovery_version: 1.0 # required
date_added: 2023-01-01 # required
authors: # required (only name, other keys are optional)

Optional free form multi-line notes that can help others reproduce your results.
```
Arbitrary other keys can be added as needed.
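The prediction file from step 1 above can be written with pandas, e.g. (model name, column name and values below are placeholders):

```python
import pandas as pd

# placeholder predictions; a real submission covers all ~250k WBM materials
preds = pd.DataFrame(
    {"e_form_per_atom_pred": [-1.23, 0.45]},
    index=["wbm-1-1", "wbm-1-2"],
)
preds.index.name = "material_id"
# gzip compression is inferred by pandas from the .gz suffix
preds.to_csv("2023-01-01-my-model-preds.csv.gz")

# round-trip check that the file reloads cleanly
reloaded = pd.read_csv("2023-01-01-my-model-preds.csv.gz", index_col="material_id")
print(len(reloaded))  # 2
```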
Please see any of the subdirectories in [`models/`](https://github.com/janosh/matbench-discovery/tree/main/models) for example submissions. More detailed step-by-step instructions follow below:
### Step 2: Commit model preds, script and metadata
Create a new folder

```sh
mkdir models/<model_name>
```
and place the files listed above there. The file structure should look like this:
```
matbench-discovery-root
└── models
    └── <model_name>
        ├── metadata.yml
        ├── <yyyy-mm-dd>-<model_name>-preds.(json|csv).gz
        ├── test_<model_name>.py
        ├── readme.md # optional
        └── train_<model_name>.py # optional
```
You can include arbitrary other supporting files like metadata or model features (below 10MB to keep `git clone` time low) if they are needed to run the model or help others reproduce your results. For larger files, please upload to [Figshare](https://figshare.com) or similar and link them somewhere in your files.
### Step 3: Create a PR to the [Matbench Discovery repo](https://github.com/janosh/matbench-discovery)
Commit your files to the repo on a branch called `<model_name>` and create a pull request (PR) to the Matbench Discovery repository.
```sh
git add models/<model_name>
git commit -m 'add <model_name> to Matbench Discovery leaderboard'
```
And you're done! Once tests pass and the PR is merged, your model will be added to the leaderboard! 🎉