Commit f28cc6d

add /models page to site
/models lists info from new metadata.yml files for wrenformer, bowsr, megnet, m3gnet, voronoi rf, cgcnn
1 parent 4121f49 commit f28cc6d

16 files changed, +290 -49 lines changed

data/wbm/readme.md

+3 -3
@@ -8,11 +8,11 @@ The authors performed 5 rounds of elemental substitution in total, each time rel

 Since repeated substitutions should - on average - increase chemical dissimilarity, the 5 iterations of this data-generation process are a unique and compelling feature as they allow out-of-distribution testing. We can check how model performance degrades when asked to predict on structures increasingly more dissimilar from the training set (which is restricted to the MP 2022 database release (or earlier) for all models in this benchmark).

-## About the IDs
+## 🆔   About the IDs

 The first integer in each material ID ranging from 1 to 5 and coming right after the prefix `wbm-` indicates the substitution step, i.e. in which iteration of the substitution process was this material generated. Each iteration has varying numbers of materials which are counted by the 2nd integer. Note this 2nd number is not always consecutive. A small number of materials (~0.2%) were removed by the data processing steps detailed below. Don't be surprised to find an ID like `wbm-3-70804` followed by `wbm-3-70807`.

-## Data processing steps
+## 🪓   Data processing steps

 The full set of processing steps used to curate the WBM test set from the raw data files (downloaded from URLs listed below) can be found in [`data/wbm/fetch_process_wbm_dataset.py`](https://github.com/janosh/matbench-discovery/blob/site/data/wbm/fetch_process_wbm_dataset.py). Processing involved

@@ -42,7 +42,7 @@ The number of materials in each step before and after processing are:
 | before | 61,848 | 52,800 | 79,205 | 40,328 | 23,308 | 257,487 |
 | after | 61,466 | 52,755 | 79,160 | 40,314 | 23,268 | 256,963 |

-## Links to raw WBM data files
+## 🔗   Links to raw WBM data files

 Links to WBM data files have proliferated. This is an attempt to keep track of all of them.
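
The ID scheme described in the readme text above (`wbm-<step>-<index>`, with steps 1 to 5 and a running index that may have gaps) can be unpacked with a few lines of Python. This is only a minimal sketch: the helper name `parse_wbm_id` and the regular expression are assumptions for illustration and are not part of the `matbench-discovery` package.

```python
import re

# Hypothetical helper, not part of matbench-discovery.
# Step is a single digit from 1 to 5, index is the running counter.
WBM_ID_PATTERN = re.compile(r"^wbm-(?P<step>[1-5])-(?P<index>\d+)$")


def parse_wbm_id(material_id: str) -> tuple[int, int]:
    """Split a WBM material ID into (substitution step, running index)."""
    match = WBM_ID_PATTERN.match(material_id)
    if match is None:
        raise ValueError(f"not a valid WBM ID: {material_id!r}")
    return int(match["step"]), int(match["index"])


ids = ["wbm-3-70804", "wbm-3-70807"]
parsed = [parse_wbm_id(mat_id) for mat_id in ids]
# Gaps in the running index are expected because a small number of
# materials were removed during data processing.
gap = parsed[1][1] - parsed[0][1]
print(parsed, gap)
```

Because the index is not guaranteed to be consecutive, code that iterates over a step should use the IDs actually present rather than a numeric range.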

models/bowsr/metadata.yml

+32
@@ -0,0 +1,32 @@
+model_name: BOWSR MEGNet
+model_version: 2022.9.20
+matbench_discovery_version: 1.0
+date_added: 2022-11-17
+authors:
+  - name: Yunxing Zuo
+    affiliation: UC San Diego
+
+  - name: Chi Chen
+    affiliation: UC San Diego
+    orcid: https://orcid.org/0000-0001-8008-7043
+  - name: Shyue Ping Ong
+    affiliation: UC San Diego
+    orcid: https://orcid.org/0000-0001-5726-2587
+
+repo: https://github.com/materialsvirtuallab/maml
+url: https://materialsvirtuallab.github.io/maml
+doi: https://doi.org/10.1016/j.mattod.2021.08.012
+preprint: https://arxiv.org/abs/2104.10242
+requirements:
+  maml: 2022.9.20
+  pymatgen: 2022.10.22
+  megnet: 1.3.2
+  numpy: 1.24.0
+  pandas: 1.5.1
+
+# model specific keys record hyperparameter choices
+optimize_kwargs:
+  alpha: 0.000676
+  n_init: 100
+  n_iter: 100
+task_type: IS2RE

models/cgcnn/metadata.yml

+21
@@ -0,0 +1,21 @@
+model_name: CGCNN
+model_version: 2022.9.20
+matbench_discovery_version: 1.0
+date_added: 2022-12-28
+authors:
+  - name: Tian Xie
+
+    affiliation: Massachusetts Institute of Technology
+    url: https://txie.me
+  - name: Jeffrey C. Grossman
+    affiliation: Massachusetts Institute of Technology
+    url: https://dmse.mit.edu/people/jeffrey-c-grossman
+repo: https://github.com/txie-93/cgcnn
+doi: https://doi.org/10.1103/PhysRevLett.120.145301
+preprint: https://arxiv.org/abs/1710.10324
+requirements:
+  aviary: 0.0.4
+  torch: 1.11.0
+  torch-scatter: 2.0.9
+  numpy: 1.24.0
+  pandas: 1.5.1

models/m3gnet/metadata.yml

+21
@@ -0,0 +1,21 @@
+model_name: M3GNet
+model_version: 2022.9.20
+matbench_discovery_version: 1.0
+date_added: 2022-09-20
+authors:
+  - name: Chi Chen
+    affiliation: UC San Diego
+    role: Model
+  - name: Shyue Ping Ong
+    affiliation: UC San Diego
+    orcid: https://orcid.org/0000-0001-5726-2587
+
+repo: https://github.com/materialsvirtuallab/m3gnet
+url: https://materialsvirtuallab.github.io/m3gnet
+doi: https://doi.org/10.1038/s43588-022-00349-3
+preprint: https://arxiv.org/abs/2202.02450
+requirements:
+  m3gnet: 0.1.0
+  pymatgen: 2022.10.22
+  numpy: 1.24.0
+  pandas: 1.5.1

models/megnet/metadata.yml

+27
@@ -0,0 +1,27 @@
+model_name: MEGNet
+model_version: 2022.9.20
+matbench_discovery_version: 1.0
+date_added: 2022-11-14
+authors:
+  - name: Chi Chen
+    affiliation: UC San Diego
+    orcid: https://orcid.org/0000-0001-8008-7043
+  - name: Weike Ye
+    affiliation: UC San Diego
+  - name: Yunxing Zuo
+    affiliation: UC San Diego
+  - name: Chen Zheng
+    affiliation: UC San Diego
+  - name: Shyue Ping Ong
+    affiliation: UC San Diego
+    orcid: https://orcid.org/0000-0001-5726-2587
+
+repo: https://github.com/materialsvirtuallab/megnet
+url: https://materialsvirtuallab.github.io/megnet
+doi: https://doi.org/10.1021/acs.chemmater.9b01294
+preprint: https://arxiv.org/abs/1812.05055
+requirements:
+  megnet: 1.3.2
+  pymatgen: 2022.10.22
+  numpy: 1.24.0
+  pandas: 1.5.1

models/voronoi/metadata.yml

+21
@@ -0,0 +1,21 @@
+model_name: Voronoi Random Forest
+model_version: 1.1.2 # scikit learn version which implements the random forest
+matbench_discovery_version: 1.0
+date_added: 2022-11-26
+authors:
+  - name: Rhys Goodall
+    affiliation: University of Cambridge
+    orcid: https://orcid.org/0000-0002-6589-1700
+  - name: Janosh Riebesell
+    affiliation: University of Cambridge, Lawrence Berkeley National Laboratory
+
+    orcid: https://orcid.org/0000-0001-5233-3462
+repo: https://github.com/janosh/matbench-discovery
+doi: https://doi.org/10.1126/sciadv.abn4117
+preprint: https://arxiv.org/abs/2106.11132
+requirements:
+  matminer: 0.8.0
+  scikit-learn: 1.1.2
+  pymatgen: 2022.10.22
+  numpy: 1.24.0
+  pandas: 1.5.1

models/wrenformer/metadata.yml

+22
@@ -0,0 +1,22 @@
+model_name: Wrenformer
+model_version: 0.0.4 # the aviary version
+matbench_discovery_version: 1.0
+date_added: 2022-11-26
+authors:
+  - name: Rhys Goodall
+    affiliation: University of Cambridge
+    orcid: https://orcid.org/0000-0002-6589-1700
+  - name: Janosh Riebesell
+    affiliation: University of Cambridge, Lawrence Berkeley National Laboratory
+
+    orcid: https://orcid.org/0000-0001-5233-3462
+repo: https://github.com/janosh/matbench-discovery
+doi: https://doi.org/10.1126/sciadv.abn4117
+preprint: https://arxiv.org/abs/2106.11132
+requirements:
+  aviary: 0.0.4
+  torch: 1.11.0
+  torch-scatter: 2.0.9
+  pymatgen: 2022.10.22
+  numpy: 1.24.0
+  pandas: 1.5.1
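
The six `metadata.yml` files above share the keys that the `metadata.yml` template in this commit's contribution guide marks as `# required` (`model_name`, `model_version`, `matbench_discovery_version`, `date_added`, `authors` with at least a `name`, and `repo`). A check along those lines might look like the sketch below. The function name `missing_metadata_keys` is hypothetical, and the metadata is written as a plain dict to keep the example free of dependencies; in practice the file would more likely be loaded with a YAML parser such as PyYAML's `yaml.safe_load`.

```python
# Keys marked "# required" in the metadata.yml template of this commit.
REQUIRED_KEYS = (
    "model_name",
    "model_version",
    "matbench_discovery_version",
    "date_added",
    "authors",
    "repo",
)


def missing_metadata_keys(metadata: dict) -> list[str]:
    """Return required keys that are absent, plus authors lacking a name."""
    problems = [key for key in REQUIRED_KEYS if key not in metadata]
    for idx, author in enumerate(metadata.get("authors") or []):
        if "name" not in author:
            problems.append(f"authors[{idx}].name")
    return problems


# Abridged copy of models/wrenformer/metadata.yml as a plain dict.
wrenformer = {
    "model_name": "Wrenformer",
    "model_version": "0.0.4",
    "matbench_discovery_version": 1.0,
    "date_added": "2022-11-26",
    "authors": [{"name": "Rhys Goodall"}, {"name": "Janosh Riebesell"}],
    "repo": "https://github.com/janosh/matbench-discovery",
}
print(missing_metadata_keys(wrenformer))  # → []

# Made-up incomplete entry to show what gets reported.
incomplete = {"model_name": "Demo", "authors": [{"affiliation": "Some Lab"}]}
print(missing_metadata_keys(incomplete))
```

Returning a list of problems rather than raising on the first one lets a contributor fix everything in a single pass.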

site/package.json

+1
@@ -17,6 +17,7 @@
   },
   "devDependencies": {
     "@iconify/svelte": "^3.0.1",
+    "@rollup/plugin-yaml": "^4.0.1",
     "@sveltejs/adapter-static": "1.0.0",
     "@sveltejs/kit": "1.0.1",
     "@sveltejs/vite-plugin-svelte": "^2.0.2",

site/src/app.css

+1
@@ -8,6 +8,7 @@
   --toc-mobile-btn-color: white;
   --toc-desktop-nav-margin: 0 0 0 1em;
   --toc-min-width: 20em;
+  --toc-active-bg: darkcyan;

   --ghc-color: var(--night);
   --ghc-bg: white;

site/src/routes/+layout.svelte

+1 -1
@@ -20,7 +20,7 @@
     (filename) => `/` + filename.split(`/`)[1]
   )

-  $: headingSelector = `main > :is(${
+  $: headingSelector = `main :is(${
     $page.url.pathname === `/api` ? `h1, ` : ``
   }h2, h3, h4):not(.toc-exclude)`
 </script>

site/src/routes/api/+page.svelte

-14
@@ -1,17 +1,3 @@
-<script lang="ts">
-  import { onMount } from 'svelte'
-
-  onMount(() => {
-    for (const img of [
-      ...document.querySelectorAll(
-        `img[src='https://img.shields.io/badge/-source-cccccc?style=flat-square']`
-      ),
-    ] as HTMLAnchorElement[]) {
-      img.src = `https://img.shields.io/badge/source-blue?style=flat`
-    }
-  })
-</script>
-
 {#each Object.values(import.meta.glob(`./*.md`, { eager: true })) as file}
   <svelte:component this={file?.default} />
 {/each}

site/src/routes/how-to-contribute/+page.md

+35 -29
@@ -1,12 +1,12 @@
-## Installation
+## 🔨 &thinsp; Installation

 The recommended way to acquire the train and test data for this benchmark is through its Python package [available on PyPI](https://pypi.org/project/matbench-discovery):

 ```zsh
 pip install matbench-discovery
 ```

-## Usage
+## 📙 &thinsp; Usage

 Here's an example script of how to download the training and test set files for training a new model, recording the results and submitting them via pull request to this benchmark:

@@ -66,7 +66,7 @@ assert list(df_wbm) == [
 1. `e_above_hull_mp2020_corrected_ppd_mp`: Energy above hull distances in eV/atom after applying the MP2020 correction scheme and with respect to the Materials Project convex hull. Matbench Discovery takes these as ground truth for material stability. Any value above 0 is assumed to be an unstable/metastable material.
 <!-- TODO document remaining columns, or maybe drop them from df -->

-## Direct Download
+## 📥 &thinsp; Direct Download

 You can also download the data files directly from GitHub:
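
As a small illustration of the ground-truth convention in the column description above (any energy above hull greater than 0 counts as unstable/metastable), the following sketch labels a few values. The function name `is_stable`, the material IDs and the numbers are all made up for illustration and are not taken from the benchmark data.

```python
def is_stable(e_above_hull: float) -> bool:
    """Return True if a material lies on or below the convex hull.

    Follows the convention stated in the column description: any value
    above 0 eV/atom is treated as unstable/metastable.
    """
    return e_above_hull <= 0


# Made-up example values in eV/atom, keyed by hypothetical material IDs.
sample = {"wbm-1-1": -0.02, "wbm-1-2": 0.0, "wbm-1-3": 0.15}
labels = {mat_id: is_stable(e_hull) for mat_id, e_hull in sample.items()}
print(labels)
```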

@@ -80,58 +80,64 @@ You can also download the data files directly from GitHub:

 [wbm paper]: https://nature.com/articles/s41524-020-00481-6

-## How to submit a new model to the leaderboard
+## &thinsp; How to submit a new model

-To add a new model to this benchmark, please create a pull request to the `main` branch of <https://github.com/janosh/matbench-discovery> that includes at least these 3 required files:
+To deploy a new model on this benchmark and add it to our leaderboard, please create a pull request to the `main` branch of <https://github.com/janosh/matbench-discovery> that includes at least these 3 required files:

-1. `<yyyy-mm-dd>-<model-name>-preds.(json|csv).gz`: Your model's energy predictions for all ~250k WBM compounds as compressed JSON or CSV. Recommended way to create this file is with `pandas.DataFrame.to_{json|csv}('<yyyy-mm-dd>-<model-name>-preds.(json|csv).gz')`. JSON is preferred over CSV if your model not only predicts energies (floats) but also Python objects like e.g. pseudo-relaxed structures (see the M3GNet and BOWSR test scripts).
-1. `test_<model-name>.(py|ipynb)`: The Python script or Jupyter notebook used to generate the energy predictions. Ideally, this file should have comments explaining at a high level what the code is doing and how the model works so others can understand and reproduce your results. If the model deployed on this benchmark was trained specifically for this purpose (i.e. if you wrote any training/fine-tuning code while preparing your PR), please also include it as `train_<model-name>.(py|ipynb)`.
-1. `metadata.yml`: A file to record all relevant metadata about your algorithm like model name, authors (can be different for the model and the PR), package requirements, relevant citations/links to publications and other info about the model. Here's a template:
+1. `<yyyy-mm-dd>-<model_name>-preds.(json|csv).gz`: Your model's energy predictions for all ~250k WBM compounds as compressed JSON or CSV. Recommended way to create this file is with `pandas.DataFrame.to_{json|csv}('<yyyy-mm-dd>-<model_name>-preds.(json|csv).gz')`. JSON is preferred over CSV if your model not only predicts energies (floats) but also Python objects like e.g. pseudo-relaxed structures (see the M3GNet and BOWSR test scripts).
+1. `test_<model_name>.(py|ipynb)`: The Python script or Jupyter notebook used to generate the energy predictions. Ideally, this file should have comments explaining at a high level what the code is doing and how the model works so others can understand and reproduce your results. If the model deployed on this benchmark was trained specifically for this purpose (i.e. if you wrote any training/fine-tuning code while preparing your PR), please also include it as `train_<model_name>.(py|ipynb)`.
+1. `metadata.yml`: A file to record all relevant metadata about your algorithm like model name and version, authors, package requirements, relevant citations/links to publications, notes, etc. Here's a template:

 ```yml
-model_name: My cool foundational model v1
-authors:
-  - family-names: Doe
-    given-names: John
+# metadata.yml template
+model_name: My cool foundational model # required
+model_version: 1.0.0 # required
+matbench_discovery_version: 1.0 # required
+date_added: 2023-01-01 # required
+authors: # required (only name, other keys are optional)
+  - name: John Doe
     affiliation: Some University, Some National Lab
-    email: john@doe.gov
+    email: john-doe@uni.edu
     orcid: https://orcid.org/0000-xxxx-yyyy-zzzz
+    url: lab.gov/john-doe
     corresponding: true
     role: Model & PR
-  - family-names: Jane
-    given-names: Doe
+  - name: Jane Doe
     affiliation: Some National Lab
-
+
+    url: uni.edu/jane-doe
     orcid: https://orcid.org/0000-xxxx-yyyy-zzzz
     role: Model
-repo: https://github.com/<user>/<repo>
+repo: https://github.com/<user>/<repo> # required
 url: https://<model-docs-or-similar>.org
 doi: https://doi.org/10.5281/zenodo.0000000
-version: 1.0.0
-requirements:
+preprint: https://arxiv.org/abs/xxxx.xxxxx
+requirements: # strongly recommended
   torch: 1.13.0
   torch-geometric: 2.0.9
   ...
 notes:
-  Optional free form multi-line notes that might help others reproduce your results.
+  Optional free form multi-line notes that can help others reproduce your results.
 ```

-Only the keys `model_name`, `authors`, `repo`, `version` are required. Arbitrary other keys can be added as needed.
+Arbitrary other keys can be added as needed.

 Please see any of the subdirectories in [`models/`](https://github.com/janosh/matbench-discovery/tree/main/models) for example submissions. More detailed step-by-step instructions below:

 ### Step 1: Clone the repo

 ```sh
 git clone https://github.com/janosh/matbench-discovery
+cd matbench-discovery
+git checkout -b <model-name-you-want-to-add>
 ```

 ### Step 2: Commit model preds, script and metadata

 Create a new folder

 ```sh
-mkdir models/<model-name>
+mkdir models/<model_name>
 ```

 and place the above listed files there. The file structure should look like this:
@@ -141,21 +147,21 @@ matbench-discovery-root
 └── models
     └── <model name>
         ├── metadata.yml
-        ├── <yyyy-mm-dd>-<model-name>-preds.(json|csv).gz
-        ├── test_<model-name>.py
+        ├── <yyyy-mm-dd>-<model_name>-preds.(json|csv).gz
+        ├── test_<model_name>.py
         ├── readme.md # optional
-        └── train_<model-name>.py # optional
+        └── train_<model_name>.py # optional
 ```

-You can include arbitrary other supporting files like metadata, model features (below 10MB to keep `git clone` time low) if they are needed to run the model or might help others reproduce your results. For larger files, please upload to Figshare or similar and link them somewhere in your files.
+You can include arbitrary other supporting files like metadata, model features (below 10MB to keep `git clone` time low) if they are needed to run the model or help others reproduce your results. For larger files, please upload to [Figshare](https://figshare.com) or similar and link them somewhere in your files.

 ### Step 3: Create a PR to the [Matbench Discovery repo](https://github.com/janosh/matbench-discovery)

-Commit your files to the repo on a branch called `<model-name>` and create a pull request (PR) to the Matbench repository.
+Commit your files to the repo on a branch called `<model_name>` and create a pull request (PR) to the Matbench repository.

 ```sh
-git add models/<model-name>
-git commit -m 'add <model-name> to Matbench Discovery leaderboard`
+git add models/<model_name>
+git commit -m 'add <model_name> to Matbench Discovery leaderboard'
 ```

 And you're done! Once tests pass and the PR is merged, your model will be added to the leaderboard! 🎉
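
The prediction file listed in the file tree above follows the pattern `<yyyy-mm-dd>-<model_name>-preds.(json|csv).gz`. The guide recommends creating it with pandas, which by default infers gzip compression from a `.gz` suffix. For readers without pandas at hand, the sketch below builds the same kind of file with the standard library only. The function names, the column names `material_id` and `e_form_per_atom_pred`, and the prediction values are assumptions for illustration, not names required by the benchmark.

```python
import csv
import gzip
import tempfile
from datetime import date
from pathlib import Path


def preds_filename(model_name: str, day: date, ext: str = "csv") -> str:
    """Build '<yyyy-mm-dd>-<model_name>-preds.<ext>.gz'."""
    return f"{day.isoformat()}-{model_name}-preds.{ext}.gz"


def write_preds_csv(out_dir: Path, model_name: str, day: date, preds: dict) -> Path:
    """Write material_id -> predicted energy pairs as gzip-compressed CSV."""
    out_path = out_dir / preds_filename(model_name, day)
    # "wt" opens the gzip stream in text mode; newline="" is what csv expects.
    with gzip.open(out_path, "wt", newline="") as file:
        writer = csv.writer(file)
        # Column names are assumed for this example.
        writer.writerow(["material_id", "e_form_per_atom_pred"])
        writer.writerows(preds.items())
    return out_path


# Made-up predictions for two hypothetical material IDs.
demo_preds = {"wbm-1-1": -0.51, "wbm-1-2": 0.03}
with tempfile.TemporaryDirectory() as tmp:
    path = write_preds_csv(Path(tmp), "my_model", date(2023, 1, 1), demo_preds)
    # Read the file back to confirm it round-trips.
    with gzip.open(path, "rt", newline="") as file:
        rows = list(csv.reader(file))
print(path.name, rows)
```

CSV only works for flat numeric predictions; as the guide notes, JSON is the better fit when the output also contains structured objects such as relaxed structures.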
+12
@@ -0,0 +1,12 @@
+import { dirname } from 'path'
+import type { PageServerLoad } from './$types'
+
+export const load: PageServerLoad = async () => {
+  const model_metas = Object.entries(
+    import.meta.glob(`$root/models/**/metadata.yml`, {
+      eager: true,
+    })
+  ).map(([key, module]) => [dirname(key), module.default])
+
+  return { model_metas }
+}
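
The `load` function above globs every `models/**/metadata.yml` file and returns a list of `[directory, content]` pairs. For readers more at home in the benchmark's Python code, roughly the same collection step can be sketched with `pathlib`. This is an analogy rather than the site's implementation: the function name `collect_model_metas` is hypothetical, and the content here is the raw file text, whereas the site receives parsed YAML from its bundler (presumably via the `@rollup/plugin-yaml` dev dependency added in this commit).

```python
import tempfile
from pathlib import Path


def collect_model_metas(root: Path) -> list[tuple[str, str]]:
    """Pair each models/**/metadata.yml with its parent directory.

    Mirrors the shape of `model_metas` in the load function: one
    (directory, content) pair per metadata file.
    """
    files = sorted((root / "models").glob("**/metadata.yml"))
    return [
        (file.parent.relative_to(root).as_posix(), file.read_text())
        for file in files
    ]


# Build a throwaway directory tree with two stub metadata files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("cgcnn", "wrenformer"):
        model_dir = root / "models" / name
        model_dir.mkdir(parents=True)
        (model_dir / "metadata.yml").write_text(f"model_name: {name}\n")
    metas = collect_model_metas(root)
print(metas)
```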
