**`data/wbm/readme.md`**
Since repeated substitutions should, on average, increase chemical dissimilarity, the 5 iterations of this data-generation process are a unique and compelling feature: they allow out-of-distribution testing. We can check how model performance degrades when predicting on structures increasingly dissimilar from the training set, which is restricted to the MP 2022 database release (or earlier) for all models in this benchmark.
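As a hedged sketch of such an out-of-distribution analysis (the column and error values below are illustrative, not from the benchmark), per-step errors can be computed by grouping on the substitution step encoded in each material ID:

```python
import pandas as pd

# illustrative predictions; real WBM IDs encode the substitution step as wbm-<step>-<counter>
df = pd.DataFrame(
    {"abs_error": [0.25, 0.75, 0.5, 1.0]},
    index=["wbm-1-10", "wbm-1-11", "wbm-3-7", "wbm-5-2"],
)
# extract the substitution step (1st integer after the 'wbm-' prefix)
df["step"] = [int(mat_id.split("-")[1]) for mat_id in df.index]
# mean absolute error per substitution step
per_step_mae = df.groupby("step").abs_error.mean()
print(per_step_mae.to_dict())  # {1: 0.5, 3: 0.5, 5: 1.0}
```

A model that generalizes poorly out of distribution would show this per-step error growing with the step number.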
## 🆔  About the IDs
The first integer in each material ID, ranging from 1 to 5 and coming right after the `wbm-` prefix, indicates the substitution step, i.e. in which iteration of the substitution process the material was generated. Each iteration has a varying number of materials, counted by the 2nd integer. Note this 2nd number is not always consecutive: a small fraction of materials (~0.2%) were removed by the data processing steps detailed below, so don't be surprised to find an ID like `wbm-3-70804` followed by `wbm-3-70807`.
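The ID format described above can be parsed with a few lines of Python (the helper below is a hypothetical sketch, not part of the package):

```python
def parse_wbm_id(material_id: str) -> tuple[int, int]:
    """Split a WBM material ID like 'wbm-3-70804' into its
    substitution step (1-5) and running counter."""
    prefix, step, counter = material_id.split("-")
    assert prefix == "wbm", f"unexpected prefix in {material_id!r}"
    return int(step), int(counter)

step, counter = parse_wbm_id("wbm-3-70804")
print(step, counter)  # 3 70804
```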
## 🪓  Data processing steps
The full set of processing steps used to curate the WBM test set from the raw data files (downloaded from URLs listed below) can be found in [`data/wbm/fetch_process_wbm_dataset.py`](https://github.com/janosh/matbench-discovery/blob/site/data/wbm/fetch_process_wbm_dataset.py). Processing involved
**`site/src/routes/how-to-contribute/+page.md`**
## 🔨  Installation
The recommended way to acquire the train and test data for this benchmark is through its Python package, [available on PyPI](https://pypi.org/project/matbench-discovery):
```zsh
pip install matbench-discovery
```

## 📙  Usage
Here's an example script showing how to download the training and test set files for training a new model, record the results and submit them via pull request to this benchmark:
1. `e_above_hull_mp2020_corrected_ppd_mp`: Energy above hull distances in eV/atom after applying the MP2020 correction scheme, measured with respect to the Materials Project convex hull. Matbench Discovery takes these as the ground truth for material stability. Any value above 0 is assumed to be an unstable/metastable material.
<!-- TODO document remaining columns, or maybe drop them from df -->
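As a minimal sketch (the tiny DataFrame below is illustrative; only the column name and the 0 eV/atom threshold come from the docs above), stability labels can be derived from this column with pandas:

```python
import pandas as pd

# toy example; in practice this column comes from the WBM summary dataframe
df = pd.DataFrame(
    {"e_above_hull_mp2020_corrected_ppd_mp": [-0.02, 0.0, 0.15]},
    index=["wbm-1-1", "wbm-1-2", "wbm-1-3"],
)
# values above 0 eV/atom are treated as unstable/metastable
df["is_stable"] = df.e_above_hull_mp2020_corrected_ppd_mp <= 0
print(df.is_stable.tolist())  # [True, True, False]
```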
## 📥  Direct Download
You can also download the data files directly from GitHub:
To deploy a new model on this benchmark and add it to our leaderboard, please create a pull request to the `main` branch of <https://github.com/janosh/matbench-discovery> that includes at least these 3 required files:
1. `<yyyy-mm-dd>-<model_name>-preds.(json|csv).gz`: Your model's energy predictions for all ~250k WBM compounds as compressed JSON or CSV. The recommended way to create this file is with `pandas.DataFrame.to_{json|csv}('<yyyy-mm-dd>-<model_name>-preds.(json|csv).gz')`. JSON is preferred over CSV if your model predicts not only energies (floats) but also Python objects such as pseudo-relaxed structures (see the M3GNet and BOWSR test scripts).
1. `test_<model_name>.(py|ipynb)`: The Python script or Jupyter notebook used to generate the energy predictions. Ideally, this file should have comments explaining at a high level what the code is doing and how the model works, so others can understand and reproduce your results. If the model deployed on this benchmark was trained specifically for this purpose (i.e. if you wrote any training/fine-tuning code while preparing your PR), please also include it as `train_<model_name>.(py|ipynb)`.
1. `metadata.yml`: A file recording all relevant metadata about your algorithm, like model name and version, authors, package requirements, relevant citations/links to publications, notes, etc. Here's a template:
```yml
# metadata.yml template
model_name: My cool foundational model # required
model_version: 1.0.0 # required
matbench_discovery_version: 1.0 # required
date_added: 2023-01-01 # required
authors: # required (only name, other keys are optional)

Optional free form multi-line notes that can help others reproduce your results.
```
Arbitrary other keys can be added as needed.
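The prediction file from step 1 above can be written with pandas, e.g. (model name, column name and values below are placeholders):

```python
import pandas as pd

# placeholder predictions; a real submission covers all ~250k WBM materials
preds = pd.DataFrame(
    {"e_form_per_atom_pred": [-1.23, 0.45]},
    index=["wbm-1-1", "wbm-1-2"],
)
preds.index.name = "material_id"
# gzip compression is inferred by pandas from the .gz suffix
preds.to_csv("2023-01-01-my-model-preds.csv.gz")

# round-trip check that the file reloads cleanly
reloaded = pd.read_csv("2023-01-01-my-model-preds.csv.gz", index_col="material_id")
print(len(reloaded))  # 2
```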
Please see any of the subdirectories in [`models/`](https://github.com/janosh/matbench-discovery/tree/main/models) for example submissions. More detailed step-by-step instructions follow below:
### Step 2: Commit model preds, script and metadata
Create a new folder

```sh
mkdir models/<model_name>
```
and place the files listed above there. The file structure should look like this:
```
matbench-discovery-root
└── models
    └── <model_name>
        ├── metadata.yml
        ├── <yyyy-mm-dd>-<model_name>-preds.(json|csv).gz
        ├── test_<model_name>.py
        ├── readme.md # optional
        └── train_<model_name>.py # optional
```
You can include arbitrary other supporting files like metadata or model features (below 10MB to keep `git clone` time low) if they are needed to run the model or help others reproduce your results. For larger files, please upload to [Figshare](https://figshare.com) or similar and link them somewhere in your files.
### Step 3: Create a PR to the [Matbench Discovery repo](https://github.com/janosh/matbench-discovery)
Commit your files to the repo on a branch called `<model_name>` and create a pull request (PR) to the Matbench Discovery repository.
```sh
git add models/<model_name>
git commit -m 'add <model_name> to Matbench Discovery leaderboard'
```
And you're done! Once tests pass and the PR is merged, your model will be added to the leaderboard! 🎉