data/wbm/readme.md (+5 -3)
@@ -14,7 +14,7 @@ The first integer in each material ID ranging from 1 to 5 and coming right after
Each iteration has varying numbers of materials which are counted by the 2nd integer. Note this 2nd number is not always consecutive. A small number of materials (~0.2%) were removed by the data-cleaning steps detailed below. Don't be surprised to find an ID like `wbm-3-70804` followed by `wbm-3-70807`.
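The ID scheme described above can be sketched in Python. This is a minimal illustration, not code from the repository: the helper names `parse_wbm_id` and `find_gaps` are hypothetical, and the `wbm-<step>-<counter>` pattern is assumed from the example IDs in the text.

```python
import re

# Assumed ID layout: "wbm-<step>-<counter>", inferred from IDs like "wbm-3-70804".
WBM_ID_PATTERN = re.compile(r"^wbm-(\d+)-(\d+)$")


def parse_wbm_id(material_id: str) -> tuple[int, int]:
    """Split a WBM material ID into (first integer, second integer)."""
    match = WBM_ID_PATTERN.match(material_id)
    if match is None:
        raise ValueError(f"not a WBM material ID: {material_id!r}")
    step, counter = int(match.group(1)), int(match.group(2))
    # The readme states the first integer ranges from 1 to 5.
    if not 1 <= step <= 5:
        raise ValueError(f"first integer out of range 1-5: {material_id!r}")
    return step, counter


def find_gaps(material_ids: list[str]) -> dict[int, list[int]]:
    """Map each first integer to the counters missing between its observed IDs."""
    counters_by_step: dict[int, list[int]] = {}
    for material_id in material_ids:
        step, counter = parse_wbm_id(material_id)
        counters_by_step.setdefault(step, []).append(counter)

    gaps: dict[int, list[int]] = {}
    for step, counters in counters_by_step.items():
        counters.sort()
        # Collect every integer strictly between neighbouring counters.
        missing = [
            n
            for low, high in zip(counters, counters[1:])
            for n in range(low + 1, high)
        ]
        if missing:
            gaps[step] = missing
    return gaps
```

Calling `find_gaps(["wbm-3-70804", "wbm-3-70807"])` reports counters 70805 and 70806 as absent for first integer 3, which matches the kind of gap the readme warns about.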
-## 🪓   Data processing steps
+## 🪓   Data Processing Steps
The full set of processing steps used to curate the WBM test set from the raw data files (downloaded from URLs listed below) can be found in [`data/wbm/fetch_process_wbm_dataset.py`](https://github.com/janosh/matbench-discovery/blob/site/data/wbm/fetch_process_wbm_dataset.py). Processing involved
@@ -45,7 +45,7 @@ The number of materials in each step before and after processing are:
Both the WBM test set and even more so the MP training set are heavily oxide dominated. The WBM test set is about 75% larger than the MP training set and also more chemically diverse, containing a higher fraction of transition metals, post-transition metals and metalloids. Our goal in picking such a large diverse test set is future-proofing. Ideally, this data will provide a challenging materials discovery test bed even for large foundational ML models in the future.
<slot name="wbm-elements-heatmap">
<img src="./figs/2023-01-08-wbm-elements.svg" alt="Periodic table log heatmap of WBM elements">
1. [Voronoi Random Forest](https://journals.aps.org/prb/abstract/10.1103/PhysRevB.96.024104) @goodall_rapid_2022
@@ -181,7 +181,7 @@ Our benchmark is designed to make [adding future models easy](/how-to-contribute
Classification performance for all models
</caption>
-

+

<figcaption>@label:fig:each-scatter-models Parity plot for each model's energy above hull predictions (based on their formation energy preds) vs DFT ground truth</figcaption>
@@ -190,9 +190,9 @@ Our benchmark is designed to make [adding future models easy](/how-to-contribute
<figcaption>@label:fig:wbm-hull-dist-hist-models Histograms and rolling accuracy of using predicted formation energies for stability classification</figcaption>