Commit 2f4c33a

make landing page best model report dynamic
update all outdated figures still using chgnet 0.2.0 vs 0.3.0 results
1 parent 766155e commit 2f4c33a

15 files changed: +73 -36 lines changed

readme.md

+40-3
@@ -1,3 +1,32 @@
+<script>
+  import { onMount } from 'svelte'
+  import all_stats from './site/src/routes/models/model-stats.json'
+
+  let best = Object.entries(all_stats).reduce(
+    (acc, [model, stats]) => {
+      if (stats.F1 > acc.F1) {
+        return { model, ...stats }
+      }
+      return acc
+    },
+    { model: `CHGNet`, F1: 0.6 }
+  )
+
+  let best_report // HTMLDivElement
+  onMount(async () => {
+    if (best_report && best) {
+      best_report.style.display = `block`
+
+      const { default: metadata } = await import(
+        `$root/models/${best.model.toLowerCase()}/metadata.yml`
+      )
+
+      best = { ...best, ...metadata }
+      console.log(`best`, best)
+    }
+  })
+</script>
+
 <h1 align="center">
   <img src="https://github.com/janosh/matbench-discovery/raw/main/site/static/favicon.svg" alt="Logo" width="60px"><br>
   Matbench Discovery
@@ -13,11 +42,19 @@
 
 </h4>
 
-> TL;DR: We benchmark ML models on crystal stability prediction from unrelaxed structures finding universal interatomic potentials (UIP) like [CHGNet](https://github.com/CederGroupHub/chgnet), [M3GNet](https://github.com/materialsvirtuallab/m3gnet) and [MACE](https://github.com/ACEsuit/mace) to be highly accurate, robust across chemistries and ready for production use in high-throughput materials discovery.
+> TL;DR: We benchmark ML models on crystal stability prediction from unrelaxed structures finding universal interatomic potentials (UIP) like [CHGNet](https://github.com/CederGroupHub/chgnet), [MACE](https://github.com/ACEsuit/mace) and [M3GNet](https://github.com/materialsvirtuallab/m3gnet) to be highly accurate, robust across chemistries and ready for production use in high-throughput materials discovery.
+
+Matbench Discovery is an [interactive leaderboard](https://janosh.github.io/matbench-discovery/models) and associated [PyPI package](https://pypi.org/project/matbench-discovery) which together make it easy to rank ML energy models on a task designed to simulate a high-throughput discovery campaign for new stable inorganic crystals.
+
+So far, we've tested 8 models covering multiple methodologies ranging from random forests with structure fingerprints to graph neural networks, from one-shot predictors to iterative Bayesian optimizers and interatomic potential relaxers.
+
+<div bind:this={best_report} style="display: none;">
+
+We find [{best.model}]({best?.repo}) ([paper]({best?.doi})) to achieve the highest F1 score of {best.F1}, $R^2$ of {best.R2} and a discovery acceleration factor (DAF) of {best.DAF} (meaning a ~{Number(best.DAF).toFixed(0)}x higher rate of stable structures compared to dummy selection in our already enriched search space).
 
-Matbench Discovery is an [interactive leaderboard](https://janosh.github.io/matbench-discovery/models) and associated [PyPI package](https://pypi.org/project/matbench-discovery) which together make it easy to rank ML energy models on a task designed to closely simulate a high-throughput discovery campaign for new stable inorganic crystals.
+</div>
 
-So far, we've tested 8 models covering multiple methodologies ranging from random forests with structure fingerprints to graph neural networks, from one-shot predictors to iterative Bayesian optimizers and interatomic potential relaxers. We find [CHGNet](https://github.com/CederGroupHub/chgnet) ([paper](https://doi.org/10.48550/arXiv.2302.14231)) to achieve the highest F1 score of 0.59, $R^2$ of 0.61 and a discovery acceleration factor (DAF) of 3.06 (meaning a 3x higher rate of stable structures compared to dummy selection in our already enriched search space). We believe our results show that ML models have become robust enough to deploy them as triaging steps to more effectively allocate compute in high-throughput DFT relaxations. This work provides valuable insights for anyone looking to build large-scale materials databases.
+Our results show that ML models have become robust enough to deploy them as triaging steps to more effectively allocate compute in high-throughput DFT relaxations. This work provides valuable insights for anyone looking to build large-scale materials databases.
 
 <slot name="metrics-table" />
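As a rough illustration of the selection logic in the readme's `<script>` block, here is a minimal Python sketch of the same reduce. The `all_stats` values are made-up placeholders rather than benchmark results, and `pick_best` is a hypothetical helper name. Because the reduction is seeded with a CHGNet entry at F1 = 0.6, a model only replaces the seed if its F1 is strictly greater than 0.6. Otherwise the seed is kept, and it carries no R2 or DAF values of its own.

```python
from functools import reduce

# Placeholder stats for illustration only (NOT real benchmark numbers).
all_stats = {
    "ModelA": {"F1": 0.55, "R2": 0.50, "DAF": 2.70},
    "ModelB": {"F1": 0.62, "R2": 0.64, "DAF": 3.06},
}


def pick_best(stats_by_model, seed):
    """Return the entry with the highest F1, starting from a seed entry."""

    def step(acc, item):
        model, stats = item
        # Strictly greater, as in the JS version: ties keep the earlier entry.
        return {"model": model, **stats} if stats["F1"] > acc["F1"] else acc

    return reduce(step, stats_by_model.items(), seed)


best = pick_best(all_stats, {"model": "CHGNet", "F1": 0.6})

# Mirrors Number(best.DAF).toFixed(0) in the template: DAF shown with 0 decimals.
report = f"{best['model']}: F1={best['F1']}, DAF={best['DAF']} (~{best['DAF']:.0f}x)"
```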

scripts/model_figs/make_hull_dist_box_plot.py

+2-4
@@ -3,7 +3,6 @@
 import plotly.graph_objects as go
 import seaborn as sns
 from pymatviz.io import save_fig
-from pymatviz.utils import patch_dict
 
 from matbench_discovery import PDF_FIGS, SITE_FIGS, plots
 from matbench_discovery.preds import df_each_err, models
@@ -96,6 +95,5 @@
 
 # %%
 save_fig(fig, f"{SITE_FIGS}/box-hull-dist-errors.svelte")
-
-with patch_dict(fig.layout, showlegend=False):
-    save_fig(fig, f"{PDF_FIGS}/box-hull-dist-errors.pdf")
+fig.layout.showlegend = False
+save_fig(fig, f"{PDF_FIGS}/box-hull-dist-errors.pdf")
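The second hunk swaps a temporary layout patch for a permanent assignment, which should be harmless as long as nothing uses `fig` after the PDF export. The sketch below illustrates the difference on a plain dict. `temp_patch` is a hypothetical stand-in written for this example and is not pymatviz's `patch_dict` implementation.

```python
from contextlib import contextmanager


@contextmanager
def temp_patch(target, **updates):
    """Temporarily apply updates to a dict and restore it on exit."""
    original = dict(target)  # shallow copy to restore from
    target.update(updates)
    try:
        yield target
    finally:
        target.clear()
        target.update(original)


layout = {"showlegend": True}

with temp_patch(layout, showlegend=False):
    inside = layout["showlegend"]  # patched value, only within the block
after_patch = layout["showlegend"]  # original value restored on exit

layout["showlegend"] = False  # plain assignment: the change persists
after_assign = layout["showlegend"]
```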

scripts/model_figs/make_metrics_tables.py

+3
@@ -102,6 +102,9 @@
 lower_is_better = {"MAE", "RMSE", "FPR", "FNR", "FP", "FN"}
 
 # if True, make metrics-table-megnet-uip-combos.(svelte|pdf) for SI
+# if False, make metrics-table.(svelte|pdf) for main text
+# when setting to True, uncomment the lines chgnet_megnet, m3gnet_megnet, megnet_rs2re
+# in PredFiles!
 make_uip_megnet_comparison = False
 show_cols = (
     f"F1,DAF,Precision,Accuracy,TPR,TNR,MAE,RMSE,{R2_col},"

scripts/model_figs/roc_prc_curves_models.py

+4-5
@@ -1,7 +1,4 @@
-"""Histogram of the energy difference (either according to DFT ground truth [default] or
-model predicted energy) to the convex hull for materials in the WBM data set. The
-histogram stacks true/false positives/negatives with different colors.
-"""
+"""Plot ROC and PR (precision-recall) curves for each model."""
 
 
 # %%
@@ -40,12 +37,14 @@
 
 for model in (pbar := tqdm(models, desc="Calculating ROC curves")):
     pbar.set_postfix_str(model)
+
     na_mask = df_preds[each_true_col].isna() | df_each_pred[model].isna()
     y_true = (df_preds[~na_mask][each_true_col] <= STABILITY_THRESHOLD).astype(int)
     y_pred = df_each_pred[model][~na_mask]
     fpr, tpr, thresholds = roc_curve(y_true, y_pred, pos_label=0)
     AUC = auc(fpr, tpr)
     title = f"{model} · {AUC=:.2f}"
+    thresholds = [f"{t:.3} eV/atom" for t in thresholds]
     df_tmp = pd.DataFrame(
         {"FPR": fpr, "TPR": tpr, color_col: thresholds, "AUC": AUC, facet_col: title}
     ).round(3)
@@ -79,7 +78,7 @@
     range_x=(-0.01, 1),
     range_y=(0, 1.02),
     hover_name=facet_col,
-    hover_data={facet_col: False},
+    hover_data={facet_col: False, color_col: True},
     **(kwds if facet_plot else dict(color=facet_col, markers=True)),
 )
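The added list comprehension turns the numeric ROC thresholds into hover labels. Below is a small sketch of what the `.3` format spec does, using made-up threshold values. With no presentation type, Python formats floats to 3 significant digits, similar to `g` but keeping at least one digit after the decimal point.

```python
# Made-up threshold values for illustration, in eV/atom.
thresholds = [float("inf"), 0.41237, -0.05, 1.0]

# Same format spec as in the diff: 3 significant digits, no explicit type.
labels = [f"{t:.3} eV/atom" for t in thresholds]
# labels == ["inf eV/atom", "0.412 eV/atom", "-0.05 eV/atom", "1.0 eV/atom"]
```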

scripts/model_figs/scatter_hull_dist_models.py

+1-1
@@ -237,7 +237,7 @@
     textangle=-90,
     **axis_titles,
 )
-fig.layout.height = 200 * n_rows
+fig.layout.height = 230 * n_rows
 fig.layout.coloraxis.colorbar.update(orientation="h", thickness=9, len=0.5, y=1.05)
 # fig.layout.width = 1100
 fig.layout.margin.update(l=40, r=10, t=30, b=60)

scripts/rolling_mae_vs_hull_dist_wbm_batches.py

+1-1
@@ -22,7 +22,7 @@
 batch_col = "batch_idx"
 df_each_pred[batch_col] = "Batch " + df_each_pred.index.str.split("-").str[1]
 df_err, df_std = None, None  # variables to cache rolling MAE and std
-model = "MEGNet"
+model = "CHGNet"
 
 
 # %% matplotlib

site/package.json

+14-14
@@ -18,37 +18,37 @@
   },
   "devDependencies": {
     "@iconify/svelte": "^3.1.4",
-    "@rollup/plugin-yaml": "^4.1.1",
+    "@rollup/plugin-yaml": "^4.1.2",
     "@sveltejs/adapter-static": "^2.0.3",
-    "@sveltejs/kit": "^1.25.0",
-    "@sveltejs/vite-plugin-svelte": "^2.4.5",
-    "@typescript-eslint/eslint-plugin": "^6.7.0",
-    "@typescript-eslint/parser": "^6.7.0",
+    "@sveltejs/kit": "^1.27.1",
+    "@sveltejs/vite-plugin-svelte": "^2.4.6",
+    "@typescript-eslint/eslint-plugin": "^6.9.0",
+    "@typescript-eslint/parser": "^6.9.0",
     "d3-scale-chromatic": "^3.0.0",
     "elementari": "^0.2.2",
-    "eslint": "^8.49.0",
-    "eslint-plugin-svelte": "^2.33.1",
+    "eslint": "^8.52.0",
+    "eslint-plugin-svelte": "^2.34.0",
     "hastscript": "^8.0.0",
-    "highlight.js": "^11.8.0",
+    "highlight.js": "^11.9.0",
     "js-yaml": "^4.1.0",
-    "katex": "^0.16.8",
+    "katex": "^0.16.9",
     "mdsvex": "^0.11.0",
     "prettier": "^3.0.3",
     "prettier-plugin-svelte": "^3.0.3",
     "rehype-autolink-headings": "^7.0.0",
     "rehype-katex-svelte": "^1.2.0",
     "rehype-slug": "^6.0.0",
     "remark-math": "3.0.0",
-    "svelte": "^4.2.0",
-    "svelte-check": "^3.5.1",
-    "svelte-multiselect": "^10.1.0",
+    "svelte": "^4.2.2",
+    "svelte-check": "^3.5.2",
+    "svelte-multiselect": "^10.2.0",
     "svelte-preprocess": "^5.0.4",
     "svelte-toc": "^0.5.6",
     "svelte-zoo": "^0.4.9",
-    "svelte2tsx": "^0.6.21",
+    "svelte2tsx": "^0.6.23",
     "tslib": "^2.6.2",
     "typescript": "5.2.2",
-    "vite": "^4.4.9"
+    "vite": "^4.5.0"
   },
   "prettier": {
     "semi": false,

site/src/figs/box-hull-dist-errors.svelte

+1-1
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

site/src/figs/cumulative-mae.svelte

+1-1

site/src/figs/cumulative-precision-recall.svelte

+1-1

site/src/figs/each-scatter-models-5x2.svelte

+1-1

site/src/figs/hist-clf-pred-hull-dist-models-5x2.svelte

+1-1

site/src/figs/roc-models-all-in-one.svelte

+1-1

site/src/figs/rolling-mae-vs-hull-dist-wbm-batches-chgnet.svelte

+1-1

site/src/routes/preprint/+page.md

+1-1
@@ -409,7 +409,7 @@ A material is classified as stable if the predicted $E_\text{above hull}$ lies b
 <RocModels />
 {/if}
 
-> @label:fig:roc-models Receiver operating characteristic (ROC) curve for each model. TPR/FPR = true/false positive rate. FPR on the $x$-axis is the fraction of unstable structures classified as stable. TPR on the $y$-axis is the fraction of stable structures classified as stable. The stability threshold $t$ sweeps from $-0.4 \ \frac{\text{eV}}{\text{atom}} \leq t \leq 0.4 \ \frac{\text{eV}}{\text{atom}}$ above the hull.
+> @label:fig:roc-models Receiver operating characteristic (ROC) curve for each model. TPR/FPR = true/false positive rate. FPR on the $x$-axis is the fraction of unstable structures classified as stable. TPR on the $y$-axis is the fraction of stable structures classified as stable.
 
 ### Parity Plots
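To make the TPR/FPR definitions in the caption concrete, here is a minimal sketch that classifies a structure as stable when its energy above hull is at or below a threshold. The energies are invented toy values and `tpr_fpr` is a hypothetical helper, not part of `matbench_discovery`.

```python
def tpr_fpr(e_true, e_pred, threshold=0.0):
    """Return (TPR, FPR), treating e_above_hull <= threshold as stable.

    Assumes the inputs contain at least one stable and one unstable structure.
    """
    true_stable = [e <= threshold for e in e_true]
    pred_stable = [e <= threshold for e in e_pred]
    tp = sum(t and p for t, p in zip(true_stable, pred_stable))
    fp = sum((not t) and p for t, p in zip(true_stable, pred_stable))
    n_stable = sum(true_stable)
    n_unstable = len(true_stable) - n_stable
    # TPR: fraction of stable structures classified as stable.
    # FPR: fraction of unstable structures classified as stable.
    return tp / n_stable, fp / n_unstable


# Toy energies above hull in eV/atom (illustrative only).
e_true = [-0.10, -0.02, 0.05, 0.20]
e_pred = [-0.08, 0.03, -0.01, 0.15]
tpr, fpr = tpr_fpr(e_true, e_pred)
```

Sweeping `threshold` over a range of values and collecting the resulting (FPR, TPR) pairs traces out one ROC curve per model.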
