janosh
diff --git a/readme.md b/readme.md (+40 -3)
diff --git a/scripts/model_figs/make_hull_dist_box_plot.py b/scripts/model_figs/make_hull_dist_box_plot.py (+2 -4)
diff --git a/scripts/model_figs/make_metrics_tables.py b/scripts/model_figs/make_metrics_tables.py (+3)
diff --git a/scripts/model_figs/roc_prc_curves_models.py b/scripts/model_figs/roc_prc_curves_models.py (+4 -5)
diff --git a/scripts/model_figs/scatter_hull_dist_models.py b/scripts/model_figs/scatter_hull_dist_models.py (+1 -1)
diff --git a/scripts/rolling_mae_vs_hull_dist_wbm_batches.py b/scripts/rolling_mae_vs_hull_dist_wbm_batches.py (+1 -1)
diff --git a/site/package.json b/site/package.json (+14 -14)
diff --git a/site/src/figs/box-hull-dist-errors.svelte b/site/src/figs/box-hull-dist-errors.svelte (+1 -1)
diff --git a/site/src/figs/cumulative-mae.svelte b/site/src/figs/cumulative-mae.svelte (+1 -1)
diff --git a/site/src/figs/cumulative-precision-recall.svelte b/site/src/figs/cumulative-precision-recall.svelte (+1 -1)
diff --git a/site/src/figs/each-scatter-models-5x2.svelte b/site/src/figs/each-scatter-models-5x2.svelte (+1 -1)
diff --git a/site/src/figs/hist-clf-pred-hull-dist-models-5x2.svelte b/site/src/figs/hist-clf-pred-hull-dist-models-5x2.svelte (+1 -1)
diff --git a/site/src/figs/roc-models-all-in-one.svelte b/site/src/figs/roc-models-all-in-one.svelte (+1 -1)
diff --git a/site/src/figs/rolling-mae-vs-hull-dist-wbm-batches-chgnet.svelte b/site/src/figs/rolling-mae-vs-hull-dist-wbm-batches-chgnet.svelte (+1 -1)
diff --git a/site/src/routes/preprint/+page.md b/site/src/routes/preprint/+page.md (+1 -1)
@@ -1,3 +1,32 @@
+<script>
+  import { onMount } from 'svelte'
+  import all_stats from './site/src/routes/models/model-stats.json'
+
+  let best = Object.entries(all_stats).reduce(
+    (acc, [model, stats]) => {
+      if (stats.F1 > acc.F1) {
+        return { model, ...stats }
+      }
+      return acc
+    },
+    { model: `CHGNet`, F1: 0.6 }
+  )
+
+  let best_report // HTMLDivElement
+  onMount(async () => {
+    if (best_report && best) {
+      best_report.style.display = `block`
+
+      const { default: metadata } = await import(
+        `$root/models/${best.model.toLowerCase()}/metadata.yml`
+      )
+
+      best = { ...best, ...metadata }
+      console.log(`best`, best)
+    }
+  })
+</script>
+
 <h1 align="center">
   <img src="https://github.com/janosh/matbench-discovery/raw/main/site/static/favicon.svg" alt="Logo" width="60px"><br>
   Matbench Discovery
@@ -13,11 +42,19 @@
 
 </h4>
 
-> TL;DR: We benchmark ML models on crystal stability prediction from unrelaxed structures finding universal interatomic potentials (UIP) like [CHGNet](https://github.com/CederGroupHub/chgnet), [M3GNet](https://github.com/materialsvirtuallab/m3gnet) and [MACE](https://github.com/ACEsuit/mace) to be highly accurate, robust across chemistries and ready for production use in high-throughput materials discovery.
+> TL;DR: We benchmark ML models on crystal stability prediction from unrelaxed structures finding universal interatomic potentials (UIP) like [CHGNet](https://github.com/CederGroupHub/chgnet), [MACE](https://github.com/ACEsuit/mace) and [M3GNet](https://github.com/materialsvirtuallab/m3gnet) to be highly accurate, robust across chemistries and ready for production use in high-throughput materials discovery.
+
+Matbench Discovery is an [interactive leaderboard](https://janosh.github.io/matbench-discovery/models) and associated [PyPI package](https://pypi.org/project/matbench-discovery) which together make it easy to rank ML energy models on a task designed to simulate a high-throughput discovery campaign for new stable inorganic crystals.
+
+So far, we've tested 8 models covering multiple methodologies ranging from random forests with structure fingerprints to graph neural networks, from one-shot predictors to iterative Bayesian optimizers and interatomic potential relaxers.
+
+<div bind:this={best_report} style="display: none;">
+
+We find [{best.model}]({best?.repo}) ([paper]({best?.doi})) to achieve the highest F1 score of {best.F1}, $R^2$ of {best.R2} and a discovery acceleration factor (DAF) of {best.DAF} (meaning a ~{Number(best.DAF).toFixed(0)}x higher rate of stable structures compared to dummy selection in our already enriched search space).
 
-Matbench Discovery is an [interactive leaderboard](https://janosh.github.io/matbench-discovery/models) and associated [PyPI package](https://pypi.org/project/matbench-discovery) which together make it easy to rank ML energy models on a task designed to closely simulate a high-throughput discovery campaign for new stable inorganic crystals.
+</div>
 
-So far, we've tested 8 models covering multiple methodologies ranging from random forests with structure fingerprints to graph neural networks, from one-shot predictors to iterative Bayesian optimizers and interatomic potential relaxers. We find [CHGNet](https://github.com/CederGroupHub/chgnet) ([paper](https://doi.org/10.48550/arXiv.2302.14231)) to achieve the highest F1 score of 0.59, $R^2$ of 0.61 and a discovery acceleration factor (DAF) of 3.06 (meaning a 3x higher rate of stable structures compared to dummy selection in our already enriched search space). We believe our results show that ML models have become robust enough to deploy them as triaging steps to more effectively allocate compute in high-throughput DFT relaxations. This work provides valuable insights for anyone looking to build large-scale materials databases.
+Our results show that ML models have become robust enough to deploy them as triaging steps to more effectively allocate compute in high-throughput DFT relaxations. This work provides valuable insights for anyone looking to build large-scale materials databases.
 
 <slot name="metrics-table" />
 
 
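Review note: the `reduce` in the added `<script>` block picks the model with the highest F1, seeded with a CHGNet entry at F1 = 0.6. A minimal Python sketch of the same selection logic follows. The model names and scores are placeholders invented for illustration, not benchmark results, and `pick_best` is a hypothetical helper name.

```python
from functools import reduce

# Placeholder stats, NOT real benchmark numbers (hypothetical illustration only).
all_stats = {
    "model_a": {"F1": 0.55, "DAF": 2.5},
    "model_b": {"F1": 0.72, "DAF": 3.1},
    "model_c": {"F1": 0.64, "DAF": 2.9},
}


def pick_best(stats_by_model, seed):
    """Return the entry with the highest F1, or the seed if no entry beats it."""

    def step(acc, item):
        model, stats = item
        # Strictly greater, like `stats.F1 > acc.F1` in the diff: ties keep the earlier entry.
        return {"model": model, **stats} if stats["F1"] > acc["F1"] else acc

    return reduce(step, stats_by_model.items(), seed)


best = pick_best(all_stats, seed={"model": "CHGNet", "F1": 0.6})
print(best)
```

One consequence of the seed: if no model exceeds F1 = 0.6, the seed itself is returned and carries no `R2` or `DAF` fields, so the rendered sentence would have nothing to show for those values.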
@@ -3,7 +3,6 @@
 import plotly.graph_objects as go
 import seaborn as sns
 from pymatviz.io import save_fig
-from pymatviz.utils import patch_dict
 
 from matbench_discovery import PDF_FIGS, SITE_FIGS, plots
 from matbench_discovery.preds import df_each_err, models
@@ -96,6 +95,5 @@
 
 # %%
 save_fig(fig, f"{SITE_FIGS}/box-hull-dist-errors.svelte")
-
-with patch_dict(fig.layout, showlegend=False):
-    save_fig(fig, f"{PDF_FIGS}/box-hull-dist-errors.pdf")
+fig.layout.showlegend = False
+save_fig(fig, f"{PDF_FIGS}/box-hull-dist-errors.pdf")
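Review note: this hunk replaces a scoped override (`with patch_dict(...)`) with a plain assignment. Assuming `patch_dict` restores the original values on exit, as its use as a context manager suggests, the practical difference is that the legend now stays hidden on `fig` after the PDF is saved. As far as this hunk shows, the PDF export is the last statement, so nothing later depends on the legend. The sketch below illustrates the two behaviors with a hypothetical `temporarily_set` helper and a `SimpleNamespace` stand-in for `fig.layout`; neither is part of pymatviz or Plotly.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporarily_set(obj, **attrs):
    """Hypothetical helper: set attributes on obj and restore the originals on exit."""
    missing = object()  # sentinel for attributes that did not exist before
    originals = {name: getattr(obj, name, missing) for name in attrs}
    for name, value in attrs.items():
        setattr(obj, name, value)
    try:
        yield obj
    finally:
        for name, old in originals.items():
            if old is missing:
                delattr(obj, name)
            else:
                setattr(obj, name, old)


layout = SimpleNamespace(showlegend=True)  # stand-in for fig.layout

# Old approach: the override only lives inside the with-block.
with temporarily_set(layout, showlegend=False):
    inside = layout.showlegend
after_context = layout.showlegend

# New approach: a plain assignment persists.
layout.showlegend = False
after_assignment = layout.showlegend

print(inside, after_context, after_assignment)
```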
@@ -102,6 +102,9 @@
 lower_is_better = {"MAE", "RMSE", "FPR", "FNR", "FP", "FN"}
 
 # if True, make metrics-table-megnet-uip-combos.(svelte|pdf) for SI
+# if False, make metrics-table.(svelte|pdf) for main text
+# when setting to True, uncomment the lines chgnet_megnet, m3gnet_megnet, megnet_rs2re
+# in PredFiles!
 make_uip_megnet_comparison = False
 show_cols = (
     f"F1,DAF,Precision,Accuracy,TPR,TNR,MAE,RMSE,{R2_col},"
 
@@ -1,7 +1,4 @@
-"""Histogram of the energy difference (either according to DFT ground truth [default] or
-model predicted energy) to the convex hull for materials in the WBM data set. The
-histogram stacks true/false positives/negatives with different colors.
-"""
+"""Plot ROC and PR (precision-recall) curves for each model."""
 
 
 # %%
@@ -40,12 +37,14 @@
 
 for model in (pbar := tqdm(models, desc="Calculating ROC curves")):
     pbar.set_postfix_str(model)
+
     na_mask = df_preds[each_true_col].isna() | df_each_pred[model].isna()
     y_true = (df_preds[~na_mask][each_true_col] <= STABILITY_THRESHOLD).astype(int)
     y_pred = df_each_pred[model][~na_mask]
     fpr, tpr, thresholds = roc_curve(y_true, y_pred, pos_label=0)
     AUC = auc(fpr, tpr)
     title = f"{model} · {AUC=:.2f}"
+    thresholds = [f"{t:.3} eV/atom" for t in thresholds]
     df_tmp = pd.DataFrame(
         {"FPR": fpr, "TPR": tpr, color_col: thresholds, "AUC": AUC, facet_col: title}
     ).round(3)
@@ -79,7 +78,7 @@
     range_x=(-0.01, 1),
     range_y=(0, 1.02),
     hover_name=facet_col,
-    hover_data={facet_col: False},
+    hover_data={facet_col: False, color_col: True},
     **(kwds if facet_plot else dict(color=facet_col, markers=True)),
 )
 
 
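Review note: the added line `thresholds = [f"{t:.3} eV/atom" for t in thresholds]` turns the numeric thresholds into hover labels. The format spec `.3` without a type code means three significant digits, not three decimal places. The values below are illustrative only; in the script they come from `roc_curve`, whose first threshold may be `inf` depending on the scikit-learn version.

```python
# Illustrative threshold values only (not output of the actual script).
thresholds = [float("inf"), 0.123456, 0.0123456, 0.05, -0.4]

# Same formatting expression as in the diff.
labels = [f"{t:.3} eV/atom" for t in thresholds]

for label in labels:
    print(label)
```

Because the column now holds strings, it can still be shown as hover text (which `hover_data={..., color_col: True}` enables) but no longer sorts or rounds numerically.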
@@ -237,7 +237,7 @@
     textangle=-90,
     **axis_titles,
 )
-fig.layout.height = 200 * n_rows
+fig.layout.height = 230 * n_rows
 fig.layout.coloraxis.colorbar.update(orientation="h", thickness=9, len=0.5, y=1.05)
 # fig.layout.width = 1100
 fig.layout.margin.update(l=40, r=10, t=30, b=60)
 
@@ -22,7 +22,7 @@
 batch_col = "batch_idx"
 df_each_pred[batch_col] = "Batch " + df_each_pred.index.str.split("-").str[1]
 df_err, df_std = None, None  # variables to cache rolling MAE and std
-model = "MEGNet"
+model = "CHGNet"
 
 
 # %% matplotlib
 
@@ -18,37 +18,37 @@
   },
   "devDependencies": {
     "@iconify/svelte": "^3.1.4",
-    "@rollup/plugin-yaml": "^4.1.1",
+    "@rollup/plugin-yaml": "^4.1.2",
     "@sveltejs/adapter-static": "^2.0.3",
-    "@sveltejs/kit": "^1.25.0",
-    "@sveltejs/vite-plugin-svelte": "^2.4.5",
-    "@typescript-eslint/eslint-plugin": "^6.7.0",
-    "@typescript-eslint/parser": "^6.7.0",
+    "@sveltejs/kit": "^1.27.1",
+    "@sveltejs/vite-plugin-svelte": "^2.4.6",
+    "@typescript-eslint/eslint-plugin": "^6.9.0",
+    "@typescript-eslint/parser": "^6.9.0",
     "d3-scale-chromatic": "^3.0.0",
     "elementari": "^0.2.2",
-    "eslint": "^8.49.0",
-    "eslint-plugin-svelte": "^2.33.1",
+    "eslint": "^8.52.0",
+    "eslint-plugin-svelte": "^2.34.0",
     "hastscript": "^8.0.0",
-    "highlight.js": "^11.8.0",
+    "highlight.js": "^11.9.0",
     "js-yaml": "^4.1.0",
-    "katex": "^0.16.8",
+    "katex": "^0.16.9",
     "mdsvex": "^0.11.0",
     "prettier": "^3.0.3",
     "prettier-plugin-svelte": "^3.0.3",
     "rehype-autolink-headings": "^7.0.0",
     "rehype-katex-svelte": "^1.2.0",
     "rehype-slug": "^6.0.0",
     "remark-math": "3.0.0",
-    "svelte": "^4.2.0",
-    "svelte-check": "^3.5.1",
-    "svelte-multiselect": "^10.1.0",
+    "svelte": "^4.2.2",
+    "svelte-check": "^3.5.2",
+    "svelte-multiselect": "^10.2.0",
     "svelte-preprocess": "^5.0.4",
     "svelte-toc": "^0.5.6",
     "svelte-zoo": "^0.4.9",
-    "svelte2tsx": "^0.6.21",
+    "svelte2tsx": "^0.6.23",
     "tslib": "^2.6.2",
     "typescript": "5.2.2",
-    "vite": "^4.4.9"
+    "vite": "^4.5.0"
   },
   "prettier": {
     "semi": false,
 
@@ -409,7 +409,7 @@ A material is classified as stable if the predicted $E_\text{above hull}$ lies b
 <RocModels />
 {/if}
 
-> @label:fig:roc-models Receiver operating characteristic (ROC) curve for each model. TPR/FPR = true/false positive rate. FPR on the $x$-axis is the fraction of unstable structures classified as stable. TPR on the $y$-axis is the fraction of stable structures classified as stable. The stability threshold $t$ sweeps from $-0.4 \ \frac{\text{eV}}{\text{atom}} \leq t \leq 0.4 \ \frac{\text{eV}}{\text{atom}}$ above the hull.
+> @label:fig:roc-models Receiver operating characteristic (ROC) curve for each model. TPR/FPR = true/false positive rate. FPR on the $x$-axis is the fraction of unstable structures classified as stable. TPR on the $y$-axis is the fraction of stable structures classified as stable.
 
 ### Parity Plots