Commit 18be9dc

pnpm add -D katex remark-math for equation support in docs
1 parent 7988f52 commit 18be9dc

8 files changed: +78 -17 lines changed

data/mp/get_mp_energies.py

+33 -3
@@ -5,6 +5,7 @@
 from aviary.utils import as_dict_handler
 from aviary.wren.utils import get_aflow_label_from_spglib
 from mp_api.client import MPRester
+from pymatviz import density_scatter
 from tqdm import tqdm
 
 from matbench_discovery import today
@@ -19,7 +20,6 @@
 __author__ = "Janosh Riebesell"
 __date__ = "2022-08-13"
 
-
 module_dir = os.path.dirname(__file__)
 
 
@@ -33,22 +33,52 @@
     "structure",
     "symmetry",
     "energy_above_hull",
+    "decomposition_enthalpy",
+    "energy_type",
 ]
+
 with MPRester(use_document_model=False) as mpr:
-    docs = mpr.summary.search(fields=fields)
+    docs = mpr.thermo.search(fields=fields, thermo_types=["GGA_GGA+U"])
 
 print(f"{today}: {len(docs) = :,}")
 # 2022-08-13: len(docs) = 146,323
+# 2023-01-10: len(docs) = 154,718
 
 
 # %%
 df = pd.DataFrame(docs).set_index("material_id")
 df.pop("_id")
 
-df["spacegroup_number"] = df.pop("symmetry").map(lambda x: x.number)
+df.energy_type.value_counts().plot.pie(backend="matplotlib", autopct="%1.1f%%")
+
+
+# %%
+df["spacegroup_number"] = df.pop("symmetry").map(lambda x: x["number"])
 
 df["wyckoff_spglib"] = [get_aflow_label_from_spglib(x) for x in tqdm(df.structure)]
 
 df.to_json(f"{module_dir}/{today}-mp-energies.json.gz", default_handler=as_dict_handler)
 
 # df = pd.read_json(f"{module_dir}/2022-08-13-mp-energies.json.gz")
+
+
+# %% reproduce fig. 1b from https://arxiv.org/abs/2001.10591 (as data consistency check)
+ax = df.plot.scatter(
+    x="formation_energy_per_atom",
+    y="decomposition_enthalpy",
+    alpha=0.1,
+    backend="matplotlib",
+    xlim=[-5, 1],
+    ylim=[-1, 1],
+    color=df.decomposition_enthalpy.map(lambda x: "red" if x > 0 else "blue"),
+    title=f"{today} - {len(df):,} MP entries",
+)
+# result on 2023-01-10: plots match. no correlation between formation energy and decomposition
+# enthalpy. R^2 = -1.571, MAE = 1.604
+ax.figure.savefig(f"{module_dir}/{today}-mp-decomp-enth-vs-e-form.png", dpi=300)
+
+ax = density_scatter(
+    df.formation_energy_per_atom,
+    df.decomposition_enthalpy,
+)
+ax.set(xlim=[-5, 1], ylim=[-1, 1])
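
The consistency-check comment in this file reports a negative R^2 alongside the MAE. Below is a minimal sketch in plain Python of how that can happen. It assumes R^2 is the usual coefficient of determination, with formation energy treated as the "prediction" of decomposition enthalpy; the diff does not state how the two metrics were computed. The function names and toy values are hypothetical and are not taken from the Materials Project data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical toy values in eV/atom, chosen only for illustration.
decomposition_enthalpy = [-0.1, 0.0, 0.1, 0.2]  # treated as the target
formation_energy = [-3.0, -2.0, -1.0, 0.0]  # treated as the "prediction"

r2 = r2_score(decomposition_enthalpy, formation_energy)
mae = mean_absolute_error(decomposition_enthalpy, formation_energy)
print(f"R^2 = {r2:.3f}, MAE = {mae:.3f}")
```

R^2 falls below zero whenever the squared residuals exceed the variance of the target about its own mean, meaning the "prediction" does worse than a constant guess. That is consistent with the "no correlation" reading in the comment.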
File renamed without changes.

data/wbm/readme.md

+4 -4
@@ -1,12 +1,12 @@
 # WBM Dataset
 
-The **WBM dataset** was published in [Predicting stable crystalline compounds using chemical similarity][wbm paper] (Nature Computational Materials, Jan 2021, [doi:10.1038/s41524-020-00481-6](http://doi.org/10.1038/s41524-020-00481-6)). The authors generated 257,487 structures through single-element substitutions on Materials Project (MP) source structures. The replacement element was chosen based on chemical similarity determined by a matrix data-mined from the [Inorganic Crystal Structure Database (ICSD)](https://icsd.products.fiz-karlsruhe.de).
+The **WBM dataset** was published in [Predicting stable crystalline compounds using chemical similarity][wbm paper] (Nature Computational Materials, Jan 2021, [doi:10.1038/s41524-020-00481-6](http://doi.org/10.1038/s41524-020-00481-6)). The authors generated 257,487 structures through single-element substitutions on Materials Project (MP) source structures. The replacement element was chosen based on chemical similarity determined by a matrix data mined from the [Inorganic Crystal Structure Database (ICSD)](https://icsd.products.fiz-karlsruhe.de).
 
-The resulting novel structures were relaxed using MP-compatible VASP inputs (i.e. using `pymatgen`'s `MPRelaxSet`) and identical POTCARs in an attempt to create a database of Materials Project compatible novel crystals. Any degrade in model performance from training to test set should therefore largely be a result of extrapolation error rather than covariate shift in the underlying data.
+The resulting novel structures were relaxed using MP-compatible VASP inputs (i.e. using `pymatgen`'s `MPRelaxSet`) and identical POTCARs in an attempt to create a database of Materials Project compatible novel crystals. Any degradation in model performance from training to test set should therefore largely be a result of extrapolation error rather than covariate shift in the underlying data.
 
 The authors performed 5 rounds of elemental substitution in total, each time relaxing all generated structures and adding those found to lie on the convex hull back to the source pool. In total, ~20k or close to 10% were found to lie on the Materials Project convex hull.
 
-Since repeated substitutions should - on average - increase chemical dissimilarity, the 5 iterations of this data-generation process are a unique and compelling feature as it allows out-of distribution testing. We can check how model performance degrades when asked to predict on structures increasingly more dissimilar from the training set (which is restricted to the MP 2022 database release (or earlier) for all models in this benchmark).
+Since repeated substitutions should - on average - increase chemical dissimilarity, the 5 iterations of this data-generation process are a unique and compelling feature as it allows out-of-distribution testing. We can check how model performance degrades when asked to predict structures increasingly more dissimilar from the training set (which is restricted to the MP 2022 database release (or earlier) for all models in this benchmark).
 
 ## 🆔   About the IDs
 
@@ -70,7 +70,7 @@ materialscloud:2021.68 includes a readme file with a description of the dataset,
 
 [wbm paper]: https://nature.com/articles/s41524-020-00481-6
 
-## 📊   Data Plots
+## 📊   Plots
 
 <caption>Heatmap of WBM training set element counts</caption>
 <slot name="wbm-elements-heatmap">

site/package.json

+5
@@ -25,13 +25,18 @@
     "@typescript-eslint/parser": "^5.48.0",
     "eslint": "^8.31.0",
     "eslint-plugin-svelte3": "^4.0.0",
+    "hast-util-from-string": "^2.0.0",
+    "hast-util-select": "^5.0.3",
+    "hast-util-to-string": "^2.0.0",
     "hastscript": "^7.2.0",
     "highlight.js": "^11.7.0",
+    "katex": "^0.16.4",
     "mdsvex": "^0.10.6",
     "prettier": "^2.8.2",
     "prettier-plugin-svelte": "^2.9.0",
     "rehype-autolink-headings": "^6.1.1",
     "rehype-slug": "^5.1.0",
+    "remark-math": "^3.0.0",
     "svelte": "^3.55.0",
     "svelte-check": "^3.0.1",
     "svelte-github-corner": "^0.2.0",

site/src/app.css

+1 -1
@@ -7,7 +7,7 @@
   --toc-li-padding: 4pt 1ex;
   --toc-mobile-btn-color: white;
   --toc-desktop-nav-margin: 0 0 0 1em;
-  --toc-min-width: 20em;
+  --toc-min-width: 16em;
   --toc-active-bg: darkcyan;
 
   --ghc-color: var(--night);

site/src/app.html

+6
@@ -25,7 +25,13 @@
 
     <link rel="icon" href="/favicon.svg" />
     <link rel="stylesheet" href="/prism-vsc-dark-plus.css" />
+    <!-- interactive plots -->
     <script src="https://cdn.plot.ly/plotly-2.14.0.min.js"></script>
+    <!-- math display -->
+    <link
+      rel="stylesheet"
+      href="https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css"
+    />
 
     %sveltekit.head%
   </head>

site/src/routes/how-to-contribute/+page.md

+5 -5
@@ -2,7 +2,7 @@
 
 ## 🔨 &thinsp; Installation
 
-The recommended way to acquire the train and test data for this benchmark is through its Python package [available onPyPI](https://pypi.org/project/matbench-discovery):
+The recommended way to acquire the train and test data for this benchmark is through its Python package [available on PyPI](https://pypi.org/project/matbench-discovery):
 
 ```zsh
 pip install matbench-discovery
@@ -124,7 +124,7 @@ To deploy a new model on this benchmark and add it to our leaderboard, please cr
 
 Arbitrary other keys can be added as needed.
 
-Please see any of subdirectories in [`models/`](https://github.com/janosh/matbench-discovery/tree/main/models) for example submissions. More detailed step-by-step instructions below:
+Please see any of the subdirectories in [`models/`](https://github.com/janosh/matbench-discovery/tree/main/models) for example submissions. More detailed step-by-step instructions below:
 
 ### Step 1: Clone the repo
 
@@ -142,7 +142,7 @@ Create a new folder
 mkdir models/<model_name>
 ```
 
-and place the above listed files there. The file structure should look like this:
+and place the above-listed files there. The file structure should look like this:
 
 ```txt
 matbench-discovery-root
@@ -155,7 +155,7 @@ matbench-discovery-root
 └── train_<model_name>.py # optional
 ```
 
-You can include arbitrary other supporting files like metadata, model features (below 10MB to keep `git clone` time low) if they are needed to run the model or help others reproduce your results. For larger files, please upload to [Figshare](https://figshare.com) or similar and link them somewhere in your files.
+You can include arbitrary other supporting files like metadata and model features (below 10MB to keep `git clone` time low) if they are needed to run the model or help others reproduce your results. For larger files, please upload to [Figshare](https://figshare.com) or similar and link them somewhere in your files.
 
 ### Step 3: Create a PR to the [Matbench Discovery repo](https://github.com/janosh/matbench-discovery)
 
@@ -168,6 +168,6 @@ git commit -m 'add <model_name> to Matbench Discovery leaderboard'
 
 And you're done! Once tests pass and the PR is merged, your model will be added to the leaderboard! 🎉
 
-## Troubleshooting
+## 😵‍💫 &thinsp; Troubleshooting
 
 Having problems using or contributing to the project? Please [open an issue on GitHub](https://github.com/janosh/matbench-discovery/issues). We're happy to help!

site/svelte.config.js

+24 -4
@@ -1,14 +1,31 @@
 import adapter from '@sveltejs/adapter-static'
+import { fromString } from 'hast-util-from-string'
+import { selectAll } from 'hast-util-select'
+import { toString } from 'hast-util-to-string'
 import { s } from 'hastscript'
+import katex from 'katex'
 import { mdsvex } from 'mdsvex'
-import linkHeadings from 'rehype-autolink-headings'
-import headingSlugs from 'rehype-slug'
+import link_headings from 'rehype-autolink-headings'
+import heading_slugs from 'rehype-slug'
+import math from 'remark-math'
 import preprocess from 'svelte-preprocess'
 
 const rehypePlugins = [
-  headingSlugs,
+  // from https://github.com/kwshi/rehype-katex-svelte
+  (options = {}) =>
+    (tree) => {
+      for (const node of selectAll(`.math-inline,.math-display`, tree)) {
+        const displayMode = node.properties?.className?.includes(`math-display`)
+        const rendered = katex.renderToString(toString(node), {
+          ...options,
+          displayMode,
+        })
+        fromString(node, `{@html ${JSON.stringify(rendered)}}`)
+      }
+    },
+  heading_slugs,
   [
-    linkHeadings,
+    link_headings,
     {
       behavior: `append`,
       test: [`h2`, `h3`, `h4`, `h5`, `h6`], // don't auto-link <h1>
@@ -30,6 +47,9 @@ export default {
     preprocess(),
     mdsvex({
       rehypePlugins,
+      // remark-math@3.0.0 pinned due to mdsvex, see
+      // https://github.com/kwshi/rehype-katex-svelte#usage
+      remarkPlugins: [math],
       extensions: [`.svx`, `.md`],
     }),
   ],
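
The inline rehype plugin in this file replaces each math node's text with `{@html "<...>"}` instead of inserting the KaTeX markup directly. Below is a sketch of that wrapping step in Python, with `json.dumps` standing in for JavaScript's `JSON.stringify`. This is an approximation: by default `json.dumps` also escapes non-ASCII characters, which `JSON.stringify` does not. The presumed motivation, which the diff itself does not spell out, is that KaTeX output can contain curly braces that Svelte would otherwise parse as template expressions. The function name and sample HTML are hypothetical.

```python
import json


def wrap_for_svelte(rendered_html: str) -> str:
    """Embed pre-rendered HTML in a Svelte {@html ...} tag as a quoted string literal."""
    return "{@html " + json.dumps(rendered_html) + "}"


# Hypothetical stand-in for KaTeX output; real output is much longer.
katex_like_html = '<span class="katex">\\frac{a}{b}</span>'

wrapped = wrap_for_svelte(katex_like_html)
print(wrapped)
```

Because the markup travels as a quoted string literal, the braces and quotes inside it are data rather than template syntax. Decoding the literal gives back the original HTML unchanged.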
