Update site to show ALIGNN results #38

Merged 8 commits on Jun 11, 2023
1 change: 1 addition & 0 deletions matbench_discovery/plots.py
@@ -64,6 +64,7 @@ def unit(text: str) -> str:
 )
 model_labels = dict(
     alignn="ALIGNN",
+    alignn_pretrained="ALIGNN Pretrained",
     bowsr_megnet="BOWSR + MEGNet",
     chgnet="CHGNet",
     chgnet_megnet="CHGNet + MEGNet",
26 changes: 13 additions & 13 deletions matbench_discovery/preds.py
@@ -33,35 +33,35 @@ class PredFiles(Files):
     """

     # BOWSR optimizer coupled with original megnet
-    bowsr_megnet = "bowsr/2023-01-23-bowsr-megnet-wbm-IS2RE.csv"
+    bowsr_megnet = "bowsr/2023-01-23-bowsr-megnet-wbm-IS2RE.csv.gz"
     # default CHGNet model from publication with 400,438 params
-    chgnet = "chgnet/2023-03-06-chgnet-wbm-IS2RE.csv"
+    chgnet = "chgnet/2023-03-06-chgnet-wbm-IS2RE.csv.gz"

     # CGCnn 10-member ensemble
-    cgcnn = "cgcnn/2023-01-26-test-cgcnn-wbm-IS2RE/cgcnn-ensemble-preds.csv"
+    cgcnn = "cgcnn/2023-01-26-cgcnn-ens=10-wbm-IS2RE.csv.gz"
     # CGCnn 10-member ensemble with 5-fold training set perturbations
-    cgcnn_p = "cgcnn/2023-02-05-cgcnn-perturb=5.csv"
+    cgcnn_p = "cgcnn/2023-02-05-cgcnn-perturb=5-wbm-IS2RE.csv.gz"

     # original M3GNet straight from publication, not re-trained
-    m3gnet = "m3gnet/2022-10-31-m3gnet-wbm-IS2RE.csv"
-    # m3gnet_direct = "m3gnet/2023-05-30-m3gnet-direct-wbm-IS2RE.csv"
-    # m3gnet_ms = "m3gnet/2023-06-01-m3gnet-manual-sampling-wbm-IS2RE.csv"
+    m3gnet = "m3gnet/2022-10-31-m3gnet-wbm-IS2RE.csv.gz"
+    # m3gnet_direct = "m3gnet/2023-05-30-m3gnet-direct-wbm-IS2RE.csv.gz"
+    # m3gnet_ms = "m3gnet/2023-06-01-m3gnet-manual-sampling-wbm-IS2RE.csv.gz"

     # original MEGNet straight from publication, not re-trained
-    megnet = "megnet/2022-11-18-megnet-wbm-IS2RE/megnet-e-form-preds.csv"
+    megnet = "megnet/2022-11-18-megnet-wbm-IS2RE.csv.gz"
     # CHGNet-relaxed structures fed into MEGNet for formation energy prediction
-    # chgnet_megnet = "chgnet/2023-03-04-chgnet-wbm-IS2RE.csv"
+    # chgnet_megnet = "chgnet/2023-03-04-chgnet-wbm-IS2RE.csv.gz"
     # M3GNet-relaxed structures fed into MEGNet for formation energy prediction
-    # m3gnet_megnet = "m3gnet/2022-10-31-m3gnet-wbm-IS2RE.csv"
+    # m3gnet_megnet = "m3gnet/2022-10-31-m3gnet-wbm-IS2RE.csv.gz"

     # Magpie composition+Voronoi tessellation structure features + sklearn random forest
-    voronoi_rf = "voronoi/2022-11-27-train-test/e-form-preds-IS2RE.csv"
+    voronoi_rf = "voronoi/2022-11-27-train-test/e-form-preds-IS2RE.csv.gz"

     # wrenformer 10-member ensemble
-    wrenformer = "wrenformer/2022-11-15-wrenformer-IS2RE-preds.csv"
+    wrenformer = "wrenformer/2022-11-15-wrenformer-ens=10-IS2RE-preds.csv.gz"

     alignn = "alignn/2023-06-02-alignn-wbm-IS2RE.csv.gz"
-    alignn_pretrained = "alignn/2023-06-03-mp-e-form-alignn-wbm-IS2RE.csv.gz"
+    # alignn_pretrained = "alignn/2023-06-03-mp-e-form-alignn-wbm-IS2RE.csv.gz"


 # model_labels remaps model keys to pretty plot labels (see Files)
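The `.csv.gz` paths above point to gzip-compressed CSV files, so whatever loads them has to decompress them first. The following is a minimal sketch using only the standard library. The file name, the column names, and the `read_preds` helper are hypothetical and are not part of `matbench_discovery`. If the files are read with pandas instead, `pd.read_csv` infers gzip compression from the `.gz` suffix by default.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def read_preds(path: Path) -> list[dict[str, str]]:
    """Read a CSV of predictions, decompressing it if the name ends in .gz."""
    # gzip.open in text mode decompresses on the fly; plain open handles .csv
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, mode="rt", newline="", encoding="utf-8") as file:
        return list(csv.DictReader(file))


# Hypothetical two-row prediction table; IDs and column names are illustrative
rows = [
    {"material_id": "wbm-1-1", "e_form_per_atom_pred": "-0.12"},
    {"material_id": "wbm-1-2", "e_form_per_atom_pred": "0.34"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    gz_path = Path(tmp_dir) / "demo-model-wbm-IS2RE.csv.gz"

    # Write the table as a gzip-compressed CSV
    with gzip.open(gz_path, mode="wt", newline="", encoding="utf-8") as file:
        writer = csv.DictWriter(file, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

    # Read it back through the suffix-aware helper
    loaded = read_preds(gz_path)
```

Dispatching on the suffix lets the same helper read both the old `.csv` and the new `.csv.gz` files.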
10 changes: 7 additions & 3 deletions models/alignn/readme.md
@@ -1,6 +1,11 @@
 ## ALIGNN formation energy predictions on WBM test set

-ALIGNN is trained using L1 loss and 1000 epochs. The model that performs best on the validation set is saved and used for predictions (requires minor adaptation of the ALIGNN source). The modifications to the ALIGNN source code are provided as patch `alignn-2023.01.10.patch`, which was applied to ALIGNN version `2023.01.10`. In addition, all Python requirements are given as `requirements.txt`.
+ALIGNN is trained using L1 loss for 1000 epochs. The model that performs best on the validation set is saved and used for predictions. This required minor changes to the ALIGNN source code, provided in `alignn-2023.01.10.patch`:
+
+1. Fix use without a test set (see [ALIGNN #104](https://github.com/usnistgov/alignn/issues/104#issue-1723978225)). In this case, we set apart a test set, but it might be better to train on the entire dataset, especially if the test set happens to contain important outliers.
+1. The `Checkpoint` handler in ALIGNN does not define a score name (see [`train.py`](https://github.com/usnistgov/alignn/blob/46334500cac9833125b3e444d65d0246e692bd61/alignn/train.py#L851)), so it only saves the last two models during training. With this patch, the model with the best accuracy on the validation set is also saved, and that model is used to make predictions. This matters because I used a relatively large `n_early_stopping` in case the validation accuracy shows a double descent (see [Figure 10](https://arxiv.org/pdf/1912.02292.pdf)).
+
+The changes in `alignn-2023.01.10.patch` were applied to ALIGNN version `2023.01.10`.

 To reproduce the `alignn` package state used for this submission, run

@@ -17,5 +22,4 @@ The directory contains the following files, which must be executed in the given

 1. `train_data.py`: Export Matbench Discovery training data to an ALIGNN-compatible format. This script outputs training data in the directory `data_train`. In addition, a small test data set is set apart and stored in the directory `data_test`.
 1. `train_alignn.py`: Train an ALIGNN model on previously exported data. The resulting model is stored in the directory `data-train-result`.
-1. `test_data.py`: Export WBM test data in ALIGNN-compatible format. The data is stored in the directory `data-test-wbm`
-1. `test_alignn.py`: Test a trained ALIGNN model on the WBM data. Predictions are stored in the file `test_alignn_result.json`
+1. `test_alignn.py`: Test a trained ALIGNN model on the WBM data. Generates `2023-06-03-mp-e-form-alignn-wbm-IS2RE.csv.gz`.
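The second patch item in the readme keeps the best model by validation score in addition to the most recent ones, and pairs this with a large early-stopping patience. That idea can be illustrated with a small, framework-free sketch. This is not ALIGNN's `Checkpoint` handler and not the contents of the patch: the `BestCheckpoint` class, its `patience` field, and the loss values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BestCheckpoint:
    """Keep the state with the lowest validation loss and signal early stopping."""

    patience: int  # hypothetical stand-in for an early-stopping setting
    best_loss: float = float("inf")
    best_epoch: int = -1
    best_state: Optional[dict] = None

    def update(self, epoch: int, val_loss: float, state: dict) -> bool:
        """Record one epoch; return True once training should stop."""
        if val_loss < self.best_loss:
            self.best_loss, self.best_epoch = val_loss, epoch
            self.best_state = dict(state)  # copy so later epochs cannot mutate it
        # Stop after `patience` consecutive epochs without improvement
        return epoch - self.best_epoch >= self.patience


# Made-up validation L1 losses with a second descent after epoch 2
val_losses = [0.50, 0.40, 0.35, 0.42, 0.45, 0.38, 0.30, 0.31]


def run(patience: int) -> BestCheckpoint:
    """Simulate training with the given patience and return the tracker."""
    tracker = BestCheckpoint(patience=patience)
    for epoch, loss in enumerate(val_losses):
        if tracker.update(epoch, loss, state={"epoch": epoch}):
            break
    return tracker


large = run(patience=5)  # survives the temporary rise, keeps epoch 6
small = run(patience=2)  # stops at epoch 4, keeps epoch 2
```

With these toy numbers, the larger patience trains through the temporary rise in validation loss and keeps the later, better epoch, while the smaller patience stops early and keeps the first minimum. Saving only the most recent models would instead return whatever the last epochs produced, regardless of validation score.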