|
```diff
@@ -1,5 +1,5 @@
 <script lang="ts">
-  import MetricsTableMegnetCombos from '$figs/metrics-table-megnet-combos.svelte'
+  import MetricsTableMegnetUipCombos from '$figs/metrics-table-megnet-uip-combos.svelte'
   import MetricsTableFirst10k from '$figs/metrics-table-first-10k.svelte'
   import RunTimeBars from '$figs/model-run-times-bar.svelte'
   import RocModels from '$figs/roc-models.svelte'
@@ -14,7 +14,7 @@
   import HistClfPredHullDistModels from '$figs/hist-clf-pred-hull-dist-models-4x2.svelte'
   import SpacegroupSunburstWbm from '$figs/spacegroup-sunburst-wbm.svelte'
   import SpacegroupSunburstWrenformerFailures from '$figs/spacegroup-sunburst-wrenformer-failures.svelte'
-  import ScatterLargestErrorsModelsMeanVsEachTrue from '$figs/scatter-largest-errors-models-mean-vs-each-true.svelte'
+  import ScatterLargestErrorsModelsMeanVsTrueHullDist from '$figs/scatter-largest-errors-models-mean-vs-true-hull-dist.svelte'
   import EAboveHullScatterWrenformerFailures from '$figs/e-above-hull-scatter-wrenformer-failures.svelte'
   import ProtoCountsWrenformerFailures from '$figs/proto-counts-wrenformer-failures.svelte'
   import ElementPrevalenceVsError from '$figs/element-prevalence-vs-error.svelte'
```
|
```diff
@@ -99,9 +99,9 @@ Given its strong performance on batch 1, it is possible that given sufficiently
 ## Largest Errors vs DFT Hull Distance
 
 {#if mounted}
-<ScatterLargestErrorsModelsMeanVsEachTrue />
+<ScatterLargestErrorsModelsMeanVsTrueHullDist />
 {/if}
 
-> @label:fig:scatter-largest-errors-models-mean-vs-each-true The 200 structures with largest error averaged over all models vs their DFT hull distance colored by model disagreement (as measured by standard deviation in hull distance predictions from different models) and sized by number of training structures containing the least prevalent element (e.g. if a scatter point had composition FeO, MP has 6.6k structures containing Fe and 82k containing O so its size would be set to 6.6k). Thus smaller points have less training support. This plot suggests all models are biased to predict low energy and perhaps fail to capture certain physics resulting in highly unstable structures. This is unsurprising considering MP training data mainly consists of low energy structures.<br>
-> It is also possible that some of the blue points with large error yet good agreement among models are in fact accurate ML predictions for a DFT relaxation gone wrong.
+> @label:fig:scatter-largest-errors-models-mean-vs-true-hull-dist DFT vs predicted hull distance (averaged over all models) for the 200 structures with the largest errors, colored by model disagreement (measured as the standard deviation of the hull distance predictions from different models) and sized by the number of atoms in the structure. This plot shows that high-error predictions are biased towards hull distances that are too small. This is unsurprising considering MP training data mainly consists of low-energy structures.<br>
+> However, note the clear color separation between the mostly blue low-energy-biased predictions and the yellow/red high-error predictions. Blue means the models are in good agreement, i.e. all models are "wrong" together. Red/yellow marks large-error predictions with little model agreement, i.e. the models are wrong in different ways. It is possible that some of the blue points with large error yet good inter-model agreement are in fact accurate ML predictions for a DFT relaxation gone wrong. Zooming in on the blue points reveals that many of them are large markers, corresponding to larger structures where DFT failures are less surprising. This suggests committees of ML models could be used to cheaply screen large databases for DFT errors in a high-throughput manner.
 
```
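The committee-based DFT error screening suggested in the caption above amounts to flagging structures where the model-averaged prediction is far from DFT yet the per-model standard deviation is small. A minimal sketch with made-up numbers (the predictions, DFT values, and thresholds below are hypothetical, purely for illustration):

```python
import numpy as np

# Hypothetical predicted hull distances (eV/atom).
# Rows: structures; columns: models in the committee.
preds = np.array([
    [0.02, 0.03, 0.01, 0.02],    # models agree -> low std dev
    [0.40, 0.05, -0.20, 0.15],   # models disagree -> high std dev
    [-0.30, -0.28, -0.31, -0.29],
    [0.10, 0.60, 0.35, -0.05],
])
dft = np.array([0.50, 0.10, 0.45, 0.20])  # DFT hull distances (eV/atom)

mean_pred = preds.mean(axis=1)     # model average (y-axis of the scatter plot)
disagreement = preds.std(axis=1)   # committee disagreement (color in the plot)
error = np.abs(mean_pred - dft)    # error vs DFT

# Large error vs DFT but strong model agreement: candidate DFT failures
# (the blue high-error points in the figure). Thresholds are illustrative.
maybe_dft_error = (error > 0.2) & (disagreement < 0.05)
print(maybe_dft_error)
```

Structures flagged this way would be candidates for re-running the DFT relaxation rather than evidence of ML failure.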
|
```diff
@@ -108,10 +108,10 @@
 ## MEGNet formation energies from UIP-relaxed structures
 
 {#if mounted}
-<MetricsTableMegnetCombos select={[`model`, `MEGNet`, `CHGNet`, `M3GNet`, `CHGNet + MEGNet`, `M3GNet + MEGNet`]} />
+<MetricsTableMegnetUipCombos select={[`model`, `MEGNet`, `CHGNet`, `M3GNet`, `CHGNet + MEGNet`, `M3GNet + MEGNet`]} />
 {/if}
 
-> @label:fig:metrics-table-megnet-combos This table shows metrics obtained by combining MEGNet with both UIPs. The metrics in rows labeled M3GNet + MEGNet and CHGNet + MEGNet are the result of passing M3GNet/CHGNet-relaxed structures into MEGNet for formation energy prediction. Both combos perform worse than using the respective UIPs on their own with a more pronounced performance drop from CHGNet to CHGNet + MEGNet than M3GNet to M3GNet + MEGnet. This suggests MEGNet has learned no additional knowledge of the PES that is not already present in the UIPs. However, both combos perform better than MEGNet on its own, demonstrating that UIP relaxation provides real utility at very low cost for any downstream structure-dependent analysis.
+> @label:fig:metrics-table-megnet-uip-combos This table shows metrics obtained by combining MEGNet with both UIPs. The metrics in the rows labeled M3GNet + MEGNet and CHGNet + MEGNet result from passing M3GNet/CHGNet-relaxed structures into MEGNet for formation energy prediction. Both combos perform worse than the respective UIPs on their own, with a more pronounced performance drop from CHGNet to CHGNet + MEGNet than from M3GNet to M3GNet + MEGNet. This suggests MEGNet has learned no additional knowledge of the PES that is not already present in the UIPs. However, both combos perform better than MEGNet on its own, demonstrating that UIP relaxation provides real utility at very low cost for any downstream structure-dependent analysis.
 
 The UIPs M3GNet and CHGNet are both trained to predict DFT energies (including/excluding MP2020 energy corrections for CHGNet/M3GNet) while MEGNet is trained to predict formation energies.
 
```
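The UIP + MEGNet combos described above are a simple two-stage pipeline: relax the structure with the UIP, then pass the relaxed geometry to MEGNet for a formation energy prediction. A minimal sketch of that control flow, using stand-in `uip_relax` and `megnet_e_form` functions with toy numbers (the real CHGNet/M3GNet/MEGNet APIs differ; nothing here is the actual model code):

```python
from dataclasses import dataclass

@dataclass
class Structure:
    volume: float  # stand-in for a full crystal structure

def uip_relax(structure: Structure) -> Structure:
    """Stand-in for CHGNet/M3GNet relaxation: returns a relaxed geometry."""
    return Structure(volume=structure.volume * 0.95)  # toy: relaxation shrinks the cell

def megnet_e_form(structure: Structure) -> float:
    """Stand-in for MEGNet formation energy prediction (eV/atom)."""
    return -0.1 * structure.volume  # toy energy model, not MEGNet

def combo_e_form(structure: Structure) -> float:
    # The combo: relax first with the UIP, then predict on the relaxed structure.
    return megnet_e_form(uip_relax(structure))

e_unrelaxed = megnet_e_form(Structure(volume=100.0))  # MEGNet on its own
e_combo = combo_e_form(Structure(volume=100.0))       # UIP + MEGNet
print(e_unrelaxed, e_combo)
```

The point of the combo is that the second stage sees a geometry much closer to the DFT-relaxed one, at a fraction of DFT's cost.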
|
|