site/src/routes/preprint/+page.md
+2 -2
@@ -235,7 +235,7 @@ The results for M3GNet and MACE depart from the trend that F1 is rank-correlated
235
235
Of all models, M3GNet achieves the highest true positive rate (TPR) but an unusually low true negative rate (TNR).
236
236
A similar trend is seen for MACE. @fig:rolling-mae-vs-hull-dist-models provides a visual understanding of this observation.
237
237
M3GNet and MACE have the lowest rolling mean of the absolute errors (rolling MAE) as a function of hull distance for materials above the convex hull (see right half of plot) but incur comparably large errors for materials below the hull (left half of plot).
238
-
Since $\text{TPR} = \frac{\text{TN}}{\text{TN} + \text{FP}}$, lower error for energies above the hull increases both TN and decreases FP, resulting in the high TPR values observed.
238
+
Since $\text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}}$, lower error for energies above the hull increases both TN and decreases FP, resulting in the high TPR values observed.
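For reference, the two rates discussed in this hunk can be computed from confusion-matrix counts as in the minimal sketch below. It assumes "stable" is the positive class, consistent with the false-positive discussion later in the file. The function name and the counts are hypothetical and are not taken from the paper's results.

```python
def classification_rates(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (TPR, TNR) computed from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # share of truly stable materials predicted stable
    tnr = tn / (tn + fp)  # share of truly unstable materials predicted unstable
    return tpr, tnr


# Hypothetical counts for illustration only (not results from the paper)
tpr, tnr = classification_rates(tp=80, fn=20, tn=60, fp=40)
print(f"TPR={tpr:.2f}, TNR={tnr:.2f}")  # prints "TPR=0.80, TNR=0.60"
```

Note that TN and FP enter only the TNR, while TP and FN enter only the TPR.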
239
239
240
240
The reason CGCNN+P achieves better regression metrics than CGCNN but is still worse as a classifier becomes apparent from @fig:hist-clf-pred-hull-dist-models by noting that the CGCNN+P histogram is more sharply peaked at the 0 hull distance stability threshold.
241
241
This causes even small errors in the predicted convex hull distance to be large enough to invert a classification.
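The threshold effect described here can be sketched with a toy example. It assumes a material counts as stable when its hull distance is at or below 0 eV/atom, and it treats a prediction as the reference value plus a fixed error. The helper names and all numbers are hypothetical and serve only to illustrate the mechanism.

```python
def flips_classification(dist: float, error: float, threshold: float = 0.0) -> bool:
    """True if adding `error` to `dist` moves it across the stability threshold."""
    shifted = dist + error
    return (dist <= threshold) != (shifted <= threshold)


def count_flips(dists: list[float], error: float) -> int:
    """Number of hull distances whose stability label changes under `error`."""
    return sum(flips_classification(d, error) for d in dists)


# Hypothetical hull distances in eV/atom (illustration only)
broad = [-0.20, -0.10, 0.10, 0.20]   # spread far from the threshold
peaked = [-0.04, -0.02, 0.02, 0.04]  # concentrated near the threshold
error = 0.03                         # same error applied to every value

print(count_flips(broad, error), count_flips(peaked, error))  # prints "0 1"
```

The same 0.03 eV/atom error leaves every label in the broad set unchanged but inverts one label in the peaked set, because only there does a value sit closer to the threshold than the error.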
@@ -449,7 +449,7 @@ BOWSR has the largest median error, while Voronoi RF has the largest IQR. Note t
449
449
450
450
> @label:fig:hist-clf-pred-hull-dist-models Distribution of model-predicted hull distance colored by stability classification. Models are sorted from top to bottom by F1 score. The thickness of the red and yellow bands shows how often models misclassify as a function of how far away from the convex hull they place a material. While CHGNet and M3GNet perform almost equally well overall, these plots reveal that they do so via different trade-offs. M3GNet commits fewer false negatives but more false positives predictions compared to CHGNet. In a real discovery campaign, false positives have a higher opportunity cost than false negatives since they result in wasted DFT relaxations or even synthesis time in the lab. A false negative by contrast is just one missed opportunity out of many. This observation is also reflected in the higher TPR and lower TNR of M3GNet vs CHGNet in @fig:metrics-table, as well as the lower rolling MAE for CHGNet vs M3GNet on the stable side (left half) of @fig:rolling-mae-vs-hull-dist-models and vice-versa on the unstable side (right half).
451
451
452
-
Note the CGCNN+P histogram is more strongly peaked than CGCNN's which agrees better with the actual DFT ground truth [distribution of hull distances](/about-the-data#--target-distribution) in our test set. This explains why CGCNN+P performs better as a regressor, but also reveals how it can perform simultaneously worse as a classifier. By moving predictions closer to the stability threshold at 0 eV/atom above the hull, even small errors are significant enough to tip a classification over the threshold.
452
+
Note the CGCNN+P histogram is more strongly peaked than CGCNN's which agrees better with the actual DFT ground truth [distribution of hull distances](/data#--target-distribution) in our test set. This explains why CGCNN+P performs better as a regressor, but also reveals how it can perform simultaneously worse as a classifier. By moving predictions closer to the stability threshold at 0 eV/atom above the hull, even small errors are significant enough to tip a classification over the threshold.
453
453
454
454
## Measuring extrapolation performance from WBM batch robustness