I find that the geo-opt task's RMSD (MbdKey.structure_rmsd_vs_dft) is currently capped at 0.3, which corresponds to the default parameter stol=0.3 of pymatgen's StructureMatcher. StructureMatcher.get_rms_dist returns None if the matched site distance is larger than stol (ref), so structures that deviate more than that are recorded as NA instead of as a large RMSD.
This affects the final geo-opt metrics. For example, if we fill NA with 0.3 in MACE-MPA-0's results, the mean RMSD changes from 0.014 to 0.032.
import pandas as pd
from matbench_discovery.enums import MbdKey
# https://figshare.com/files/52305104
df_geoopt_mpa0 = pd.read_csv('2025-01-30-wbm-IS2RE-FIRE-symprec=1e-2-moyo=0.3.3.csv.gz')
mean_rms_dist = df_geoopt_mpa0[MbdKey.structure_rmsd_vs_dft].mean(skipna=True)
mean_rms_dist_filled = df_geoopt_mpa0[MbdKey.structure_rmsd_vs_dft].fillna(0.3).mean(skipna=True)
print("NA count:", df_geoopt_mpa0[MbdKey.structure_rmsd_vs_dft].isna().sum())
print(f"{mean_rms_dist=:.3f}")
print(f"{mean_rms_dist_filled=:.3f}")
# NA count: 15541
# mean_rms_dist=0.014
# mean_rms_dist_filled=0.032
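The size of the shift follows directly from the fraction of NA rows. A minimal sketch, assuming every NA is replaced by a single fill value; `filled_mean` and `implied_na_fraction` are hypothetical helper names for illustration, not part of matbench_discovery:

```python
def filled_mean(mean_valid: float, na_fraction: float, fill_value: float = 0.3) -> float:
    """Mean over all rows when NA rows are replaced by fill_value."""
    return (1 - na_fraction) * mean_valid + na_fraction * fill_value


def implied_na_fraction(mean_valid: float, mean_filled: float, fill_value: float = 0.3) -> float:
    """Invert filled_mean: the NA fraction consistent with the two means."""
    return (mean_filled - mean_valid) / (fill_value - mean_valid)


# The inputs are the rounded means reported above, so the result is approximate.
print(f"{implied_na_fraction(0.014, 0.032):.1%}")
```

With the rounded means above, this suggests that roughly 6% of the rows are NA, which is enough to more than double the reported mean.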
Also, I notice that the matcher is created with the default StructureMatcher(scale=True), which rescales the cells to a common volume before matching. For the geo-opt task, it would be better to set scale=False for exact matching.
Therefore, I suggest two options:
- Setting stol to be sufficiently large, like StructureMatcher(stol=1.0, scale=False)
- Setting StructureMatcher(scale=False) and filling NA with the default stol 0.3
structure_matcher = StructureMatcher()
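The two options could be sketched roughly as follows. This is only an illustration under assumptions: `build_matchers` and `rms_dist_or_fill` are hypothetical names, and the fill value of 0.3 simply mirrors the default stol rather than a measured distance.

```python
def build_matchers():
    """Return the two matcher configurations proposed above."""
    # Imported here so rms_dist_or_fill can be used with any matcher-like object.
    from pymatgen.analysis.structure_matcher import StructureMatcher

    option_1 = StructureMatcher(stol=1.0, scale=False)  # large tolerance, few None results
    option_2 = StructureMatcher(scale=False)  # default stol=0.3, fill None afterwards
    return option_1, option_2


def rms_dist_or_fill(matcher, relaxed, reference, fill_value=None):
    """Return the normalized RMS distance, or fill_value if no match is found.

    get_rms_dist returns (rms, max_dist) on a match and None otherwise.
    """
    result = matcher.get_rms_dist(relaxed, reference)
    if result is None:
        return fill_value
    rms, _max_dist = result
    return rms
```

For option 2, calling `rms_dist_or_fill(option_2, relaxed, reference, fill_value=0.3)` would replace each failed match with the cap, while option 1 would mostly return the actual distance.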
Finally, StructureMatcher.get_rms_dist returns a distance between two structures normalized by their volumes, so I think the value is dimensionless rather than in Angstrom.
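As a dimensional check, here is a sketch assuming the normalization is the cube root of the volume per site, which is how I read pymatgen's docstring. The symbols $d_i$ (distance between matched sites), $N$ (number of sites), and $V$ (cell volume) are my own notation:

$$
\mathrm{RMSD}_{\text{norm}} = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N} d_i^{2}}}{(V/N)^{1/3}},
\qquad
\frac{[\text{Å}]}{[\text{Å}^{3}]^{1/3}} = 1
$$

Under the same assumption, stol=0.3 is a fraction of that per-site length rather than 0.3 Å.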