
Parameters for StructureMatcher impact on geo-opt metrics #230

@lan496


I found that the geo-opt task's RMSD (MbdKey.structure_rmsd_vs_dft) is currently capped at 0.3, which corresponds to the default parameter stol=0.3 of pymatgen's StructureMatcher. StructureMatcher.get_rms_dist returns None when a matched site distance exceeds stol (ref), so structure pairs with larger deviations show up as NA instead of as large RMSD values.
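To illustrate why every reported value ends up at or below 0.3, here is a minimal toy sketch of the cutoff logic. It is a hypothetical, simplified stand-in (`toy_rms_dist` is not a pymatgen function): it assumes the sites are already paired with known displacements, whereas the real StructureMatcher also searches over lattice mappings and site permutations. It uses the normalization from pymatgen's `stol` documentation, i.e. distances divided by the free length per atom `(V / n_sites) ** (1/3)`.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def toy_rms_dist(
    displacements: Sequence[Vec3], volume: float, stol: float = 0.3
) -> Optional[Tuple[float, float]]:
    """Hypothetical, simplified stand-in for StructureMatcher.get_rms_dist.

    Assumes the sites of the two structures are already paired and each
    displacement vector (in Angstrom) is known. The real matcher also
    searches lattice mappings and site permutations; this toy does not.
    Returns (rms, max_dist) in normalized units, or None if no match.
    """
    n_sites = len(displacements)
    # Normalize by the free length per atom, (V / n_sites) ** (1/3).
    free_length = (volume / n_sites) ** (1 / 3)
    dists = [
        math.sqrt(dx * dx + dy * dy + dz * dz) / free_length
        for dx, dy, dz in displacements
    ]
    max_dist = max(dists)
    if max_dist > stol:
        # No match: the pair is reported as None, i.e. NA in the results table.
        return None
    rms = math.sqrt(sum(d * d for d in dists) / n_sites)
    # rms <= max_dist <= stol, so any value that is returned is capped at stol.
    return rms, max_dist


# Usage: 2 sites in a 16 A^3 cell, i.e. a free length of 2 A per atom.
print(toy_rms_dist([(0.1, 0.0, 0.0), (0.0, 0.1, 0.0)], volume=16.0))  # small displacements: matched
print(toy_rms_dist([(0.1, 0.0, 0.0), (0.8, 0.0, 0.0)], volume=16.0))  # one site at 0.4 > stol: None
```

In this toy, the second pair is dropped even though its RMS (about 0.29) would be below stol, because the cutoff applies to the largest single-site distance. Averaging only the surviving pairs therefore biases the mean downward.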

This affects the final geo-opt metrics. For example, if we fill the NA values in MACE-MPA-0's results with 0.3, its mean RMSD rises from 0.014 to 0.032:

```python
import pandas as pd

from matbench_discovery.enums import MbdKey

# https://figshare.com/files/52305104
df_geoopt_mpa0 = pd.read_csv('2025-01-30-wbm-IS2RE-FIRE-symprec=1e-2-moyo=0.3.3.csv.gz')
rmsd = df_geoopt_mpa0[MbdKey.structure_rmsd_vs_dft]

# Current behavior: unmatched structures (NA) are silently dropped from the mean
mean_rms_dist = rmsd.mean(skipna=True)
# Alternative: count unmatched structures at the cap (default stol=0.3)
mean_rms_dist_filled = rmsd.fillna(0.3).mean()
print("NA count:", rmsd.isna().sum())
print(f"{mean_rms_dist=:.3f}")
print(f"{mean_rms_dist_filled=:.3f}")

# NA count: 15541
# mean_rms_dist=0.014
# mean_rms_dist_filled=0.032
```

Also, I noticed that the matcher uses the default StructureMatcher(scale=True), which scales the input structures to equal volume before comparing them. For geometry optimization, it would be better to set scale=False for exact matching.

Therefore, I suggest two options:

  1. Set stol sufficiently large, e.g. StructureMatcher(stol=1.0, scale=False)
  2. Set StructureMatcher(scale=False) and fill NA with the default stol of 0.3

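For reference, the matcher in the current code is constructed with all default parameters:

```python
structure_matcher = StructureMatcher()
```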
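Option 2 could also be applied per structure, where the matcher is called, rather than as a `fillna` on the final table. A minimal sketch, assuming only that `get_rms_dist` returns an `(rms, max_dist)` tuple or `None` (as pymatgen's does); the helper names `rmsd_or_stol` and `mean_rmsd` are hypothetical, and pymatgen is not imported so the sketch runs standalone:

```python
from statistics import mean
from typing import Iterable, Optional, Tuple

# Return type of pymatgen's StructureMatcher.get_rms_dist:
# (rms, max_dist), or None when no match is found within stol.
MatchResult = Optional[Tuple[float, float]]

STOL = 0.3  # pymatgen's default site tolerance


def rmsd_or_stol(match: MatchResult, stol: float = STOL) -> float:
    """Option 2 (hypothetical helper): count an unmatched pair as RMSD = stol."""
    if match is None:
        return stol
    rms, _max_dist = match
    return rms


def mean_rmsd(matches: Iterable[MatchResult], stol: float = STOL) -> float:
    """Mean RMSD over all structures, with unmatched pairs counted at the cap."""
    return mean(rmsd_or_stol(m, stol) for m in matches)


# In a real pipeline the results would come from something like
#   matcher = StructureMatcher(scale=False)
#   match = matcher.get_rms_dist(relaxed_structure, dft_structure)
# (hypothetical variable names). Toy values only:
print(mean_rmsd([(0.01, 0.02), None, (0.05, 0.1)]))
```

Passing `stol=1.0` together with a matcher built as `StructureMatcher(stol=1.0, scale=False)` gives option 1, where far fewer pairs should fall back to the cap.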

Finally, StructureMatcher.get_rms_dist returns the RMS distance between the two structures normalized by the average free length per atom, (V / n_sites)^(1/3), so I think the metric is unitless rather than in Angstrom.
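A short check of the unit argument, assuming the normalization stated in pymatgen's documentation for stol (distances divided by the average free length per atom $(V/N)^{1/3}$, with $V$ the cell volume and $N$ the number of sites). The symbols $\Delta\mathbf{r}_i$ (displacement of site $i$) and $\lambda$ (a uniform length scale factor) are introduced here for illustration:

$$
d_{\mathrm{rms}} = \frac{1}{(V/N)^{1/3}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert \Delta\mathbf{r}_i \rVert^2},
\qquad
\frac{\sqrt{\frac{1}{N} \sum_{i=1}^{N} \lVert \lambda \Delta\mathbf{r}_i \rVert^2}}{(\lambda^3 V/N)^{1/3}}
= \frac{\lambda}{\lambda} \, d_{\mathrm{rms}} = d_{\mathrm{rms}}
$$

Because the RMS displacement and $(V/N)^{1/3}$ both carry one power of length, the ratio is unchanged when every length is rescaled, so the reported value is a fraction of the free length per atom rather than a distance in Angstrom.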

Labels: analysis, bug, geo opt

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions