Split energy corrections into structure/composition-dependent terms in entry.energy_adjustments #2730

Closed
janosh opened this issue Nov 9, 2022 · 7 comments · Fixed by #2731
Labels
feature request Request for a new feature question Questions about functionality and design choices

Comments

@janosh
Member

janosh commented Nov 9, 2022

I noticed that both the old and the 2020 MP energy correction schemes are structure-dependent. Here's an example ComputedStructureEntry: when I convert it to a ComputedEntry and correct a copy of each with both the old and the new correction scheme, I get 4 different energies.

cse.json.zip

import gzip
import json

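from pymatgen.entries.compatibility import (
    MaterialsProject2020Compatibility,
    MaterialsProjectCompatibility,
)
from pymatgen.entries.computed_entries import ComputedEntry, ComputedStructureEntry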

with gzip.open("cse.json.zip") as f:
    cse = ComputedStructureEntry.from_dict(json.load(f))

cse_mp2020 = cse.copy()
cse_legacy = cse.copy()
ce_mp2020 = ComputedEntry.from_dict(cse.as_dict())
ce_legacy = ce_mp2020.copy()

MaterialsProject2020Compatibility().process_entry(cse_mp2020)
MaterialsProject2020Compatibility().process_entry(ce_mp2020)
MaterialsProjectCompatibility().process_entry(cse_legacy)
MaterialsProjectCompatibility().process_entry(ce_legacy)

print(f"{cse_mp2020.correction=:.4}")
print(f"{ce_mp2020.correction=:.4}")
print(f"{cse_legacy.correction=:.4}")
print(f"{ce_legacy.correction=:.4}")

print(f"{cse_mp2020.energy_adjustments=}\n")
print(f"{ce_mp2020.energy_adjustments=}\n")
print(f"{cse_legacy.energy_adjustments=}\n")
print(f"{ce_legacy.energy_adjustments=}\n")

This script prints

cse_mp2020.correction=-2.312
ce_mp2020.correction=-4.416
cse_legacy.correction=-1.973
ce_legacy.correction=-4.49


cse_mp2020.energy_adjustments=[CompositionEnergyAdjustment:
  Name: MP2020 anion correction (superoxide)
  Value: -0.644 eV
  Uncertainty: 0.030 eV
  Description: Composition-based energy adjustment (-0.161 eV/atom x 4.0 atoms)
  Generated by: MaterialsProject2020Compatibility, CompositionEnergyAdjustment:
  Name: MP2020 GGA/GGA+U mixing correction (Mn)
  Value: -1.668 eV
  Uncertainty: 0.005 eV
  Description: Composition-based energy adjustment (-1.668 eV/atom x 1.0 atoms)
  Generated by: MaterialsProject2020Compatibility]

ce_mp2020.energy_adjustments=[CompositionEnergyAdjustment:
  Name: MP2020 anion correction (oxide)
  Value: -2.748 eV
  Uncertainty: 0.008 eV
  Description: Composition-based energy adjustment (-0.687 eV/atom x 4.0 atoms)
  Generated by: MaterialsProject2020Compatibility, CompositionEnergyAdjustment:
  Name: MP2020 GGA/GGA+U mixing correction (Mn)
  Value: -1.668 eV
  Uncertainty: 0.005 eV
  Description: Composition-based energy adjustment (-1.668 eV/atom x 1.0 atoms)
  Generated by: MaterialsProject2020Compatibility]

cse_legacy.energy_adjustments=[ConstantEnergyAdjustment:
  Name: MP Anion Correction
  Value: -0.292 eV
  Uncertainty: nan eV
  Description: Constant energy adjustment (-0.292 eV)
  Generated by: MaterialsProjectCompatibility, ConstantEnergyAdjustment:
  Name: MP Advanced Correction
  Value: -1.681 eV
  Uncertainty: nan eV
  Description: Constant energy adjustment (-1.681 eV)
  Generated by: MaterialsProjectCompatibility]

ce_legacy.energy_adjustments=[ConstantEnergyAdjustment:
  Name: MP Anion Correction
  Value: -2.809 eV
  Uncertainty: nan eV
  Description: Constant energy adjustment (-2.809 eV)
  Generated by: MaterialsProjectCompatibility, ConstantEnergyAdjustment:
  Name: MP Advanced Correction
  Value: -1.681 eV
  Uncertainty: nan eV
  Description: Constant energy adjustment (-1.681 eV)
  Generated by: MaterialsProjectCompatibility]
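
As a sanity check on the MP2020 output above, the totals can be reproduced by hand from the per-atom values in the adjustment descriptions. The snippet below is only a sketch of that arithmetic in plain Python; the dictionary layout and the helper name `total_correction` are made up for illustration and are not part of pymatgen.

```python
# Per-atom values and atom counts copied from the printed MP2020 adjustments above.
# The data layout and the function name are illustrative, not pymatgen API.
cse_adjustments = {
    "anion (superoxide)": (-0.161, 4),  # (eV/atom, number of O atoms)
    "GGA/GGA+U mixing (Mn)": (-1.668, 1),
}
ce_adjustments = {
    "anion (oxide)": (-0.687, 4),
    "GGA/GGA+U mixing (Mn)": (-1.668, 1),
}


def total_correction(adjustments):
    """Sum per-atom value times atom count over all adjustments (in eV)."""
    return sum(per_atom * n_atoms for per_atom, n_atoms in adjustments.values())


cse_total = total_correction(cse_adjustments)
ce_total = total_correction(ce_adjustments)

print(f"{cse_total=:.3f}")
print(f"{ce_total=:.3f}")
print(f"difference = {ce_total - cse_total:.3f} eV")
```

The two totals match the printed corrections of -2.312 eV and -4.416 eV, and the 2.104 eV gap between them comes entirely from the anion term, since the Mn mixing correction is identical in both entries.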

Currently the list of energy adjustments only discerns anion corrections from MP advanced corrections.

@rkingsbury @computron Would it be possible to split this further so that structure-dependent corrections and simpler composition-only corrections are listed separately?

@rkingsbury
Contributor

Hi @janosh, you have definitely found an interesting edge case here. We do not actually have any structure-dependent corrections. Rather, some corrections depend on oxidation state, and if no oxidation state information is available, pymatgen tries to guess it from the composition (see code block).

However, entries downloaded from the MP API have oxidation state information appended to the .data attribute that is determined using more sophisticated methods based on structure during the build process. The corrections will use that information if it's present. Is that where you got the ComputedStructureEntry?

What's happening here is that for whatever reason, the O in cse is being classified as a superoxide, whereas it's classified as an oxide in ce. A different correction is applied depending on superoxide / oxide, and that leads to the difference, at least for the MP2020 corrections. I suspect something similar is happening with the legacy ones as well, but as you can see those are harder to parse due to some technical limitations.

Can you inspect the .data["oxidation_states"] attribute of each of your entries? I think that might shed further light.

@janosh
Member Author

janosh commented Nov 9, 2022

> Is that where you got the ComputedStructureEntry?

The CSE was published in this paper.

> Can you inspect the .data["oxidation_states"] attribute of each of your entries? I think that might shed further light.

The oxidation states are all empty:

cse_mp2020.data={'oxidation_states': {}}
ce_mp2020.data={'oxidation_states': {}}
cse_legacy.data={'oxidation_states': {}}
ce_legacy.data={'oxidation_states': {}}

@janosh
Member Author

janosh commented Nov 9, 2022

Here are the full reprs:

cse_mp2020=wbm-step-2-34803 ComputedStructureEntry - Mn1 O4       (MnO4)
Energy (Uncorrected)     = -29.9509  eV (-5.9902  eV/atom)
Correction               = -2.3120   eV (-0.4624  eV/atom)
Energy (Final)           = -32.2629  eV (-6.4526  eV/atom)
Energy Adjustments:
  MP2020 anion correction (superoxide): -0.6440   eV (-0.1288  eV/atom)
  MP2020 GGA/GGA+U mixing correction (Mn): -1.6680   eV (-0.3336  eV/atom)
Parameters:
  potcar_symbols         = ['PAW_PBE Mn_pv 07Sep2000', 'PAW_PBE O 08Apr2002']
  hubbards               = {'O': 0.0, 'Mn': 3.9}
  potcar_spec            = [{'titel': 'PAW_PBE Mn_pv 07Sep2000', 'hash': None}, {'titel': 'PAW_PBE O 08Apr2002', 'hash': None}]
  is_hubbard             = True
  run_type               = GGA+U
Data:
  oxidation_states       = {}

ce_mp2020=wbm-step-2-34803 ComputedEntry - Mn1 O4       (MnO4)
Energy (Uncorrected)     = -29.9509  eV (-5.9902  eV/atom)
Correction               = -4.4160   eV (-0.8832  eV/atom)
Energy (Final)           = -34.3669  eV (-6.8734  eV/atom)
Energy Adjustments:
  MP2020 anion correction (oxide): -2.7480   eV (-0.5496  eV/atom)
  MP2020 GGA/GGA+U mixing correction (Mn): -1.6680   eV (-0.3336  eV/atom)
Parameters:
  potcar_symbols         = ['PAW_PBE Mn_pv 07Sep2000', 'PAW_PBE O 08Apr2002']
  hubbards               = {'O': 0.0, 'Mn': 3.9}
  potcar_spec            = [{'titel': 'PAW_PBE Mn_pv 07Sep2000', 'hash': None}, {'titel': 'PAW_PBE O 08Apr2002', 'hash': None}]
  is_hubbard             = True
  run_type               = GGA+U
Data:
  oxidation_states       = {}

cse_legacy=wbm-step-2-34803 ComputedStructureEntry - Mn1 O4       (MnO4)
Energy (Uncorrected)     = -29.9509  eV (-5.9902  eV/atom)
Correction               = -1.9728   eV (-0.3946  eV/atom)
Energy (Final)           = -31.9237  eV (-6.3847  eV/atom)
Energy Adjustments:
  MP Anion Correction    : -0.2920   eV (-0.0584  eV/atom)
  MP Advanced Correction : -1.6809   eV (-0.3362  eV/atom)
Parameters:
  potcar_symbols         = ['PAW_PBE Mn_pv 07Sep2000', 'PAW_PBE O 08Apr2002']
  hubbards               = {'O': 0.0, 'Mn': 3.9}
  potcar_spec            = [{'titel': 'PAW_PBE Mn_pv 07Sep2000', 'hash': None}, {'titel': 'PAW_PBE O 08Apr2002', 'hash': None}]
  is_hubbard             = True
  run_type               = GGA+U
Data:
  oxidation_states       = {}

ce_legacy=wbm-step-2-34803 ComputedEntry - Mn1 O4       (MnO4)
Energy (Uncorrected)     = -29.9509  eV (-5.9902  eV/atom)
Correction               = -4.4900   eV (-0.8980  eV/atom)
Energy (Final)           = -34.4409  eV (-6.8882  eV/atom)
Energy Adjustments:
  MP Anion Correction    : -2.8092   eV (-0.5618  eV/atom)
  MP Advanced Correction : -1.6809   eV (-0.3362  eV/atom)
Parameters:
  potcar_symbols         = ['PAW_PBE Mn_pv 07Sep2000', 'PAW_PBE O 08Apr2002']
  hubbards               = {'O': 0.0, 'Mn': 3.9}
  potcar_spec            = [{'titel': 'PAW_PBE Mn_pv 07Sep2000', 'hash': None}, {'titel': 'PAW_PBE O 08Apr2002', 'hash': None}]
  is_hubbard             = True
  run_type               = GGA+U
Data:
  oxidation_states       = {}

@rkingsbury
Contributor

Thanks @janosh. So in this case the presence of the structure is triggering pymatgen to detect peroxides/superoxides (see code block).

If you were to instantiate either the legacy or MP2020 corrections with correct_peroxide=False, then you should get the same answer whether or not a structure is present. As far as I'm aware, this check is the only place in the corrections where the structure is used.
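
To make the role of correct_peroxide concrete, here is a toy model of the decision described above. It is not pymatgen's implementation: the function names, the `min_oo_distance` argument, the 1.35 Å cutoff and the 1.3 Å example distance are all assumptions chosen for illustration, and only the two per-atom values that appear in this thread (oxide and superoxide) are used.

```python
# Toy model of the behaviour discussed above; NOT pymatgen's actual code.
# Per-atom values are the MP2020 numbers printed earlier in this thread.
ANION_CORRECTION_EV_PER_ATOM = {"oxide": -0.687, "superoxide": -0.161}

# Hypothetical O-O distance cutoff (angstrom) below which O is called a superoxide.
SUPEROXIDE_CUTOFF = 1.35


def classify_oxygen(min_oo_distance=None, correct_peroxide=True):
    """Return the oxygen label used to pick the anion correction.

    min_oo_distance is None when no structure is available (ComputedEntry case).
    """
    if not correct_peroxide or min_oo_distance is None:
        # No structure, or the structure check is disabled: treat as plain oxide.
        return "oxide"
    return "superoxide" if min_oo_distance < SUPEROXIDE_CUTOFF else "oxide"


def anion_correction(n_oxygen, min_oo_distance=None, correct_peroxide=True):
    """Return (label, total anion correction in eV) for n_oxygen O atoms."""
    label = classify_oxygen(min_oo_distance, correct_peroxide)
    return label, ANION_CORRECTION_EV_PER_ATOM[label] * n_oxygen


# Same composition (4 O atoms), with and without structural information.
# The 1.3 angstrom distance is a made-up value, not taken from the entry.
with_structure = anion_correction(4, min_oo_distance=1.3)
without_structure = anion_correction(4)
check_disabled = anion_correction(4, min_oo_distance=1.3, correct_peroxide=False)

print(with_structure, without_structure, check_disabled)
```

In this sketch the structure only changes which label is chosen; the correction itself is still a per-atom, composition-based value. Disabling the check makes the entry with a structure fall back to the same result as the entry without one.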

As confusing as this all is, it is intended behavior. Basically, the corrections try to utilize as much information to make the best guess they can about the oxidation states of the respective ions.

@janosh
Member Author

janosh commented Nov 9, 2022

@rkingsbury Thanks for taking the time to look into this!

> Basically, the corrections try to utilize as much information to make the best guess they can about the oxidation states of the respective ions.

That's what I figured. And you're right, if I pass correct_peroxide=False, CE and CSE corrections become equal:

cse_mp2020.correction=-4.416
ce_mp2020.correction=-4.416
cse_legacy.correction=-4.49
ce_legacy.correction=-4.49

So it doesn't make sense to split this structure-dependent part into its own energy adjustment?

@rkingsbury
Contributor

> So it doesn't make sense to split this structure-dependent part into its own energy adjustment?

Correct. Our correction scheme is really not "structure based" (with this one small exception) and the correction that is applied is still a composition-based anion correction. The structure just happens to be used in this one case to classify the anion as a superoxide/peroxide/oxide. So I think it could be misleading to separate out a "structure-dependent" energy adjustment. However, if you think edits to any of the associated docstrings or documentation are in order to make this clearer, I'd be supportive of that.

@janosh
Member Author

janosh commented Nov 9, 2022

> The structure just happens to be used in this one case to classify the anion as a superoxide/peroxide/oxide.

It's also used to classify sulfides vs polysulfides.

I'll add a paragraph to the MaterialsProject2020Compatibility docstring to highlight this corner-case structure dependence.
