Skip to content

Cache Lattice property (lengths/angles/volume) for much faster Structure.as_dict #4421

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 30, 2025

Conversation

DanielYang59
Copy link
Contributor

@DanielYang59 DanielYang59 commented May 29, 2025

Summary

  • lengths/angles/volume of Lattice would now be cached, related to orjson for much faster JSON operations? #4385
  • verbosity in as_dict of PeriodicSite/Lattice now explicitly requires literal 0 or 1 to be consistent with docstring, instead of checking if verbosity > 0 (currently in grace period, only warning issued)
    verbosity (int): Default of 0 only includes the matrix representation.
    Set to 1 to include the lattice parameters.

Cache frequently used Lattice properties

Currently length/angles/volume is not cached and is frequently used, for example accessing all lattice parameter related property would lead to length/angles being repeatedly calculated:

@property
def a(self) -> float:
"""*a* lattice parameter."""
return self.lengths[0]
@property
def b(self) -> float:
"""*b* lattice parameter."""
return self.lengths[1]
@property
def c(self) -> float:
"""*c* lattice parameter."""
return self.lengths[2]
@property
def abc(self) -> tuple[float, float, float]:
"""Lengths of the lattice vectors, i.e. (a, b, c)."""
return self.lengths
@property
def alpha(self) -> float:
"""Angle alpha of lattice in degrees."""
return self.angles[0]
@property
def beta(self) -> float:
"""Angle beta of lattice in degrees."""
return self.angles[1]
@property
def gamma(self) -> float:
"""Angle gamma of lattice in degrees."""
return self.angles[2]
@property
def volume(self) -> float:
"""Volume of the unit cell in Angstrom^3."""
matrix = self._matrix
return float(abs(np.dot(np.cross(matrix[0], matrix[1]), matrix[2])))
@property
def parameters(self) -> tuple[float, float, float, float, float, float]:
"""6-tuple of floats (a, b, c, alpha, beta, gamma)."""
return (*self.lengths, *self.angles)
@property
def params_dict(self) -> dict[str, float]:
"""Dictionary of lattice parameters."""
return dict(zip("a b c alpha beta gamma".split(), self.parameters, strict=True))

structure.as_dict now around 8x faster

Before (1000 structure, each has 10-100 atoms):

Total time: 3.29617 s
File: create_dummp_json_structure.py
Function: generate_and_save_structures at line 34

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    34                                           @profile
    35                                           def generate_and_save_structures(n, output_dir):
    36         1         20.0     20.0      0.0      os.makedirs(output_dir, exist_ok=True)
    37                                           
    38      1001        222.0      0.2      0.0      for i in range(n):
    39      1000     291789.0    291.8      8.9          structure = generate_dummy_structure()
    40      1000        583.0      0.6      0.0          filename = f"structure_{i:04d}.json.gz"
    41      1000       1942.0      1.9      0.1          filepath = os.path.join(output_dir, filename)
    42                                           
    43      2000     224549.0    112.3      6.8          with gzip.open(filepath, "wb") as f:
    44      1000    2761900.0   2761.9     83.8              dct = structure.as_dict()
    45      1000      15163.0     15.2      0.5              f.write(orjson.dumps(dct))

Now:

Total time: 0.949622 s
File: create_dummp_json_structure.py
Function: generate_and_save_structures at line 34

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    34                                           @profile
    35                                           def generate_and_save_structures(n, output_dir):
    36         1         37.0     37.0      0.0      os.makedirs(output_dir, exist_ok=True)
    37                                           
    38      1001        195.0      0.2      0.0      for i in range(n):
    39      1000     286696.0    286.7     30.2          structure = generate_dummy_structure()
    40      1000        511.0      0.5      0.1          filename = f"structure_{i:04d}.json.gz"
    41      1000       1843.0      1.8      0.2          filepath = os.path.join(output_dir, filename)
    42                                           
    43      2000     214130.0    107.1     22.5          with gzip.open(filepath, "wb") as f:
    44      1000     431677.0    431.7     45.5              dct = structure.as_dict()
    45      1000      14533.0     14.5      1.5              f.write(orjson.dumps(dct))

Also note lattice (the performance bottleneck) is not used in the dict for site:

site_dict = site.as_dict(verbosity=verbosity)
del site_dict["lattice"]

So we could modify as_dict to control whether lattice would be generated at all

This could reduce the runtime slightly so I guess it's not worth the effort:

Total time: 0.867376 s

@DanielYang59 DanielYang59 changed the title Cache lattice property Cache Lattice property: lengths/angles/volume May 29, 2025
@DanielYang59 DanielYang59 force-pushed the cache-lattice-property branch from 3f11e41 to 108fee6 Compare May 29, 2025 17:00
@DanielYang59 DanielYang59 changed the title Cache Lattice property: lengths/angles/volume Cache Lattice property (lengths/angles/volume) for much faster Structure.as_dict May 29, 2025
@DanielYang59 DanielYang59 marked this pull request as ready for review May 30, 2025 09:44
@shyuep
Copy link
Member

shyuep commented May 30, 2025

There is no need to put in a setter. The Lattice object is meant to be immutable like. There is no setter today and you cannot modify a matrix (without some weird hacky code like modifying variables indicated as private by convention).

@DanielYang59
Copy link
Contributor Author

DanielYang59 commented May 30, 2025

Good point! having matrix immutable makes caching much easier :), I didn't realize that

@shyuep shyuep merged commit 005613c into materialsproject:master May 30, 2025
43 checks passed
@DanielYang59 DanielYang59 deleted the cache-lattice-property branch May 30, 2025 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants