Skip to content

Speedup import and add regression check for import time #238

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
Nov 2, 2024
Merged
Changes from 1 commit
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
b6bee81
pre-commit migrate-config
DanielYang59 Oct 20, 2024
5c8032d
avoid import Structure for type check
DanielYang59 Oct 20, 2024
212d573
lazily import scipy
DanielYang59 Oct 20, 2024
d4e25e7
avoid import Structure in utils
DanielYang59 Oct 20, 2024
865a4fa
copy helper func _check_type from monty
DanielYang59 Oct 20, 2024
82a31c4
lazy import plotly.figure_factory
DanielYang59 Oct 20, 2024
e60c770
lazy import NearNeighbors
DanielYang59 Oct 20, 2024
2777c0f
lazy import PhononDos and PhononBands
DanielYang59 Oct 20, 2024
13960b4
more lazy import Structure
DanielYang59 Oct 20, 2024
07e084c
clean up some duplicate imports
DanielYang59 Oct 20, 2024
5c01d25
lazy import sklearn
DanielYang59 Oct 20, 2024
5918aeb
lazy import pmg composition
DanielYang59 Oct 20, 2024
1161bf0
remove unused import from root __init__
DanielYang59 Oct 20, 2024
2b04381
relocate scikit learn import
DanielYang59 Oct 20, 2024
1059be9
revert hacky type check changes
DanielYang59 Oct 21, 2024
92a4821
bump monty the hard way
DanielYang59 Oct 23, 2024
23cbe32
WIP: add draft import time checker
DanielYang59 Oct 23, 2024
fe1e32d
add more test modules
DanielYang59 Oct 23, 2024
b019517
reduce default average count to 3, it seems very slow
DanielYang59 Oct 23, 2024
6bd0bb8
tweak gen ref time logic
DanielYang59 Oct 23, 2024
bf57cd0
update ref time
DanielYang59 Oct 23, 2024
acd25f9
use perf_counter over time()
DanielYang59 Oct 23, 2024
ed48b13
lazy import plotly.figure_factory, reduce 0.2s 10%
DanielYang59 Oct 23, 2024
2245483
update ref import time
DanielYang59 Oct 23, 2024
f64739a
use standard time format
DanielYang59 Oct 23, 2024
5143765
only run on main branch
DanielYang59 Oct 23, 2024
77f3f59
use warnings.warn
DanielYang59 Oct 26, 2024
b0006c8
use perf_counter_ns
DanielYang59 Oct 26, 2024
c337e54
rename measure_import_time_in_ms -> measure_import_time
janosh Nov 2, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 8 additions & 11 deletions tests/performance_tests/test_import_time.py
Original file line number Diff line number Diff line change
Expand Up @@ -43,21 +43,18 @@
def test_get_ref_import_time() -> None:
"""A dummy test that would always fail, used to generate copyable reference time."""
import_times = {
module_name: measure_import_time_in_ms(module_name)
module_name: round(measure_import_time(module_name), 2)
for module_name in REF_IMPORT_TIME
}

# Print out the import times in a copyable format
print("\nCopyable import time dictionary:")
print("{")
for module_name, import_time in import_times.items():
print(f' "{module_name}": {import_time:.2f},')
Copy link
Collaborator Author

@DanielYang59 DanielYang59 Nov 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part was intended to generate a directly copyable dict (and human-readable for manual inspection) for us to directly copy and paste to update the time recording, printing a dict directly gives a one-liner:

Copyable import time dictionary:
{'pymatviz': 2084.25, 'pymatviz.coordination': 2342.41, 'pymatviz.cumulative': 2299.73, 'pymatviz.histogram': 2443.11, 'pymatviz.phonons': 2235.57, 'pymatviz.powerups': 2172.71, 'pymatviz.ptable': 2286.77, 'pymatviz.rainclouds': 2702.03, 'pymatviz.rdf': 2331.98, 'pymatviz.relevance': 2256.29, 'pymatviz.sankey': 2313.12, 'pymatviz.scatter': 2312.48, 'pymatviz.structure_viz': 2330.39, 'pymatviz.sunburst': 2395.04, 'pymatviz.uncertainty': 2317.87, 'pymatviz.xrd': 2242.09}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i saw that. the thing is, if you paste that printed dict into a python file, ruff will auto-format it to multiple lines so having easier to read code seemed more important

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point!

print("}")
print(import_times)

pytest.fail("Generate reference import times.")


def measure_import_time_in_ms(module_name: str, count: int = 3) -> float:
def measure_import_time(module_name: str, repeats: int = 3) -> float:
"""Measure import time of a module in milliseconds across several runs.

Args:
Expand All @@ -69,12 +66,12 @@ def measure_import_time_in_ms(module_name: str, count: int = 3) -> float:
"""
total_time = 0.0

for _ in range(count):
start_time = time.perf_counter_ns()
Copy link
Collaborator Author

@DanielYang59 DanielYang59 Nov 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. using perf_counter_ns was intended to avoid float precision lost (the former gives time in int, while the latter gives in float), but admittedly for our case the error should be negligible as: 1. it's far from the ns level; 2. we're rounding it anyway

Use perf_counter_ns() to avoid the precision loss caused by the float type.

for _ in range(repeats):
start_time = time.perf_counter()
subprocess.run(["python", "-c", f"import {module_name}"], check=True) # noqa: S603, S607
total_time += time.perf_counter_ns() - start_time
total_time += time.perf_counter() - start_time

return (total_time / count) / 1e6
return total_time / repeats * 1000


@pytest.mark.skipif(
Expand All @@ -91,7 +88,7 @@ def test_import_time(grace_percent: float = 0.20, hard_percent: float = 0.50) ->
before the test fails.
"""
for module_name, ref_time in REF_IMPORT_TIME.items():
current_time = measure_import_time_in_ms(module_name)
current_time = measure_import_time(module_name)

# Calculate grace and hard thresholds
grace_threshold = ref_time * (1 + grace_percent)
Expand Down