
Speedup import and add regression check for import time #238


Merged
29 commits merged into janosh:main on Nov 2, 2024

Conversation

DanielYang59
Collaborator

@DanielYang59 DanielYang59 commented Oct 20, 2024

Summary

Profile command: python -X importtime -c "import pymatviz" 2> pmv.log && tuna pmv.log

@DanielYang59 DanielYang59 added ux User experience pkg Package labels Oct 20, 2024
@DanielYang59 DanielYang59 self-assigned this Oct 20, 2024
@DanielYang59
Collaborator Author

DanielYang59 commented Oct 20, 2024

I have a bad feeling about this PR; I think I might prefer to wait until the import fix in monty gets merged:

  • It badly pollutes the code base by lazily importing almost everything related to pymatgen, because any import involving from monty.json import xxx could potentially slow down the import (if torch is installed).

Meanwhile, for some reason, every time I lazily import one package, another package seems to pop up from nowhere and keep the total import time roughly the same. I think the reason is that the same costly packages end up imported somewhere in the end, and iterative refactoring just breaks them into smaller chunks (before the refactor, they were all imported in the first module that needed them; then we keep relocating the currently most expensive piece elsewhere), so the most expensive chunk keeps shrinking while the total import time stays roughly unchanged.

Profile on main branch (without torch):

(screenshot: tuna import-time profile on main)

The tip of this branch:

(screenshot: tuna import-time profile at the tip of this branch)

@janosh
Owner

janosh commented Oct 20, 2024

Thanks a lot for digging into this, really appreciated! I suspect you're right, we're dependent on upstream PRs being merged to achieve significant improvements here.

@janosh
Owner

janosh commented Oct 22, 2024

i think it would be good to add performance regression tests to this PR and run them both on main and in this branch to compare. we should have them to catch future slowdowns before they enter main. i posted some half-baked import time tests in #209 (comment) which might help. no worries if you're busy, i'll try and get to it myself eventually

@DanielYang59
Collaborator Author

DanielYang59 commented Oct 23, 2024

I will do this :) I was also planning to add a runtime unit test (perhaps for core parts of pymatgen too); right now we only test the results, and runtime is not covered at all. Thanks again for bringing up this idea!

@DanielYang59
Collaborator Author

DanielYang59 commented Oct 23, 2024

Regarding the actual test methodology, I haven't found much about best practices for import time tests. Your method certainly works; what about spawning a subprocess (is there any pitfall to this, like subprocess overhead maybe?), something like:

import subprocess
import sys
import time


def measure_import_time(import_command: str) -> float:
    """Time how long a fresh Python subprocess takes to run `import_command`."""
    start_time = time.time()
    subprocess.run([sys.executable, "-c", import_command], check=True)
    return time.time() - start_time

I hope I'm correct here; I suspect there's a slight pitfall with the sys.modules method though, because it only pops the outer module. Say we have monty, which imports torch indirectly/internally: if we just pop monty, the actually expensive torch would still remain in the import cache, so:

  • if name == module_name or name.startswith(f"{module_name}.") doesn't seem to capture indirect imports?
  • If we were to profile several modules at the same time (say pmv.a and pmv.b both import the same module), the shared internal module would still be cached.
  • It's a bit hard to average across several runs?
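The first pitfall can be sketched like this (a minimal illustration using a stdlib module, not the actual test code):

```python
import importlib
import sys
import time


def timed_reimport(module_name: str) -> float:
    """Naively re-time an import by popping only the top-level module.

    Submodules and transitive dependencies (e.g. torch pulled in via monty)
    stay in sys.modules, so the measured time is far too optimistic.
    """
    sys.modules.pop(module_name, None)
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


first = timed_reimport("json")   # json's own dependencies may already be warm
second = timed_reimport("json")  # everything except "json" itself is cached
```

A subprocess sidesteps this entirely, since each run starts with an empty import cache.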

Is there any advantage to using relative time (time.perf_counter()) over real-world time (time.time())? The latter seems more comprehensible to me (though at the end of the day we still need to collect the reference times from CI).

@janosh
Owner

janosh commented Oct 23, 2024

Your method certainly works, what about spawning a subprocess (is there any pitfall about this, like subprocess overhead maybe?), something like:

i'm not sure but i think subprocess should be fine since afaik it runs in a completely isolated process with separate import caching

Is there any advantage to use relative time (time.perf_counter()) over real-world time (time.time())?

yes, have a look at this video which explains the virtues of time.perf_counter
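In short (context, not from the video): time.time() reads the wall clock, which can jump when the system clock is adjusted (e.g. NTP sync), while time.perf_counter() is monotonic and uses the highest-resolution timer available, making it the right tool for measuring intervals:

```python
import time

start_wall = time.time()          # wall clock: can jump if the system clock is adjusted
start_perf = time.perf_counter()  # monotonic, high resolution: never goes backwards

time.sleep(0.05)

wall_elapsed = time.time() - start_wall
perf_elapsed = time.perf_counter() - start_perf
# perf_elapsed is guaranteed non-negative; wall_elapsed usually agrees,
# but only perf_counter is safe for benchmarking.
```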

@DanielYang59
Collaborator Author

DanielYang59 commented Oct 23, 2024

Your method certainly works, what about spawning a subprocess (is there any pitfall about this, like subprocess overhead maybe?), something like:

i'm not sure but i think subprocess should be fine since afaik it runs in a completely isolated process with separate import caching

Thanks! Let's use subprocess for now; this should make things much easier (if there is no pitfall).

Is there any advantage to use relative time (time.perf_counter()) over real-world time (time.time())?

yes, have a look at this video which explains the virtues of time.perf_counter

Thanks for sharing, I'll have a look later, but I certainly trust your judgement.

By the way, how do we regenerate the .pytest-split-durations record? (Not now; after we finish everything. The current import is still very slow and might need more tuning.)

@DanielYang59
Collaborator Author

DanielYang59 commented Oct 23, 2024

(screenshot: tuna import-time profile before the change)

Currently, about 12% of the import time is spent importing plotly.figure_factory in ptable.plotly, where it's only used once across the entire code base; I have lazily imported it in ed48b13.

After that:
(screenshot: tuna import-time profile after deferring plotly.figure_factory)
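The lazy-import pattern here moves the import from module level into the only function that needs it, so `import pymatviz` no longer pays its cost. A sketch, with a hypothetical function name and a stdlib stand-in for plotly.figure_factory:

```python
def make_figure(data: dict) -> str:
    # Deferred import: the cost is paid on first call, not at package import.
    # In the actual change, `plotly.figure_factory` is imported inside the
    # one function in pymatviz.ptable.plotly that uses it.
    import json  # stand-in for `import plotly.figure_factory as ff`

    return json.dumps(data)
```

The first call pays the import cost once; later calls hit sys.modules and are effectively free.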

@DanielYang59 DanielYang59 changed the title Speedup import Speedup import and add regression check for import time Oct 23, 2024
@janosh
Owner

janosh commented Oct 23, 2024

By the way, how do we regenerate the .pytest-split-durations record (not now, after we finish everything, current import is still very slow and might need more tuning).

the command for that is

pytest --store-durations --durations-path tests/.pytest-split-durations

i should have documented that somewhere. probably best in test.yml where we set test-cmd: ...

@DanielYang59
Collaborator Author

DanielYang59 commented Oct 23, 2024

Thanks a ton for letting me know!

What I don't really understand is how we'd generate this runtime record via GitHub CI runners (in contrast to running that command locally)?

Especially for pymatgen where:

  • it's huge, and I don't have the resources to run all jobs locally
  • tasks are split (I haven't tried myself; perhaps the results would be combined automatically?)

I have a feeling that we may not want this test to run on every commit (I don't expect we'd introduce changes that vary the import time very often), as it would slow everything down. Perhaps we only run it when merging into main?

For this PR, I currently can't think of further improvements to make, as what's left is pretty much core packages like scipy, pandas and matplotlib. Do we perhaps want to merge it (after I modify it to run on the main branch only)?
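One way to gate the test (a hypothetical sketch; the exact condition used in the PR may differ) is to check the GITHUB_REF environment variable that GitHub Actions sets:

```python
import os


def should_run_import_time_test() -> bool:
    """Gate the import time regression test so it only runs on main.

    GITHUB_REF is "refs/heads/main" when a workflow runs on the main branch;
    it's unset locally, so the slow test would be skipped there too.
    """
    return os.getenv("GITHUB_REF", "") == "refs/heads/main"
```

The test itself could then carry e.g. @pytest.mark.skipif(not should_run_import_time_test(), ...).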

@DanielYang59 DanielYang59 marked this pull request as ready for review October 23, 2024 08:29
@janosh
Owner

janosh commented Nov 2, 2024

very thankful for this PR! 👍 didn't mean to keep it parked so long

What I don't really understand is how do we generate this runtime record via GitHub CI runners (in contrast to run that command locally)?

i think generating locally should be fine. what matters shouldn't be absolute values for test run times but how long a test takes relative to the others, which will be similar in CI and locally

it's huge and I don't have that much resources to run all jobs locally

even pymatgen only takes 10 min or so to run the whole test suite locally for me. curious how long for you?

Perhaps we only run it only when merging into main?

is the test slow enough in CI that it would extend the overall time to checks complete if it ran inside its own split? i just tried test_import_time locally and got 80.52s. it will be more on GitHub, but tests already take ~2 min.
either way, i think it's fine to only run this test on main

is this PR ready to go from your side? are we waiting on a monty release to get the new startup benefits?

@janosh
Owner

janosh commented Nov 2, 2024

are we waiting on a monty release to get the new startup benefits?

ah, saw your comment in pyproject

    # TODO: pmv doesn't actually depend on monty, however latest monty
    # includes a critical import patch, remove this after pmg bump dep
    "monty>=2024.10.21",

@janosh janosh merged commit 691a632 into janosh:main Nov 2, 2024
25 checks passed
@DanielYang59 DanielYang59 deleted the speedup-import branch November 3, 2024 03:35
@DanielYang59
Collaborator Author

i think generating locally should be fine.

even pymatgen only takes 10 min or so to run the whole test suite locally for me. curious how long for you?

Yep, it should take about the same time as the pymatgen CI (the macOS runner runs on an M1 chip; each split takes around 1 min), as I'm usually coding on an M3 MacBook Air. I'm just curious, as I think running on a dedicated runner would give better scalability and consistency :)

is the test slow enough in CI that it would extend overall time to checks complete if it ran inside its own split?

The original implementation ran 10 repeats, but I reduced the repeat count to 3, so it should be fine now.

    for module_name in REF_IMPORT_TIME
}

# Print out the import times in a copyable format
print("\nCopyable import time dictionary:")
print("{")
for module_name, import_time in import_times.items():
    print(f'    "{module_name}": {import_time:.2f},')
Collaborator Author

@DanielYang59 DanielYang59 Nov 3, 2024


This part was intended to generate a directly copyable (and human-readable, for manual inspection) dict for us to copy and paste to update the time record; printing the dict directly gives a one-liner:

Copyable import time dictionary:
{'pymatviz': 2084.25, 'pymatviz.coordination': 2342.41, 'pymatviz.cumulative': 2299.73, 'pymatviz.histogram': 2443.11, 'pymatviz.phonons': 2235.57, 'pymatviz.powerups': 2172.71, 'pymatviz.ptable': 2286.77, 'pymatviz.rainclouds': 2702.03, 'pymatviz.rdf': 2331.98, 'pymatviz.relevance': 2256.29, 'pymatviz.sankey': 2313.12, 'pymatviz.scatter': 2312.48, 'pymatviz.structure_viz': 2330.39, 'pymatviz.sunburst': 2395.04, 'pymatviz.uncertainty': 2317.87, 'pymatviz.xrd': 2242.09}

Owner


i saw that. the thing is, if you paste that printed dict into a python file, ruff will auto-format it onto multiple lines, so having easier-to-read code seemed more important

Collaborator Author


Fair point!

@@ -69,12 +66,12 @@ def measure_import_time_in_ms(module_name: str, count: int = 3) -> float:
    """
    total_time = 0.0

    for _ in range(count):
        start_time = time.perf_counter_ns()
Collaborator Author

@DanielYang59 DanielYang59 Nov 3, 2024


Thanks for catching this. Using perf_counter_ns was intended to avoid float precision loss (the former gives time as an int, while the latter gives a float), but admittedly for our case the error should be negligible as: 1. we're far from the ns level; 2. we're rounding it anyway.

Use perf_counter_ns() to avoid the precision loss caused by the float type.
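Pieced together from the diff hunks above, the helper presumably looks roughly like this (a sketch; details may differ from the merged code):

```python
import subprocess
import sys
import time


def measure_import_time_in_ms(module_name: str, count: int = 3) -> float:
    """Average the import time of `module_name` over `count` fresh subprocesses.

    perf_counter_ns returns an int, sidestepping float precision loss
    (though at the millisecond scale the difference is negligible).
    """
    total_time = 0.0
    for _ in range(count):
        start_time = time.perf_counter_ns()
        subprocess.run([sys.executable, "-c", f"import {module_name}"], check=True)
        total_time += time.perf_counter_ns() - start_time
    return total_time / count / 1e6  # ns -> ms


t = measure_import_time_in_ms("json", count=1)
```

Each subprocess starts with an empty import cache, so repeated runs measure genuine cold-start imports.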

janosh added a commit that referenced this pull request Mar 28, 2025
* Add test framework to monitor module import times with regression tests
* Use time.perf_counter for accurate timing
* Implement lazy imports across multiple modules to improve performance:
  - scipy
  - plotly.figure_factory
  - sklearn
  - pymatgen (Structure, NearNeighbors, PhononDos, PhononBands, Composition)
* Add reference import times for all core modules
* Configure tests to run only on main branch
* Add grace and hard thresholds for import time regression

---------

Co-authored-by: Janosh Riebesell <[email protected]>
Successfully merging this pull request may close these issues.

pymatviz import cost way too high