Skip to content

feat: add memoization when converting tree to graph #74

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 8, 2020

Conversation

admons
Copy link
Contributor

@admons admons commented Dec 3, 2020

  • Ready for review
  • Follows CONTRIBUTING rules
  • Reviewed by Snyk internal team

What does this PR do?

Like #73 (which affects graphToDepTree), It adds memoization in the depTreeToGraph

For a very large tree (that was generated using the memoization from #73), it reduced the time from 20000ms to less than 200ms

@admons admons requested a review from darscan December 3, 2020 20:19
@admons admons requested a review from a team as a code owner December 3, 2020 20:19
@admons admons force-pushed the feat/add-memoization-to-depTreeToGraph branch from 36a2848 to d842c25 Compare December 3, 2020 20:21
@FauxFaux
Copy link
Contributor

FauxFaux commented Dec 4, 2020

Like with the other PR, this appears to just leak memory forever, unless we're planning to use it perfectly in all cases. Isn't there some kind of lru-cache for this?

@robcresswell
Copy link
Contributor

https://github.com/dominictarr/bench-lru if you do go that route

@darscan
Copy link
Contributor

darscan commented Dec 4, 2020

@FauxFaux I don't think either of the PRs leak memory. In both cases GC will clean up the maps themselves at some point after the operation is complete. Also, the values/keys in the maps are already in memory and shared by the output instances. What are you seeing that I'm not?

Also, I don'r see how an lru would help.. there is no global state in either of the PRs

@FauxFaux
Copy link
Contributor

FauxFaux commented Dec 8, 2020

I WAS WRONG IGNORE ME.

@darscan
Copy link
Contributor

darscan commented Dec 8, 2020

For posterity: the concern was around how default argument initialisers are evaluated in JS (versus Python, for example, where this would indeed be problematic)

@admons admons merged commit bb9169f into master Dec 8, 2020
@admons admons deleted the feat/add-memoization-to-depTreeToGraph branch December 8, 2020 17:47
@snyksec
Copy link

snyksec commented Dec 8, 2020

🎉 This PR is included in version 1.22.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@Shesekino
Copy link

For posterity: the concern was around how default argument initialisers are evaluated in JS (versus Python, for example, where this would indeed be problematic)

because in Python arguments with a mutable default value get attached to the function as a "private member" and never get deleted? so the concern was memoizationMap never getting cleaned because nothing actively cleans it?

@calvinli
Copy link

Hey all,

Sorry to revive an old-ish thread but I tried this PR and it did not improve the performance of buildGraph(...) on a very large JS dependency tree. I found that it was because

if (memoizationMap.has(depTree)) {

never evaluated to true because Map.has uses Object.is semantics (same object, instead of identical object). I was only able to get a speedup (which was substantial) by changing the condition to something like

    const depTreeObjHash = objectHash(depTree, { excludeKeys: (key => (key === 'dependencies')) }) + objectHash(depTree.dependencies || {}, { excludeValues: true })
    if (memoizationMap.has(depTreeObjHash)) {
        return memoizationMap.get(depTreeObjHash);
    }

so I'm wondering whether the behavior in this PR was intentional.

@darscan
Copy link
Contributor

darscan commented Feb 3, 2021

Hi @calvinli . Apologies for the delay.

never evaluated to true because Map.has uses Object.is semantics

Yep, unfortunately that was intentional - this was just a small optimisation following on from #73 so it only helps in that particular case.

Some tools (like npm) can produce trees where sub trees that appear equivalent from the outside are not, so we have to use full object hashing to pick them up and preserve them properly when converting them to graphs. We can't just look at the direct deps of each dep as your example does (by using excludeValues). For example:

Your change would consider both instances of [email protected] as equivalent, but they are not.

Having to compare sub trees like this makes the conversion very expensive indeed. The goal is to build dep-graphs directly (instead of trees) so that we can avoid this issue entirely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants