Skip to content

Improve pnp loader speed and memory: jszip implementation #6688

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Apr 7, 2025

Conversation

goloveychuk
Copy link
Contributor

@goloveychuk goloveychuk commented Feb 14, 2025

What's the problem this PR addresses?

rework of #6671
TODO:

  • check security?
  • tests
  • elaborate mixed compression logic
  • benchmarks?
  • hide exports and interfaces for external consumers?

...

How did you fix it?

...

Checklist

  • I have set the packages that need to be released for my changes to be effective.
  • I will check that all automated PR checks pass before the PR gets reviewed.

@goloveychuk goloveychuk force-pushed the minizip2 branch 2 times, most recently from 8c5a405 to 5a847fd Compare February 23, 2025 14:03
@goloveychuk goloveychuk changed the title jsZip implementation Improve pnp loader speed and memory: jszip implementation Feb 23, 2025
@goloveychuk
Copy link
Contributor Author

goloveychuk commented Feb 23, 2025

@arcanis review pls.
Todo:

  1. have question about one test libzip/incons-file-count-overflow.zip
  2. need benchmark?
  3. mixed compression logic should be checked

@goloveychuk
Copy link
Contributor Author

@merceyz @RDIL maybe you want to review?

@goloveychuk
Copy link
Contributor Author

@arcanis if you want I can move jszip implementation into my repo, and add as external dependency from berry.

Comment on lines +96 to +100
minizip: {
description: `Whether Yarn should use minizip to extract archives`,
type: SettingsType.BOOLEAN,
default: false,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the main benefit of this PR is to decrease the runtime footprint (by comparison the install doesn't have a lot to gain, since only unplugged packages are extracted on the disk and there's very little of them), I'd rather keep the existing codepath for unplug (to simplify the changes as much as possible).

@@ -115,6 +123,15 @@ const plugin: Plugin<CoreHooks & StageHooks> = {
default: [],
isArray: true,
},
experimentalZipImplementation: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
experimentalZipImplementation: {
pnpZipImplementation: {

entries.push({
name,
os,
mtime: 0, //we dont care,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use 456789000 (see SAFE_TIME)

}

export class ZipFS extends BasePortableFakeFS {
private readonly libzip: Libzip;
export interface ZipImpl {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you move it to a ZipImplementation.ts file?

@arcanis
Copy link
Member

arcanis commented Mar 1, 2025

Hey @goloveychuk! No it's fine, I prefer to have it inside this repository at least as long as it's experimental; I still have to review this PR, it's pretty large (even discounting the test artifacts) and I've been extremely busy at work since the beginning of the year. I'll try to finish the review by next week.

@goloveychuk
Copy link
Contributor Author

Thank you.
Tests could be rewritten from snapshot file to inline snapshots or code which defines expected files list.

@goloveychuk
Copy link
Contributor Author

hey, @arcanis gentle reminder. Thanks!

@arcanis
Copy link
Member

arcanis commented Mar 28, 2025

I think the implementation looks good. I didn't see any perf difference when running the basic tests, but they are very very simple so it's not surprising. A better way to benchmark would be to use our yarn bench command (see scripts/bench-folder.ts), but it doesn't support using a different benchmark command between tests at the moment.

@goloveychuk
Copy link
Contributor Author

@arcanis I also suggest to rethink caching
https://github.com/goloveychuk/berry/blob/fb552204a46f364d75d08117e88f044f063c3703/packages/yarnpkg-libzip/sources/ZipFS.ts#L860
In many cases it leads to high memory consumption without benefits.
Usually js files are cached by infra:

  • commonjs is caching scripts in require.cache
  • esm loaders are caching compiled and linked module
  • webpack jest are caching transformed modules

It's a rear case when some infra will read same file in one pnp runtime.
And benefits are not clear.
Especially with js zip impl, which reads file from filesystem without additional operations like loading full archive into memory. Which should be same speed as node_modules read from fs.
Exception for above is compressed archives. But again, I'm not sure any infra will read same file twice.

Safest solution is to hide caching under specific Impl class, to let them decide what is best strategy for caching.
Wdyt?

@goloveychuk
Copy link
Contributor Author

I can do it in a new PR

@arcanis
Copy link
Member

arcanis commented Mar 28, 2025

It's a rear case when some infra will read same file in one pnp runtime.
And benefits are not clear.

The libzip doesn't allow (or didn't allow, perhaps they fixed it?) to read files that just got written (ie writeFileSync(p); readFileSync(p); was crashing). The file caching was to workaround this issue. It's fine to get rid of it in the js backend 👍

@goloveychuk
Copy link
Contributor Author

@arcanis made changes in last commit

@goloveychuk
Copy link
Contributor Author

Anything to fix to merge pr? Thanks.

@arcanis arcanis merged commit 242234c into yarnpkg:master Apr 7, 2025
26 checks passed
@goloveychuk
Copy link
Contributor Author

Thanks for merge!
Could you please create release?

@adrian-gierakowski
Copy link

adrian-gierakowski commented Apr 18, 2025

is this the default now, or do I need to somehow opt-in? (release notes say "Experimental JS-based zip parser")

@goloveychuk
Copy link
Contributor Author

'pnpZipBackend: js'
In yarnrc. Use latest yarn version, it has fds leak fix.

@neoncube2
Copy link

Since this is experimental, I figured I'd let you know that I'm currently using pnpZipBackend: js on my project, and it seems to work fine :)

@goloveychuk
Copy link
Contributor Author

Speed, memory improvements?

@arcanis
Copy link
Member

arcanis commented Apr 28, 2025

I personally didn't see any change in our large project at work when running TypeScript - neither in perfs nor in memory 😔

@adrian-gierakowski
Copy link

No noticeable memory usage reduction here either

@neoncube2
Copy link

I profiled my project on Windows, using procmon, when running the command yarn run webpack serve --mode=development.

  • Packages: 797
  • Size of .yarn/cache/: 35MB
  • Max memory usage with default zip backend, after discarding outliers: ~580KB
  • Max memory usage with JS zip backend, after discarding outliers: ~560KB

It does look like the JS zip backend might be saving a bit of memory, especially since I'm guessing that most of the 560KB/580KB is taken up by webpack, etc.

Interestingly, the JS zip backend also seems to be more consistent, always staying around 560KB, while the default zip backend occasionally spikes all the way up to ~750KB (I observed this several times while profiling, but didn't include these outliers in the results).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants