
Lossless WebP encoder allocates A LOT #2934


Open
4 tasks done
SladeThe opened this issue Jun 4, 2025 · 4 comments · May be fixed by #2940

Comments


SladeThe commented Jun 4, 2025

Prerequisites

  • I have written a descriptive issue title
  • I have verified that I am running the latest version of ImageSharp
  • I have verified whether the problem exists in both DEBUG and RELEASE mode
  • I have searched open and closed issues to ensure it has not already been reported

ImageSharp version

3.1.8

Environment (Operating system, version and so on)

Windows 11 with latest updates

.NET Framework version

.NET 9

Description

The lossless WebP encoder allocates both a large number of small objects and a few large buffers, which are NOT reused.
Encoding a single UWQHD (3440x1440) image allocates about 500K objects and 200-300 MB of memory in total.
This puts very high pressure on the GC, especially when many images have to be processed.

My encoder settings are probably the lowest-effort ones available for lossless encoding:

using SixLabors.ImageSharp.Formats.Webp;

var encoder = new WebpEncoder
{
    FileFormat = WebpFileFormatType.Lossless,
    Quality = 0,
    TransparentColorMode = WebpTransparentColorMode.Preserve,
};

I found the following hot paths (for one image):
[Profiler screenshots: allocation hot paths when encoding one image]

PixOrCopy.CreateLiteral produces TONS of small objects.
The allocations in Vp8LEncoder's constructor may not look that scary, but here is what happens when 10 images are encoded sequentially:
[Profiler screenshots: allocations across 10 sequential encodes]

As you can see, those large buffers are never reused.
Providing a custom memory allocator didn't help; I suspect the allocator is ignored entirely there.
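The reuse problem can be illustrated outside ImageSharp with a small sketch. `ScratchBufferEncoder` is a hypothetical, drastically simplified stand-in for an encoder that needs a pixel-sized scratch buffer per image (it is not Vp8LEncoder), and `ArrayPool<uint>` stands in for a pluggable memory allocator. When no pool is used, encoding 10 images in a row allocates ten fresh buffers; when the pool is honored, only the warm-up buffer is ever allocated, which mirrors the behavior described above.

```csharp
using System;
using System.Buffers;

// Hypothetical, heavily simplified stand-in for an encoder that needs a large
// per-image scratch buffer. This is NOT ImageSharp's Vp8LEncoder.
public sealed class ScratchBufferEncoder
{
    private readonly ArrayPool<uint>? pool; // null => allocate a fresh array per image

    public ScratchBufferEncoder(ArrayPool<uint>? pool) => this.pool = pool;

    public long Encode(int width, int height)
    {
        int pixels = checked(width * height);

        // Either rent from the pool (reused across images) or allocate a new array.
        uint[] scratch = pool is null ? new uint[pixels] : pool.Rent(pixels);
        try
        {
            // Rented arrays can be longer than requested and can hold stale data.
            Span<uint> work = scratch.AsSpan(0, pixels);
            work.Clear();

            // Placeholder for the real transform / entropy-coding work.
            long checksum = 0;
            for (int i = 0; i < work.Length; i++)
            {
                work[i] = (uint)i;
                checksum += work[i];
            }

            return checksum;
        }
        finally
        {
            // Hand the buffer back so the next image can reuse it.
            pool?.Return(scratch);
        }
    }
}

public static class ScratchBufferDemo
{
    // Bytes allocated on the current thread while encoding `images` images in a row.
    public static long MeasureSequential(ScratchBufferEncoder encoder, int images, int width, int height)
    {
        encoder.Encode(width, height); // warm-up: JIT and the first pool fill

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < images; i++)
        {
            encoder.Encode(width, height);
        }

        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

In ImageSharp the natural pool would be the configured memory allocator; the sketch only shows why honoring any pool removes the repeated large allocations.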

Fixing these two spots should greatly speed up the WebP encoder:

  1. The PixOrCopy.CreateLiteral problem is probably the harder one, as it requires changing data structures.
  2. Changing Vp8LEncoder should be much easier.
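Why a class-based `CreateLiteral` is so costly can be shown with a minimal sketch. `LiteralClass` and `LiteralStruct` are hypothetical types with a PixOrCopy-like layout (a mode byte, a `ushort` length, and a `uint` BGRA value or distance); they are assumptions, not ImageSharp's actual definitions. Filling a pre-sized array with class instances costs one heap object per literal, while the struct version stores the values inline in the array and allocates nothing per literal. `GC.GetAllocatedBytesForCurrentThread()` counts the bytes.

```csharp
using System;

// Hypothetical simplified types with a PixOrCopy-like layout (not ImageSharp's definitions).
public enum EntryMode : byte { Literal, CacheIdx, Copy }

public sealed class LiteralClass
{
    public EntryMode Mode;
    public ushort Len;
    public uint BgraOrDistance;

    // Every call allocates a new heap object.
    public static LiteralClass CreateLiteral(uint bgra) =>
        new LiteralClass { Mode = EntryMode.Literal, Len = 1, BgraOrDistance = bgra };
}

public struct LiteralStruct
{
    public EntryMode Mode;
    public ushort Len;
    public uint BgraOrDistance;

    // Returns a value; nothing is allocated on the heap.
    public static LiteralStruct CreateLiteral(uint bgra) =>
        new LiteralStruct { Mode = EntryMode.Literal, Len = 1, BgraOrDistance = bgra };
}

public static class LiteralAllocationDemo
{
    // The caller pre-allocates the array, so only per-literal allocations are measured.
    public static long BytesForClassLiterals(LiteralClass[] refs)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < refs.Length; i++)
        {
            refs[i] = LiteralClass.CreateLiteral((uint)i); // one object per literal
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static long BytesForStructLiterals(LiteralStruct[] refs)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < refs.Length; i++)
        {
            refs[i] = LiteralStruct.CreateLiteral((uint)i); // copied inline into the array
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

On a 64-bit runtime each class instance needs at least 24 bytes (object header plus fields), so a few hundred thousand literals per image quickly add up to megabytes of short-lived garbage.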

Steps to Reproduce

Encode an image under a performance profiler with object allocation tracking enabled.
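Without a profiler, allocations can also be approximated in code. This hypothetical `AllocationProbe` helper (not part of ImageSharp) records the bytes allocated on the calling thread via `GC.GetAllocatedBytesForCurrentThread()` and the change in `GC.CollectionCount` per generation around an action.

```csharp
using System;

// Result of one measurement. Collection counts are process-wide deltas.
public readonly record struct AllocationReport(long AllocatedBytes, int Gen0, int Gen1, int Gen2);

// Hypothetical helper for reproducing allocation numbers without a profiler.
public static class AllocationProbe
{
    public static AllocationReport Measure(Action action)
    {
        int gen0 = GC.CollectionCount(0);
        int gen1 = GC.CollectionCount(1);
        int gen2 = GC.CollectionCount(2);
        long before = GC.GetAllocatedBytesForCurrentThread();

        action();

        // Only allocations made on this thread are counted.
        return new AllocationReport(
            GC.GetAllocatedBytesForCurrentThread() - before,
            GC.CollectionCount(0) - gen0,
            GC.CollectionCount(1) - gen1,
            GC.CollectionCount(2) - gen2);
    }
}
```

For example, wrapping `image.SaveAsWebp(stream, encoder)` with the settings above in `AllocationProbe.Measure(...)` gives a rough per-image figure. Work done on other threads is not counted; `GC.GetTotalAllocatedBytes(precise: true)` covers all threads at the cost of picking up unrelated activity.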

antonfirsov (Member) commented

Quoting: "It is about 500 K objects and 200-300 MB total memory"

Sounds like PixOrCopy should be a struct. Moreover, if the maximum capacity of Vp8LBackwardRefs.Refs is known before building a collection, Vp8LBackwardRefs could work over a MemoryAllocator-allocated buffer instead of a List<PixOrCopy>.
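A sketch of this suggestion, under stated assumptions: `RefEntry` and `PooledBackwardRefs` are hypothetical names, not ImageSharp types; `ArrayPool<T>` stands in for ImageSharp's `MemoryAllocator`; and the capacity bound assumes every backward-reference entry (literal, cache index, or copy) covers at least one pixel, so `width * height` entries always suffice. With a known upper bound the buffer is rented once and never grows, unlike a `List<T>`, which reallocates and copies as it grows.

```csharp
using System;
using System.Buffers;

// Hypothetical value-type entry illustrating "PixOrCopy as a struct".
public enum RefMode : byte { Literal, CacheIdx, Copy }

public readonly struct RefEntry
{
    public RefEntry(RefMode mode, ushort len, uint bgraOrDistance)
    {
        Mode = mode;
        Len = len;
        BgraOrDistance = bgraOrDistance;
    }

    public RefMode Mode { get; }
    public ushort Len { get; }
    public uint BgraOrDistance { get; }

    public static RefEntry Literal(uint bgra) => new(RefMode.Literal, 1, bgra);
    public static RefEntry Copy(uint distance, ushort len) => new(RefMode.Copy, len, distance);
}

// Hypothetical fixed-capacity replacement for List<PixOrCopy>, backed by a rented array.
public sealed class PooledBackwardRefs : IDisposable
{
    private readonly ArrayPool<RefEntry> pool;
    private RefEntry[]? buffer;
    private int count;

    public PooledBackwardRefs(int capacity, ArrayPool<RefEntry> pool)
    {
        this.pool = pool;
        buffer = pool.Rent(capacity); // may be longer than capacity; only `capacity` slots are used
        Capacity = capacity;
    }

    public int Capacity { get; }
    public int Count => count;

    public void Add(RefEntry entry)
    {
        RefEntry[] buf = buffer ?? throw new ObjectDisposedException(nameof(PooledBackwardRefs));
        if (count == Capacity) throw new InvalidOperationException("Capacity exceeded.");
        buf[count++] = entry; // copies the value; no per-entry heap allocation
    }

    public void Clear() => count = 0;

    public ReadOnlySpan<RefEntry> AsSpan() =>
        (buffer ?? throw new ObjectDisposedException(nameof(PooledBackwardRefs))).AsSpan(0, count);

    public void Dispose()
    {
        if (buffer is not null)
        {
            pool.Return(buffer); // the next image can reuse this array
            buffer = null;
        }
    }
}
```

Because `RefEntry` is a small struct, adding an entry copies a value into the rented array instead of allocating an object, and disposing the collection hands the array back to the pool for the next image.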

SladeThe (Author) commented Jun 6, 2025

I've made the changes locally and run benchmarks. Here are the results.

Current version

| Method | TestImage | Mean | Error | StdDev | Ratio | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 'Magick Webp Lossy' | Png/Bike.png | 12.81 ms | 1.290 ms | 0.071 ms | 0.09 | - | - | - | 66.48 KB | 0.13 |
| 'ImageSharp Webp Lossy' | Png/Bike.png | 42.26 ms | 2.952 ms | 0.162 ms | 0.28 | 923.0769 | 384.6154 | - | 15186.98 KB | 29.37 |
| 'Magick Webp Lossless' | Png/Bike.png | 150.37 ms | 19.626 ms | 1.076 ms | 1.00 | - | - | - | 517.12 KB | 1.00 |
| 'ImageSharp Webp Lossless' | Png/Bike.png | 105.31 ms | 24.582 ms | 1.347 ms | 0.70 | 2000.0000 | 1800.0000 | 1400.0000 | 20490.55 KB | 39.62 |

PixOrCopy -> struct

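| Method | TestImage | Mean | Error | StdDev | Ratio | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 'ImageSharp Webp Lossless' | Png/Bike.png | 97.95 ms | 9.195 ms | 0.504 ms | 0.65 | 2166.6667 | 2166.6667 | 2166.6667 | 10091.99 KB | 19.52 |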

PixOrCopy -> struct, Vp8LEncoder + allocator

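| Method | TestImage | Mean | Error | StdDev | Ratio | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 'ImageSharp Webp Lossless' | Png/Bike.png | 103.36 ms | 15.237 ms | 0.835 ms | 0.68 | 1800.0000 | 1800.0000 | 1800.0000 | 3904.51 KB | 7.55 |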

The first change cut the allocated memory in half and also noticeably improved encoder performance.

The second change reduced the allocated memory even further, but gave back a little of the speedup.
It is still 2 ms faster than the latest release, and the result is reproducible.
It may shine brighter when other tasks are running concurrently, thanks to lower GC pressure, but that is hard to say.
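For reference, the allocation and timing changes in the 'ImageSharp Webp Lossless' rows of the three tables work out as follows (allocated memory relative to the current version, then the drop in mean time for each change):

$$
\frac{10091.99\ \mathrm{KB}}{20490.55\ \mathrm{KB}} \approx 0.493,
\qquad
\frac{3904.51\ \mathrm{KB}}{20490.55\ \mathrm{KB}} \approx 0.191
$$

$$
105.31\ \mathrm{ms} - 97.95\ \mathrm{ms} = 7.36\ \mathrm{ms},
\qquad
105.31\ \mathrm{ms} - 103.36\ \mathrm{ms} = 1.95\ \mathrm{ms}
$$

The current version's Error column (24.582 ms) is larger than either timing difference.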

@antonfirsov Any thoughts?

antonfirsov (Member) commented Jun 6, 2025

I didn't expect a significant speed difference from these changes; part of the difference we see could just be noise.

The allocation improvements make the change absolutely worthwhile. Are you interested in opening a PR?

SladeThe (Author) commented Jun 6, 2025

Yep, I'll open a PR.
I just wasn't sure about switching to the memory allocator in that particular case.
Making PixOrCopy a struct is clearly better than the current implementation in every respect.
The second change is more debatable, in my opinion.
