Skip to content

Investigate if we can safely remove the use of Uint8clampedArray when it's useless #14849

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
calixteman opened this issue Apr 27, 2022 · 9 comments · Fixed by #14856
Closed
Assignees
Labels

Comments

@calixteman
Copy link
Contributor

For example here:

return forceClamped && !(subarray instanceof Uint8ClampedArray)

we use the ctor with a typedArray argument:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8ClampedArray/Uint8ClampedArray#typedarray
and ergo data are copied.
And as far as I understand, Uint8ClampedArray is especially useful when some data are written in the array but in general (afaik) there are no difference with an Uint8Array.

Those Uint8ClampedArray have been added in #9802.

and it clearly makes sense for color stuff, for image.js::undoPreblend and maybe few other cases I overlooked but in general my feeling is that they're useless and their use induces a performance penalty.

@Snuffleupagus, any opinion about that ?

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Apr 27, 2022

Those Uint8ClampedArray have been added in #9802.

and it clearly makes sense for color stuff, for image.js::undoPreblend and maybe few other cases I overlooked but in general my feeling is that they're useless and their use induces a performance penalty.

I suppose that it's possible that we currently use the forceClamp-parameter unnecessarily in some cases, although it should be limited to data where e.g. the ColorSpace-code will subsequently be used, but it's not really clear to me more exactly which those actually are.
Do you perhaps have a list of potential ones that you think could/should be removed? Given that all code-paths that need to use Uint8ClampedArray should have asserts, have you tried running all tests locally with your proposed changes?

@calixteman
Copy link
Contributor Author

For example it seems to be useless here:

static createMask({

and even here (which is almost the previous version of the code above):
static createRawMask({

or here:

pdf.js/src/core/image.js

Lines 714 to 738 in 694cb31

imgArray = this.getImageBytes(originalHeight * rowBytes);
// If imgArray came from a DecodeStream, we're safe to transfer it
// (and thus detach its underlying buffer) because it will constitute
// the entire DecodeStream's data. But if it came from a Stream, we
// need to copy it because it'll only be a portion of the Stream's
// data, and the rest will be read later on.
if (this.image instanceof DecodeStream) {
imgData.data = imgArray;
} else {
const newArray = new Uint8ClampedArray(imgArray.length);
newArray.set(imgArray);
imgData.data = newArray;
}
if (this.needsDecode) {
// Invert the buffer (which must be grayscale if we reached here).
assert(
kind === ImageKind.GRAYSCALE_1BPP,
"PDFImage.createImageData: The image must be grayscale."
);
const buffer = imgData.data;
for (let i = 0, ii = buffer.length; i < ii; i++) {
buffer[i] ^= 0xff;
}
}
return imgData;

or:

pdf.js/src/core/image.js

Lines 742 to 757 in 694cb31

switch (this.colorSpace.name) {
case "DeviceGray":
// Avoid truncating the image, since `JpegImage.getData`
// will expand the image data when `forceRGB === true`.
imageLength *= 3;
/* falls through */
case "DeviceRGB":
case "DeviceCMYK":
imgData.kind = ImageKind.RGB_24BPP;
imgData.data = this.getImageBytes(
imageLength,
drawWidth,
drawHeight,
/* forceRGB = */ true
);
return imgData;

Maybe we could remove the forceClamped parameter and return a wrapper around a typedArray.
And add a method to get a clamped version of the underlying buffer or a clone if we mustn't change the underlying data, something similar to what we have here:

pdf.js/src/core/image.js

Lines 715 to 726 in 694cb31

// If imgArray came from a DecodeStream, we're safe to transfer it
// (and thus detach its underlying buffer) because it will constitute
// the entire DecodeStream's data. But if it came from a Stream, we
// need to copy it because it'll only be a portion of the Stream's
// data, and the rest will be read later on.
if (this.image instanceof DecodeStream) {
imgData.data = imgArray;
} else {
const newArray = new Uint8ClampedArray(imgArray.length);
newArray.set(imgArray);
imgData.data = newArray;
}

Just a sort of Copy-on-Write wrapper... something like that.
And the clamped version is only useful when some data are potentially out of the 0-255 range, but in case we do stuff like buffer[i] ^= 0xFF it isn't required.

The problem I see with the forceClamped = true parameter is that we use it because maybe at some point we'll do some color stuff. With the CoW approach, we're more lazy and just do the clone only when we absolutely need it.

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Apr 27, 2022

I'd suggest that we simply remove the forceClamped-parameter only where it's actually deemed unnecessary, based on code-inspection and testing, and refrain from bigger changes here.

Maybe we could remove the forceClamped parameter and return a wrapper around a typedArray.

What you're suggesting, assuming I'm understanding you correctly, is changing the return type of BaseStream.getBytes and related methods. Doing that would effect basically the entire worker-thread code, in one way or another, since in most cases we're actually fine with whatever kind of TypedArray.
Having those methods return a wrapper, rather than an actual TypedArray, would thus mean re-writing so much code and I'd also wonder if the wrapper itself wouldn't also incur additional memory/run-time overhead?

Edit: While you're technically correct about this case, note that it doesn't actually matter in practice since the JPEG decoder only returns Uint8ClampedArray-data and the forceClamped-parameter thus cannot lead to any additional data-copying :-)

@calixteman
Copy link
Contributor Author

I'm not crazy ;)
I was thinking changing the return value of getImageBytes and for stuff like:

const imgArray = image.getBytes(
bitStrideLength * h,
/* forceClamped = */ true
);

My feeling is that having a CoW wrapper would help for performance but for code readability too.
The problem of passing the correct value to forceClamped is that it could lead to unclear code, kind of forceClamped = a && (!b || c) && d..., but maybe it's just me and a matter of taste.

To give you some context, in order to move some image stuff on the worker instead of the main thread in using some OffscreenCanvas, I was wondering for example if could directly create a Uint32Array(n) for 24bpp images instead of having a a temporary Uint8Array(3 * n) in order to remove an extra copy. And in reading the code I saw the Uint8ClampedArray and then I was wondering why we need the clamped version when we don't write in it.

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Apr 27, 2022

I was thinking changing the return value of getImageBytes and for stuff like:

Oh, that's the missing piece of context required to understand what you meant here :-)

Basically you want to limit the previous forceClamped handling to only the PDFImage-code, which means that it can be removed from the various Stream-implementations; in that case that should probably be fine!
(Hopefully our existing asserts, combined with the ref-tests, should ensure that we don't accidentally regress something.)

@calixteman
Copy link
Contributor Author

@Snuffleupagus, if you agree that it'd make things better, would mind to fix this bug ?

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Apr 28, 2022

if you agree that it'd make things better, would mind to fix this bug ?

Maybe, but I'm not sure exactly what you have in mind for the getImageBytes-method. Do you want it to look e.g. something like the following?

diff --git a/src/core/image.js b/src/core/image.js
index af0bf6f4a..6985024c1 100644
--- a/src/core/image.js
+++ b/src/core/image.js
@@ -872,7 +872,16 @@ class PDFImage {
     this.image.drawWidth = drawWidth || this.width;
     this.image.drawHeight = drawHeight || this.height;
     this.image.forceRGB = !!forceRGB;
-    return this.image.getBytes(length, /* forceClamped = */ true);
+
+    const imageBytes = this.image.getBytes(length);
+    return {
+      raw: imageBytes,
+      clamped: () => {
+        return imageBytes instanceof Uint8ClampedArray
+          ? imageBytes
+          : new Uint8ClampedArray(imageBytes);
+      },
+    };
   }
 }

@calixteman
Copy link
Contributor Author

Yep and if possible the return object could handle the if (this.image instanceof DecodeStream) ... thing.
And if it safe to write into the original buffer then use new Uint8ClampedArray(imageBytes.buffer) instead to avoid an extra copy.
The idea is really to try to have something smart enough to avoid useless copies and to make the code more simple.
I can do it for sure, but you're certainly the best to avoid to miss something important somewhere and handle some corner cases you've in mind if any.

@Snuffleupagus
Copy link
Collaborator

After taking a brief look, and running all tests with a quickly thrown together patch, I'm not actually sure that we need any of the forceClamped handling in the PDFImage-code at all.
I might be overlooking something, or maybe our test-coverage is lacking, because it's really starting to look like this is basically useless code...

In the following cases we're returning the data without any ColorSpace-parsing, and the exact TypedArray used shouldn't matter:

In the following cases the image-data is only used internally, and again the exact TypedArray used shouldn't matter:

  • imgArray = this.getImageBytes(originalHeight * rowBytes);
    however the actual image-data is being defined below in

    pdf.js/src/core/image.js

    Lines 772 to 779 in b72a448

    if (!forceRGBA && !this.smask && !this.mask) {
    imgData.kind = ImageKind.RGB_24BPP;
    imgData.data = new Uint8ClampedArray(drawWidth * drawHeight * 3);
    alpha01 = 0;
    maybeUndoPreblend = false;
    } else {
    imgData.kind = ImageKind.RGBA_32BPP;
    imgData.data = new Uint8ClampedArray(drawWidth * drawHeight * 4);
  • const imgArray = this.getImageBytes(height * rowBytes);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants