
Performance concerns about UTF-8 strings #38

Open
Pauan opened this issue Jun 17, 2019 · 16 comments

Comments

@Pauan

Pauan commented Jun 17, 2019

I've written some Rust web apps which are compiled to WebAssembly and run in the browser. I'm using wasm-bindgen for this.

Generally the code runs really fast (since both Rust and WebAssembly are quite fast), but there's one area in particular which is an order of magnitude slower: strings.

Rust uses UTF-8 strings, and so whenever I send a string to/from JS it has to encode/decode it to UTF-16. This is done using the browser's TextEncoder.encodeInto and TextDecoder.decode APIs.

This is as optimal as we can currently get in terms of speed, but it's not good enough, because encoding/decoding is way slower than everything else:

DevTools Screenshot

The total script execution time is 2413ms. Out of that, 494ms is from the browser's decoding, and a further 434ms from garbage collecting the JS strings. So that means just passing strings from Rust to JS is taking up ~40% of the total execution time!

This encoding/decoding is so slow that it means that a pure JavaScript app is faster than a Rust app (the JS app only takes 1236ms for all scripting + gc)!

These are not big strings, they are all small strings like "row", "col-sm-6", "div", etc. The biggest string is "glyphicon-remove".

Unfortunately, I cannot avoid this string passing, because I'm calling native web APIs (like document.createElement, Node.prototype.textContent, etc.), so the usual solution of "move everything into Rust" doesn't work.
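
To give a sense of where the strings come from, here's a minimal sketch of the kind of calls involved (using wasm-bindgen/web_sys; the exact method names, required web_sys features, and the literal strings are approximate, and the real app is of course much larger):

```rust
use wasm_bindgen::prelude::*;

// Every &str below lives in wasm linear memory as UTF-8 and has to be
// decoded into a UTF-16 JS string before the browser API ever sees it.
// (Assumes the matching web_sys cargo features: Window, Document,
// Element, DomTokenList, Node, HtmlElement.)
#[wasm_bindgen(start)]
pub fn run() -> Result<(), JsValue> {
    let document = web_sys::window().unwrap().document().unwrap();

    let div = document.create_element("div")?;   // "div" crosses the boundary
    div.class_list().add_1("col-sm-6")?;         // so does "col-sm-6"
    div.set_text_content(Some("hello"));         // and so does the text content

    document.body().unwrap().append_child(&div)?;
    Ok(())
}
```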

This is clearly a known concern for WebIDL bindings, which is why the proposal includes the utf8-str and utf8-cstr types.

However, I'm concerned that WebIDL bindings won't actually fix this performance problem: browsers (at least Firefox) internally use UTF-16, so they'll still have to do the encoding/decoding, and the performance will be the same.

I know this is an implementation concern and not a spec concern, but is there any practical plan for how the browsers can implement the utf8-str and utf8-cstr types in a fast zero-copy O(1) way without needing to change their engines to internally use UTF-8?

@fgmccabe
Contributor

fgmccabe commented Jun 17, 2019 via email

@annevk
Member

annevk commented Jun 18, 2019

I know we are looking at using UTF-8 more internally and also want to avoid more string conversions such as the ones you mention. I'm not sure how that's currently prioritized relative to other work.

cc @hsivonen @lukewagner

@fgmccabe
Contributor

fgmccabe commented Jun 18, 2019 via email

@Pauan
Author

Pauan commented Jun 19, 2019

(P.S. It seems in the recent slides that utf8-str was renamed to utf8-mem-str, but of course all the same problems still apply)

@kmiller68

For what it's worth, in the WebKit world, many of the strings passed into WebIDL will end up being "Atomed" (canonicalized so comparison is just a pointer check). Thus, even if we had a utf8 string implementation, there would still be a copy problem.

As far as implementing utf8 support in WebKit, much like @annevk said, there's definitely interest in having a utf8 implementation. Unfortunately, my guess is implementing it would be a many year process. IIUC, implementing Latin1 strings was a two year process. Since that was ~8 years ago, I would imagine a utf8 project would not be easier.

@Pauan
Author

Pauan commented Jun 19, 2019

@kmiller68 Yeah, I don't expect a solution anytime soon. I just wanted to make sure that this problem is known about and is being worked on (even if slowly).

I was hoping there would be a clever way to avoid needing to re-implement support for UTF-8, but I guess not.

@Pauan
Author

Pauan commented Jun 21, 2019

I had the idea to try caching the strings, so if the same string is used multiple times it only needs to encode it once.

I implemented a simple cache in Rust, which only caches a small number of strings which are known to never change. The results are dramatic:

DevTools 005

Now the scripting time is 1680ms (733ms faster), and decoding + GC now only takes 429ms (499ms faster). That means decoding + GC now takes 46% of the time it used to!

I think this technique could be used in the browser engine: keep an LRU cache mapping from UTF-8 strings to UTF-16 strings.

When converting from the utf8-mem-str type, it would look up the string in the cache, and if it's found it can avoid the encoding cost.

This does add a bit of performance unpredictability, and the caches will need to be carefully tuned for maximum performance, but it seems like a relatively easy way to get huge performance gains without needing to refactor the engine to use UTF-8.
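
For reference, the Rust-side cache is roughly this shape (a simplified sketch, not the actual code; it assumes the cached strings are 'static and known to never change):

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use wasm_bindgen::JsValue;

thread_local! {
    // Maps a Rust string to the JS string object created for it, so the
    // UTF-8 -> UTF-16 conversion only happens on the first use.
    static STRING_CACHE: RefCell<HashMap<&'static str, JsValue>> =
        RefCell::new(HashMap::new());
}

// Returns a cached JS string for `s`, encoding it only the first time.
fn cached_js_string(s: &'static str) -> JsValue {
    STRING_CACHE.with(|cache| {
        cache
            .borrow_mut()
            .entry(s)
            .or_insert_with(|| JsValue::from_str(s))
            .clone()
    })
}
```

On a cache hit, cloning the JsValue just duplicates a handle to the existing JS string, which is much cheaper than re-decoding it; the DOM calls then need bindings that accept a JsValue/JsString directly instead of a &str.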

@Horcrux7

Why not convert these strings to UTF-16 at compile time, when creating the wasm file?

@Pauan
Author

Pauan commented Jun 21, 2019

@Horcrux7 When the Rust program wants to call a native API, it must first send the UTF-8 string from Rust to JS, and then convert that UTF-8 string into a JS string (which is UTF-16).

This encoding from UTF-8 to UTF-16 cannot happen in Rust, because the end result needs to be a JS string, not a Rust string.

Wasm cannot (currently) store or create JS strings, and so the JS string must be created on the JS side. So any sort of compile-time encoding would have to happen on the JS side, not the wasm side.

@lukewagner
Member

It's also the case that we'll have to copy strings in Gecko, and it would take a lot of work to achieve a truly zero-copy optimization. However, this would still be superior to the current situation (without bindings) in that the copy memory could be eagerly released (no GC) and allocated efficiently (e.g., from a stack), and the decode step wouldn't need a separate wasm->(JS->)API call. So there would still be a significant perf win even without the zero-copy impl.

FWIW, everything in the explainer and recent presentation slides should be considered just a strawman to help give a general sense of how things would work; the precise set of binding operators and their semantics isn't yet fleshed out.

@hsivonen

@Pauan

These are not big strings, they are all small strings like "row", "col-sm-6", "div", etc. The biggest string is "glyphicon-remove".

Are they all ASCII? Do they flow equally in both directions (Wasm to JS and JS to Wasm)? Do I understand correctly that in the JS to Wasm case, these are literals in the JS source and not strings that you've concatenated from pieces at run time?

For small ASCII strings like this, the JS to Wasm path in wasm-bindgen currently bypasses TextEncoder.encodeInto. Even though TextEncoder.encodeInto was optimized for Firefox 71 to avoid copies and to use SIMD, the binding overhead appears to be so large that it continues to make sense for wasm-bindgen to bypass TextEncoder.encodeInto for ASCII, at least when the strings are short.

I think the main take-away from the TextEncoder.encodeInto optimization not outperforming wasm-bindgen's manual JS-only WebIDL bypass is that attributing the performance concern to UTF-8 may be incorrect; the issue is more likely the cost of the WebIDL layer for TextEncoder.encodeInto and TextDecoder.decode. Since this issue is filed on interface-types, the WebIDL cost of TextEncoder.encodeInto and TextDecoder.decode shouldn't be held against interface-types.

(In the Wasm to JS direction, TextDecoder.decode is used by wasm-bindgen without a bypass, so the binding overhead is there even for short ASCII.)

@Pauan
Author

Pauan commented Oct 21, 2019

@hsivonen Are they all ASCII?

Yes.

Do they flow equally in both directions (Wasm to JS and JS to Wasm)?

No, they only flow from Wasm to JS (which means UTF-8 -> UTF-16 conversion using TextDecoder.decode).

Therefore the ASCII optimizations and encodeInto do not factor into it.

Also, as shown in the screenshots I've posted, the primary performance bottleneck is from the browser's internal encoding function (and GCing the strings), not any of the wasm-bindgen glue (which is negligible).

So there's only two ways to fix this problem: 1) the browsers somehow dramatically improve their internal encoding algorithms, or 2) they avoid encoding entirely (which is what this issue is about).

At the time this issue was posted, interface types specified the encoding of the string, so this issue was filed in the correct place.

@hsivonen

hsivonen commented Oct 23, 2019

Also, as shown in the screenshots I've posted, the primary performance bottleneck is from the browser's internal encoding function (and GCing the strings), not any of the wasm-bindgen glue (which is negligible).

The screenshot appears to show Chrome. What's the performance like in Firefox Nightly?

So there's only two ways to fix this problem: 1) the browsers somehow dramatically improve their internal encoding algorithms, or 2) they avoid encoding entirely (which is what this issue is about).

Is a build from https://treeherder.mozilla.org/#/jobs?repo=try&revision=81247f866a936a332cba7d5b764537c36a7d8494 better or worse than Firefox Nightly?

(On the right, find the "shippable" row for your platform, click the green "+3" to expand it (no need to expand in the Mac case), click the green B to select the build task, at the bottom, click the "Job details" tab and find target.dmg for Mac, target.zip for Windows, or target.tar.bz2 for Linux--all x86_64. You can browse the changesets on the left to convince yourself that the difference between these builds and the Nightly baseline is legitimate.)

For the cases you describe (Wasm to JS, ASCII, longest string length 16), the Treeherder builds linked above should avoid actual encoding conversion and any buffer malloc in TextDecoder.decode. Your ASCII bytes should get copied as ASCII inline bytes into SpiderMonkey string objects themselves (as opposed to getting inflated to UTF-16 or getting placed into a buffer that the string object points to). However, 1) there's still an inflating conversion to UTF-16 once you pass them to a DOM API (do you do anything with the strings obtained from TextDecoder.decode other than pass them directly to a DOM API?) and 2) you still pay the binding overhead for the call into TextDecoder.decode.


Your initial comment mentions Node.prototype.textContent but the example strings you give don't look like stuff one would put in text nodes. Are you actually using Node.prototype.textContent with ASCII strings the longest of which has the length 16? (The reason why I'm asking is that DOM text nodes in Gecko have potentially non-UTF-16 storage, but their interface doesn't yet let SpiderMonkey make use of the non-UTF-16 case.)

@Pauan
Author

Pauan commented Oct 28, 2019

The screenshot appears to show Chrome. What's the performance like in Firefox Nightly?

Yes the screenshots were in Chrome (though Firefox had similar issues at the time).

For consistency, here's the latest Chrome (Version 77.0.3865.120 (Official Build) (64-bit)) screenshot:

UTF8 Chrome Uncached

And Firefox Developer Edition (71.0b2 (64-bit)):

UTF8 Firefox Uncached

And Firefox Nightly (72.0a1 (2019-10-27) (64-bit)):

UTF8 Nightly Uncached

And the Treeherder build you linked to:

UTF8 Treeherder Uncached

The first thing that I see is that Firefox Developer Edition includes a lot more information: Nightly and Treeherder don't show performance of individual functions like TextDecoder.decode anymore. I really don't like that.

Secondly, Firefox (in general) has much better decode performance than Chrome. It's no longer the massive bottleneck that it was. That wasn't true at the time I filed this issue, so it seems TextDecoder.decode has improved a lot since then, good job!

Thirdly, Nightly and Treeherder don't show the performance of individual functions, however the GC is clearly better in Nightly.

there's still an inflating conversion to UTF-16 once you pass them to a DOM API (do you do anything other with the strings obtained from TextDecoder.decode than pass them directly to a DOM API?)

No, it doesn't do anything with the strings, it just calls TextDecoder.decode and then immediately passes it to the DOM APIs.

Your initial comment mentions Node.prototype.textContent but the example strings you give don't look like stuff one would put in text nodes.

Those specific example strings are added to the DOM's classList with foo.classList.add.

Are you actually using Node.prototype.textContent with ASCII strings the longest of which has the length 16?

Yes, they're ASCII, though in the case of textContent the biggest string is 28 characters long.

@hsivonen

@Pauan
Thank you.

The first thing that I see is that Firefox Developer Edition includes a lot more information: Nightly and Treeherder don't show performance of individual functions like TextDecoder.decode anymore. I really don't like that.

I don't know if this is a change in the profiler or if Nightly and Treeherder builds just don't get their symbols distributed in the same way as the release and Dev Edition channels. (I suspect the latter.)

Secondly, Firefox (in general) has much better decode performance than Chrome. It's no longer the massive bottleneck that it was. That wasn't true at the time I filed this issue, so it seems TextDecoder.decode has improved a lot since then, good job!

I don't recall TextDecoder.decode internals having changed since June. The WebIDL layer might have. However, it's good to hear that it's faster than Chrome. That's intentional. :-)

Thirdly, Nightly and Treeherder don't show the performance of individual functions, however the GC is clearly better in Nightly.

Compared to Nightly, the Treeherder builds remove one JavaScript string object cache, so the Treeherder builds are expected to generate more string garbage if you re-decode the same string without having decoded four different strings in between.

How tight is your measurement loop? That is, does your workload re-decode the same string without decoding four different strings in between?

No, it doesn't do anything with the strings, it just calls TextDecoder.decode and then immediately passes it to the DOM APIs.

Thanks. This kind of pattern doesn't avoid inflation to UTF-16 and just defers the inflation in the case of the Treeherder builds. However, for short strings (15 or shorter is one kind of short, 23 or shorter is another kind of short), the Treeherder builds avoid an extra inflate/deflate cycle and in the ASCII case avoid a malloc.

Yes, they're ASCII, though in the case of textContent the biggest string is 28 characters long.

Thanks. This could benefit from further optimization opportunities.

@aminya

aminya commented May 16, 2020

There is a similar issue with JSOX and JSOX-WASM. The wasm implementation that is written in C++ is slower than the JavaScript code because of the string conversion overhead.
d3x0r/jsox-wasm#1
