-
Notifications
You must be signed in to change notification settings - Fork 31.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(util): attempt of performance improve on MIMEType class #46607
base: main
Are you sure you want to change the base?
Conversation
fe82488
to
c6e4132
Compare
lib/internal/mime.js
Outdated
|
||
function escapeQuoteOrSolidus(str) { | ||
let result = ''; | ||
for (let i = 0; i < str.length; i++) { | ||
const char = str[i]; | ||
result += (char === '"' || char === '\\') ? `\\${char}` : char; | ||
const code = StringPrototypeCharCodeAt(char, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Ideally, if we don't have quote or backslash character, we don't need to create a result string and append it for each character. I don't know the probability of having this happy path, but if it's common, we might just check on the beginning and return the str
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had in mind using the Regex#matchAll
but it requires having the Regex marked as global (making them stateful), which can make usages of Regex tricky. Matter of tradeoffs maybe?
Though, this only affects while using the toString
method of the class (template literals, concatenation, etc.). At parsing is not involved at all 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RexExp.prototype.matchAll
creates a RegExpStringIterator
object, and it has all kinds of prototype pollution vulnerabilities. I don't think it's the right tool for lib/
(and doubt it would be more performant than the alternatives).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is only performant at the extent of the regex iterating over the string at once and having all the indices where this applies, so the escaping does not need to iterate once again over the string.
But I agree that the security side it is more important here 👍
Hi @aduh95, I noticed that Sadly this comes with regressions in the overall performance (~10%). Is there anything we can do to also speed up the Primordial? |
@metcoder95 what are you comparing it with? |
This is the small benchmark I ran: const regex = /foo*/;
const regexTest = /foo*/;
const regexExec = /foo*/;
const str = 'table football, foosball';
suite
.add('RegExp#exec', function () {
regexExec.exec(str)[0];
})
.add('RegExp#test', function () {
regexTest.test(str);
})
.add('String#search', function () {
str.search(regex);
})
.add('String#match', function () {
str.match(regex);
})
.on('cycle', function (event) {
console.log(String(event.target))
})
.on('complete', function () {
console.log('Fastest is ' + this.filter('fastest').map('name'))
})
.run({ async: true }) Results: RegExp#exec x 37,341,962 ops/sec ±0.24% (96 runs sampled)
RegExp#test x 66,946,944 ops/sec ±0.09% (98 runs sampled)
String#search x 53,180,141 ops/sec ±0.06% (93 runs sampled)
String#match x 36,475,617 ops/sec ±0.18% (99 runs sampled) The benchmark is pretty rudimentary but it highlights a significative difference. |
You should also compare the primordials to have a more complete benchmark. You can access them if you use
You should also test adding |
1 similar comment
You should also compare the primordials to have a more complete benchmark. You can access them if you use
You should also test adding |
You can also add the case where we use |
Sure! I made the changes suggested, and the results didn't vary that much: RegExp#exec x 37,376,162 ops/sec ±0.25% (92 runs sampled)
RegExp#test x 66,952,599 ops/sec ±0.07% (98 runs sampled)
String#search x 53,146,898 ops/sec ±0.09% (98 runs sampled)
String#match x 35,598,485 ops/sec ±0.26% (98 runs sampled) const {
RegExpPrototypeExec,
RegExpPrototypeTest,
StringPrototypeSearch,
StringPrototypeMatch,
} = require('internal/test/binding').primordials;
const Benchmark = require('benchmark')
const suite = new Benchmark.Suite()
const regex = /foo*/;
const regexTest = /foo*/;
const regexExec = /foo*/;
const str = 'table football, foosball';
suite
.add('RegExp#exec', function () {
RegExpPrototypeExec(regexExec, str) === null;
})
.add('RegExp#test', function () {
RegExpPrototypeTest(regexTest, str) === false;
})
.add('String#search', function () {
StringPrototypeSearch(str, regex) === -1;
})
.add('String#match', function () {
StringPrototypeMatch(str, regex) === null;
})
.on('cycle', function (event) {
console.log(String(event.target))
})
.on('complete', function () {
console.log('Fastest is ' + this.filter('fastest').map('name'))
})
.run({ async: true }) This is actually nice, it seems that Primordials doesn't add any overhead |
So according to your results, |
I made the changes accordingly and it had a reduction of ~5% which stills marks an improvement.
Though, I'm having a little bit of struggles understanding what's wrong with the testing and why are they timing out, any guidance would be super helpful 😅 |
I reached out from OpenJSF Slack |
Thanks @anonrig, I actually thing found the culprit. It seems a bad parsing of a given MIME is causing the issue, I'll look through the day 👍 |
It's probably worth trying out |
Thanks, let me give it a try and I'll share the results as well 👍 |
I found this line that hardens |
Of course you can, but it has terrible performance. |
Well, was worth it to ask 🤷♂️ |
FYI here are the results I've found:
Code'use strict';
const {
RegExpPrototypeExec,
RegExpPrototypeTest,
RegExpPrototypeSymbolSearch,
SafeStringPrototypeSearch,
hardenRegExp,
makeSafe,
} = require('internal/test/binding').primordials;
const Benchmark = require('benchmark')
const SafeRegExp = makeSafe(RegExp,
class SafeRegExp extends RegExp {}
)
const suite = new Benchmark.Suite();
const regex = /foo*/;
const regexTest = /foo*/;
const regexHardenedTest = hardenRegExp(/foo*/);
const safeRegexTest = new SafeRegExp(regexHardenedTest);
const regexExec = /foo*/;
const str = 'table football, foosball';
suite
.add('RegExpPrototypeExec', function () {
RegExpPrototypeExec(regexExec, str) === null;
})
.add('RegExpPrototypeTest', function () {
RegExpPrototypeTest(regexTest, str) === false;
})
.add('RegExp#test', function () {
regexTest.test(str) === false;
})
.add('RegExpPrototypeTest+hardenedRegExp', function () {
RegExpPrototypeTest(regexHardenedTest, str) === false;
})
.add('SafeRegExp#test', function () {
safeRegexTest.test(str) === false;
})
.add('SafeStringPrototypeSearch', function () {
SafeStringPrototypeSearch(str, regex) === -1;
})
.add('RegExpPrototypeSymbolSearch', function () {
RegExpPrototypeSymbolSearch(regex, str) === -1;
})
.on('cycle', function (event) {
console.log(String(event.target))
})
.on('complete', function () {
console.log('Fastest is ' + this.filter('fastest').map('name'))
})
.run({ async: true }); AFAICT, |
Agree, the changes I made reflected ~5% regression, but they were worth it compared to the progression made. I just have struggled with some scenarios that need adjustments. Hope can have something by eow 👍 |
Update on this PR Sadly haven't had the time to properly focus on this, might have some next week but has to be seen 🙁 Solved the timeout issue, but the edge cases are the ones to cover now. Feel free to have a look and I'll incorporate feedback if any. Thanks! 🙂 |
name = `${name}`; | ||
value = `${value}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why though?
can value and name be not a string?
maybe creating a string only if it is not a string is faster.
name = `${name}`; | |
value = `${value}`; | |
typeof name !== 'string' && (name = `${name}`); | |
typeof value !== 'string'&& (value = `${value}`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, but I believe that it might be to trigger the toString
function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but why would you need that on a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imagine that is in case the value passed isn't a string
so we enforce the primitive representation or whatever toString
returns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe creating a string only if it is not a string is faster.
I doubt it, V8 should already have it optimized, I would expect doing the test manually as suggested above would be slower than not doing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've already added most of the points, and beyond some allocations that makes a substantial difference (~10%), most of the string
optimizations showed negligible improvements.
I'll just check the tests passed as before and will push the changes for review
@@ -182,103 +298,103 @@ class MIMEParams { | |||
for (const { 0: key, 1: value } of this.#data) { | |||
const encoded = encode(value); | |||
// Ensure they are separated | |||
if (ret.length) ret += ';'; | |||
if (ret.length !== 0) ret += ';'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (ret.length !== 0) ret += ';'; | |
ret.length !== 0 && (ret += ';'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe even
ret.length !== 0 && (ret += `;${key}=${encoded}`) || (ret += `${key}=${encoded}`);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1 on that suggestion, the version with if
is much more readable.
continue; | ||
} | ||
const endOfSource = str.length; | ||
const marker = { position }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is marker an object, because we want to keep it as byRef parameter?, needs comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's the main reason. To use as a checkpoint record for the iterations
StringPrototypeSlice(str, marker.position, valueEnd); | ||
|
||
// #11-9-2 Remove any trailing HTTP whitespace from parameterValue. | ||
const trimmedValue = StringPrototypeTrimEnd(rawValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe first check if the last character is a whitespace and then call if so trimEnd.
marker.position += trimmedValue.length; | ||
|
||
// #11-9-3 If parameterValue is the empty string, then continue. | ||
if (trimmedValue.length === 0) continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switch positions as if trimmedValue.length === 0 means that marker.position += trimmedValue.length is doing nothing.
marker.position += trimmedValue.length; | |
// #11-9-3 If parameterValue is the empty string, then continue. | |
if (trimmedValue.length === 0) continue; | |
// #11-9-3 If parameterValue is the empty string, then continue. | |
if (trimmedValue.length === 0) continue; | |
marker.position += trimmedValue.length; | |
SafeStringPrototypeSearch( | ||
parameterValue, | ||
NOT_HTTP_QUOTED_STRING_CODE_POINT, | ||
) === -1 && | ||
params.has(parameterName) === false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isnt .has on Map O(1)?
SafeStringPrototypeSearch( | |
parameterValue, | |
NOT_HTTP_QUOTED_STRING_CODE_POINT, | |
) === -1 && | |
params.has(parameterName) === false | |
params.has(parameterName) === false && | |
SafeStringPrototypeSearch( | |
parameterValue, | |
NOT_HTTP_QUOTED_STRING_CODE_POINT, | |
) === -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but I'm afraid I'm not following 🤔
this.#type = data.type; | ||
this.#subtype = data.subtype; | ||
// #1 Remove any leading and trailing HTTP whitespace from input | ||
string = StringPrototypeTrim(`${string}`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string = StringPrototypeTrim(`${string}`); | |
typeof string !== 'string' && (string = `${string}`) | |
string = StringPrototypeTrim(string); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
%String.prototype.trim%
always call ToString
per spec, it shouldn't be necessary to to convert it first.
string = StringPrototypeTrim(`${string}`); | |
string = StringPrototypeTrim(string); |
const { type, subtype, parametersStringIndex } = parseTypeAndSubtype(string); | ||
this.#type = type; | ||
this.#subtype = subtype; | ||
this.#essence = `${this.#type}/${this.#subtype}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.#essence = `${this.#type}/${this.#subtype}`; | |
this.#essence = `${type}/${subtype}`; |
const invalidTypeIndex = SafeStringPrototypeSearch(v, NOT_HTTP_TOKEN_CODE_POINT); | ||
if (v.length === 0 || invalidTypeIndex !== -1) { | ||
if (invalidTypeIndex !== -1 || v.length === 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this maybe better?
if (v.length === 0) {
throw new ERR_INVALID_MIME_SYNTAX('type', v, -1);
}
const invalidTypeIndex = SafeStringPrototypeSearch(v, NOT_HTTP_TOKEN_CODE_POINT);
if (invalidTypeIndex !== -1) {
throw new ERR_INVALID_MIME_SYNTAX('type', v, invalidTypeIndex);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can even merge the SafeStringPrototypeSearch
within the if
statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to pass the invalidTypeIndex to the error, then you have to do it outside the if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some thoughts. maybe you can press some op/s out of my suggestions. looking for feedback
Btw. If you use benchmarkjs for benching and use async true (for what reason ever) then you could set delay to 0. |
Thanks for the feedback @Uzlopak, they are a fresh look over the PR about several optimisations a missed 👌 |
This PR got me actually the motivation to clone the node repo and try to get it built and run some benchmarks |
583249d
to
3394c74
Compare
I'm re-taking this PR. I'm giving it a try with the recommendations from @Uzlopak and doing some experiments with direct buffers to see if we can speed this ahead since we start parsing the MIMEType string. I'd hope to have something by end of week 🙂 |
Implements performance#38
The PR makes an attempt to improve the performance of the
MIMEType
class at the moment of parsing an incomingMIME
string.The vast majority of changes were made in an attempt of preserving current implementation meanwhile pushing the limits as far as possible.
As shown in the performance#38 comment, the improvements were about 12 - 20% on average.
Happy to see any feedback on possible points I missed or further improvements that can be made 🙂
cc: @anonrig