Skip to content
This repository was archived by the owner on May 20, 2022. It is now read-only.

Concerns about ReDoS #26

Open
RunDevelopment opened this issue Jul 4, 2021 · 1 comment
Open

Concerns about ReDoS #26

RunDevelopment opened this issue Jul 4, 2021 · 1 comment

Comments

@RunDevelopment
Copy link

I want to ask whether this proposal will include any strategies to combat/prevent super-linear backtracking caused by sequences?

Right now, Unicode property escapes (UPE) are atomic (as are all characters, character sets, and character classes). With this proposal, UPEs evaluating to sequence sets will require backtracking.

This is a big problem because it is not obvious whether a UPE evaluates to a character set (no backtracking) or a sequence set (backtracking). This makes it incredibly easy for developers to (accidentally) create unsafe regexes. This can then be exploited in an ReDoS attack.

Example 1:
Suppose a Unicode property Foo equal to the sequence set xyz|xy|z and a JS RegExp /\p{Foo}+$/u. If this regex gets compiled into something equivalent to /(?:xyz|xy|z)+$/u, then the regex will be susceptible to exponential backtracking for all input strings of the form "xyz".repeat(n) + "a".

Human detection

It is practically impossible for a human to detect whether a UPE will cause super-linear backtracking. Even if a human knows that a particular UPE evaluates to a set of sequences, without knowing the exact set of sequences, it is impossible to determine whether the UPE will cause super-linear backtracking.

Malicious entities can easily exploit this fact, especially in the large JS open-source ecosystem. A malicious contributor can hide exponential backtracking using sequences. Since this is partially impossible for a human to detect, the unsafe regex will likely go unnoticed.

Atomic UPEs

Please note that UPEs evaluating to sequence sets cannot be made atomic. Backtracking is required for correctness, AFAICS.

Example 2:
The regex /\p{Foo}z/u has to be equivalent to /(?:xyz|xy|z)z/u in order to accept the string "xyz". The UPE cannot be atomic.

@mathiasbynens
Copy link
Member

Would you mind opening this issue against the new repo, here: https://github.com/tc39/proposal-regexp-set-notation/issues? Thanks!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants