Commit 49ac923

Document inability to match lone low surrogates accurately
Ref. mathiasbynens/regexpu#17. Closes #28.
1 parent 9cd1cd7 commit 49ac923

2 files changed (+7, -3 lines)

README.md (+3, -1)

@@ -1,4 +1,4 @@
-# Regenerate [![Build status](https://travis-ci.org/mathiasbynens/regenerate.svg?branch=master)](https://travis-ci.org/mathiasbynens/regenerate) [![Code coverage status](http://img.shields.io/coveralls/mathiasbynens/regenerate/master.svg)](https://coveralls.io/r/mathiasbynens/regenerate) [![Dependency status](https://gemnasium.com/mathiasbynens/regenerate.svg)](https://gemnasium.com/mathiasbynens/regenerate)
+# Regenerate [![Build status](https://travis-ci.org/mathiasbynens/regenerate.svg?branch=master)](https://travis-ci.org/mathiasbynens/regenerate) [![Code coverage status](https://coveralls.io/repos/mathiasbynens/regenerate/badge.svg)](https://coveralls.io/r/mathiasbynens/regenerate) [![Dependency status](https://gemnasium.com/mathiasbynens/regenerate.svg)](https://gemnasium.com/mathiasbynens/regenerate)
 
 _Regenerate_ is a Unicode-aware regex generator for JavaScript. It allows you to easily generate JavaScript-compatible regular expressions based on a given set of Unicode symbols or code points. (This is trickier than you might think, because of [how JavaScript deals with astral symbols](https://mathiasbynens.be/notes/javascript-unicode).)
 
@@ -243,6 +243,8 @@ lowSurrogates.toString({ 'bmpOnly': true });
 // → '[\\uDC00-\\uDFFF]'
 ```
 
+Note that lone low surrogates cannot be matched accurately using regular expressions in JavaScript. Regenerate’s output makes a best-effort approach but [there can be false negatives in this regard](https://github.com/mathiasbynens/regenerate/issues/28#issuecomment-72224808).
+
 ### `regenerate.prototype.toRegExp(flags = '')`
 
 Returns a regular expression that matches all the symbols mapped to the code points within the set. Optionally, you can pass [flags](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp#Parameters) to be added to the regular expression.
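To make the documented limitation concrete, here is a small sketch (plain JavaScript, runnable in Node.js) that hand-writes a pattern in the same shape as the one built in the regenerate.js hunk: the lookbehind-free prefix `(?:[^\uD800-\uDBFF]|^)` followed by the full low surrogate range. It does not call Regenerate; the `countMatches` helper and the input strings are illustrative assumptions. It shows one way a false negative can arise: with the global flag, the first match consumes the code unit in front of the first lone low surrogate plus that surrogate, so matching resumes at the second lone low surrogate with no unconsumed code unit in front of it, and `^` cannot match mid-string.

```javascript
// Hand-written pattern in the shape Regenerate emits for lone low
// surrogates: a non-high-surrogate code unit (or start of input),
// followed by a low surrogate. Not generated by Regenerate here.
const loneLow = /(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]/g;

// Hypothetical helper: number of matches of a global regex in a string.
function countMatches(regex, string) {
  // String#match with a global regex returns all matches, or null.
  const matches = string.match(regex);
  return matches === null ? 0 : matches.length;
}

// 'a' followed by two lone low surrogates: only one match is found,
// because the first match ('a\uDC00') leaves nothing to consume in
// front of the second U+DC00.
console.log(countMatches(loneLow, 'a\uDC00\uDC00')); // → 1

// The low half of a genuine surrogate pair (U+1F4A9) is rejected.
console.log(countMatches(loneLow, '\uD83D\uDCA9')); // → 0
```

Note also that each match includes the preceding code unit, not just the lone surrogate, which matters when the pattern is used with `replace`.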

regenerate.js (+4, -2)

@@ -8,7 +8,7 @@
 var freeModule = typeof module == 'object' && module &&
 module.exports == freeExports && module;
 
-// Detect free variable `global`, from Node.js or Browserified code,
+// Detect free variable `global`, from Node.js/io.js or Browserified code,
 // and use it as `root`.
 var freeGlobal = typeof global == 'object' && global;
 if (freeGlobal.global === freeGlobal || freeGlobal.window === freeGlobal) {
@@ -999,7 +999,9 @@
 }
 if (hasLoneLowSurrogates) {
 result.push(
-// Make sure the low surrogates aren’t part of a surrogate pair.
+// It is not possible to accurately assert the low surrogates aren’t
+// part of a surrogate pair, since JavaScript regular expressions do
+// not support lookbehind.
 '(?:[^\\uD800-\\uDBFF]|^)' +
 createBMPCharacterClasses(loneLowSurrogates)
 );
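For comparison: lookbehind assertions were standardized later, in ES2018, and are available in current engines (for example, Node.js 10 and later). The sketch below is not Regenerate's output and assumes such an engine; the `countMatches` helper and inputs are illustrative. A negative lookbehind checks the preceding code unit without consuming it, so adjacent lone low surrogates are each matched while the low half of a surrogate pair is still rejected.

```javascript
// Requires ES2018 lookbehind support (post-dates this commit).
// (?<![\uD800-\uDBFF]) asserts the previous code unit is not a high
// surrogate, without consuming it.
const loneLowWithLookbehind = /(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g;

// Hypothetical helper: number of matches of a global regex in a string.
function countMatches(regex, string) {
  const matches = string.match(regex);
  return matches === null ? 0 : matches.length;
}

// Both lone low surrogates are found.
console.log(countMatches(loneLowWithLookbehind, 'a\uDC00\uDC00')); // → 2

// The low half of the pair U+1F4A9 is rejected.
console.log(countMatches(loneLowWithLookbehind, '\uD83D\uDCA9')); // → 0

// A lone low surrogate directly after a complete pair is still found.
console.log(countMatches(loneLowWithLookbehind, '\uD83D\uDCA9\uDC00')); // → 1
```

Because the assertion is zero-width, each match is exactly one code unit, the lone surrogate itself.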
