Skip to content

Commit ae70b41

Browse files
committed
security: fix denial-of-service bug in compiler
The regex compiler will happily attempt to compile '(?:){294967295}' by compiling the empty sub-expression 294,967,295 times. Empty sub-expressions don't use any memory in the current implementation, so this doesn't trigger the pre-existing machinery for stopping compilation early if the regex object gets too big. The end result is that while compilation will eventually succeed, it takes a very long time to do so. In this commit, we fix this problem by adding a fake amount of memory every time we compile an empty sub-expression. It turns out we were already tracking an additional amount of indirect heap usage via 'extra_inst_bytes' in the compiler, so we just make it look like compiling an empty sub-expression actually adds an additional 'Inst' to the compiled regex object. This has the effect of causing the regex compiler to reject this sort of regex in a reasonable amount of time by default. Many thanks to @VTCAKAVSMoACE for reporting this, providing the valuable test cases and continuing to test this patch as it was developed. Fixes GHSA-m5pq-gvj9-9vr8
1 parent b92ffd5 commit ae70b41

File tree

2 files changed

+95
-2
lines changed

2 files changed

+95
-2
lines changed

src/compile.rs

+25-2
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,16 @@ pub struct Compiler {
3838
suffix_cache: SuffixCache,
3939
utf8_seqs: Option<Utf8Sequences>,
4040
byte_classes: ByteClassSet,
41+
// This keeps track of extra bytes allocated while compiling the regex
42+
// program. Currently, this corresponds to two things. First is the heap
43+
// memory allocated by Unicode character classes ('InstRanges'). Second is
44+
// a "fake" amount of memory used by empty sub-expressions, so that enough
45+
// empty sub-expressions will ultimately trigger the compiler to bail
46+
// because of a size limit restriction. (That empty sub-expressions don't
47+
// add to heap memory usage is more-or-less an implementation detail.) In
48+
// the second case, if we don't bail, then an excessively large repetition
49+
// on an empty sub-expression can result in the compiler using a very large
50+
// amount of CPU time.
4151
extra_inst_bytes: usize,
4252
}
4353

@@ -260,7 +270,7 @@ impl Compiler {
260270

261271
self.check_size()?;
262272
match *expr.kind() {
263-
Empty => Ok(None),
273+
Empty => self.c_empty(),
264274
Literal(hir::Literal::Unicode(c)) => self.c_char(c),
265275
Literal(hir::Literal::Byte(b)) => {
266276
assert!(self.compiled.uses_bytes());
@@ -378,6 +388,19 @@ impl Compiler {
378388
}
379389
}
380390

391+
fn c_empty(&mut self) -> ResultOrEmpty {
392+
// See: https://github.com/rust-lang/regex/security/advisories/GHSA-m5pq-gvj9-9vr8
393+
// See: CVE-2022-24713
394+
//
395+
// Since 'empty' sub-expressions don't increase the size of
396+
// the actual compiled object, we "fake" an increase in its
397+
// size so that our 'check_size_limit' routine will eventually
398+
// stop compilation if there are too many empty sub-expressions
399+
// (e.g., via a large repetition).
400+
self.extra_inst_bytes += std::mem::size_of::<Inst>();
401+
Ok(None)
402+
}
403+
381404
fn c_capture(&mut self, first_slot: usize, expr: &Hir) -> ResultOrEmpty {
382405
if self.num_exprs > 1 || self.compiled.is_dfa {
383406
// Don't ever compile Save instructions for regex sets because
@@ -496,7 +519,7 @@ impl Compiler {
496519
let mut exprs = exprs.into_iter();
497520
let Patch { mut hole, entry } = loop {
498521
match exprs.next() {
499-
None => return Ok(None),
522+
None => return self.c_empty(),
500523
Some(e) => {
501524
if let Some(p) = self.c(e)? {
502525
break p;

tests/test_default.rs

+70
Original file line numberDiff line numberDiff line change
@@ -150,3 +150,73 @@ fn regex_is_reasonably_small() {
150150
assert_eq!(16, size_of::<bytes::Regex>());
151151
assert_eq!(16, size_of::<bytes::RegexSet>());
152152
}
153+
154+
// See: https://github.com/rust-lang/regex/security/advisories/GHSA-m5pq-gvj9-9vr8
155+
// See: CVE-2022-24713
156+
//
157+
// We test that our regex compiler will correctly return a "too big" error when
158+
// we try to use a very large repetition on an *empty* sub-expression.
159+
//
160+
// At the time this test was written, the regex compiler does not represent
161+
// empty sub-expressions with any bytecode instructions. In effect, it's an
162+
// "optimization" to leave them out, since they would otherwise correspond
163+
// to an unconditional JUMP in the regex bytecode (i.e., an unconditional
164+
// epsilon transition in the NFA graph). Therefore, an empty sub-expression
165+
// represents an interesting case for the compiler's size limits. Since it
166+
// doesn't actually contribute any additional memory to the compiled regex
167+
// instructions, the size limit machinery never detects it. Instead, it just
168+
// dumbly tries to compile the empty sub-expression N times, where N is the
169+
// repetition size.
170+
//
171+
// When N is very large, this will cause the compiler to essentially spin and
172+
// do nothing for a decently large amount of time. It causes the regex to take
173+
// quite a bit of time to compile, despite the concrete syntax of the regex
174+
// being quite small.
175+
//
176+
// The degree to which this is actually a problem is somewhat of a judgment
177+
// call. Some regexes simply take a long time to compile. But in general, you
178+
// should be able to reasonably control this by setting lower or higher size
179+
// limits on the compiled object size. But this mitigation doesn't work at all
180+
// for this case.
181+
//
182+
// This particular test is somewhat narrow. It merely checks that regex
183+
// compilation will, at some point, return a "too big" error. Before the
184+
// fix landed, this test would eventually fail because the regex would be
185+
// successfully compiled (after enough time elapsed). So while this test
186+
// doesn't check that we exit in a reasonable amount of time, it does at least
187+
// check that we are properly returning an error at some point.
188+
#[test]
189+
fn big_empty_regex_fails() {
190+
use regex::Regex;
191+
192+
let result = Regex::new("(?:){4294967295}");
193+
assert!(result.is_err());
194+
}
195+
196+
// Below is a "billion laughs" variant of the previous test case.
197+
#[test]
198+
fn big_empty_reps_chain_regex_fails() {
199+
use regex::Regex;
200+
201+
let result = Regex::new("(?:){64}{64}{64}{64}{64}{64}");
202+
assert!(result.is_err());
203+
}
204+
205+
// Below is another situation where a zero-length sub-expression can be
206+
// introduced.
207+
#[test]
208+
fn big_zero_reps_regex_fails() {
209+
use regex::Regex;
210+
211+
let result = Regex::new(r"x{0}{4294967295}");
212+
assert!(result.is_err());
213+
}
214+
215+
// Testing another case for completeness.
216+
#[test]
217+
fn empty_alt_regex_fails() {
218+
use regex::Regex;
219+
220+
let result = Regex::new(r"(?:|){4294967295}");
221+
assert!(result.is_err());
222+
}

0 commit comments

Comments
 (0)