-
Notifications
You must be signed in to change notification settings - Fork 60
[Markdown] Fix HTML comment parser. #2121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
// Copyright (c) 2025, the Dart project authors. Please see the AUTHORS file | ||
// for details. All rights reserved. Use of this source code is governed by a | ||
// BSD-style license that can be found in the LICENSE file. | ||
|
||
import 'package:markdown/markdown.dart'; | ||
import 'package:test/test.dart'; | ||
|
||
void main() { | ||
test('HTML comment with dashes #2119', () { | ||
// See https://dartbug.com/tools/2119. | ||
// For this issue, the leading letter was needed, | ||
// an HTML comment starting a line is handled by a different path. | ||
// The empty line before the `-->` is needed. | ||
// The number of lines increase time exponentially. | ||
// The length of lines affect the base of the exponentiation. | ||
// Locally, three "Lorem-ipsum" lines ran in ~6 seconds, two in < 200 ms. | ||
// Adding a fourth line should ensure it cannot possibly finish in ten | ||
// seconds if the bug isn't fixed. | ||
const input = ''' | ||
a <!-- | ||
- Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod | ||
tempor incididunt ut labore et dolore magna aliqua. | ||
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi | ||
ut aliquip ex ea commodo consequat. | ||
- Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod | ||
tempor incididunt ut labore et dolore magna aliqua. | ||
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi | ||
ut aliquip ex ea commodo consequat. | ||
- Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod | ||
tempor incididunt ut labore et dolore magna aliqua. | ||
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi | ||
ut aliquip ex ea commodo consequat. | ||
- Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod | ||
tempor incididunt ut labore et dolore magna aliqua. | ||
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi | ||
ut aliquip ex ea commodo consequat. | ||
--> | ||
'''; | ||
|
||
final time = Stopwatch()..start(); | ||
final html = markdownToHtml(input); // Should not hang. | ||
expect(html, isNotNull); // To use the output. | ||
final elapsed = time.elapsedMilliseconds; | ||
expect(elapsed, lessThan(10000)); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What's the expected runtime now? Is there a big enough margin for low powered machines running this test? Idea: Rather than using a fixed timeout, we measure 1, 2, 3, and 4 paragraphs and verify that the runtime grows roughly linearly instead of exponentially? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It should be linear in the size of the input like all the other RegExps. Which means it'll drown in the noise of everything else that is being done. In inserted a The time taken by running that RegExp is trivial. I'd guess compiling it at runtime takes longer than running it on any reasonable example. But why guess when I can check it! Running the RegExp is not taking any significant time. Checking that it is linear is going to take a lot of size, 1000+ lines, before it's even measurably above noise-level (4-14 ms is a 3.5x variance). I'm not worried. This test just checks that we don't revert the RegExp accidentally. |
||
}); | ||
|
||
test('HTML comment with lt/gt', () { | ||
// Incorrect parsing found as part of fixing #2119. | ||
// Now matches `<!--$text-->` where text | ||
// does not start with `>` or `->`, does not end with `-`, | ||
// and does not contain `--`. | ||
const input = 'a <!--<->>-<!-->'; | ||
final html = markdownToHtml(input); | ||
expect(html, '<p>$input</p>\n'); | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
<![CDATA[
tag is case sensitive, so the RegExp should be too.Added
A-Z
to all thea-z
s used here and innamedTagDefinition
.