[Markdown] Fix HTML comment parser. #2121

Merged
merged 3 commits into from
Jul 3, 2025

lrhn (Member) commented Jul 2, 2025

See #2119

Fix performance and correctness of HTML comment parser.

The RegExp had catastrophic backtracking. It also didn't match the spec it linked to
(which is still CommonMark 0.30, not 0.31.2; the two differ).
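
To illustrate what a comment pattern without catastrophic backtracking can look like, here is a minimal sketch. It is not the pattern merged in this PR: the name `commentPattern` is hypothetical, and the rule it encodes (comment text must not start with `>` or `->`, must not contain `--`, and must not end with `-`) is my reading of the CommonMark 0.30 definition.

```dart
// Hypothetical pattern for illustration; not the one from this PR.
// Every character of the comment body is consumed by exactly one
// alternative, so there is only one way to split the text and a failed
// match cannot explode into exponentially many attempts.
final commentPattern = RegExp(
  r'<!--(?!-?>)' // opener; the body may not start with ">" or "->"
  r'(?:[^-]|-(?!-))*' // a non-hyphen, or a hyphen not followed by a hyphen
  r'-->', // closer
);

void main() {
  print(commentPattern.hasMatch('<!-- a\nmulti-line comment -->')); // true
  print(commentPattern.hasMatch('<!--> not a comment -->')); // false
  print(commentPattern.hasMatch('<!-- a -- b -->')); // false

  // A long, unterminated comment fails quickly instead of hanging.
  final unterminated = '<!--' + 'a-' * 10000;
  print(commentPattern.hasMatch(unterminated)); // false
}
```

Since `[^-]` also matches line terminators, the sketch handles multi-line comments without needing `[^]`.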

Fixed a few other incorrect parsings.

  • The content of `<?...?>`, `<!a...>` and `<![CDATA[...]]>` can
    contain newlines. Changed `.` to `[^]`.
  • The `<![CDATA[` tag is case sensitive. Changed RegExp to be
    case sensitive, so added `A-Z` to all the `a-z`s used in that
    regexp.
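
The two points above can be sketched with plain `dart:core` RegExps. The patterns below are simplified stand-ins with hypothetical names, not the ones in `package:markdown`; they only show the difference between `.` and `[^]` and the effect of case-sensitive matching.

```dart
// Simplified, hypothetical patterns for illustration only.

// `.` does not match line terminators, so multi-line content is missed.
final dotPattern = RegExp(r'<\?.*?\?>');

// `[^]` (an empty negated class) matches any character, newlines included.
final anyPattern = RegExp(r'<\?[^]*?\?>');

// Dart RegExps are case sensitive by default, so only the upper-case
// CDATA opener matches.
final cdataPattern = RegExp(r'<!\[CDATA\[[^]*?\]\]>');

void main() {
  const instruction = '<?php\necho 1;\n?>';
  print(dotPattern.hasMatch(instruction)); // false
  print(anyPattern.hasMatch(instruction)); // true

  print(cdataPattern.hasMatch('<![CDATA[a\nb]]>')); // true
  print(cdataPattern.hasMatch('<![cdata[a\nb]]>')); // false
}
```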

lrhn added 2 commits July 2, 2025 15:26
@lrhn lrhn requested review from a team as code owners July 2, 2025 13:30
@lrhn lrhn requested review from devoncarew and removed request for a team July 2, 2025 13:31

github-actions bot commented Jul 2, 2025

Package publishing

| Package | Version | Status | Publish tag (post-merge) |
| --- | --- | --- | --- |
| package:bazel_worker | 1.1.3 | already published at pub.dev | |
| package:benchmark_harness | 2.4.0-wip | WIP (no publish necessary) | |
| package:boolean_selector | 2.1.2 | already published at pub.dev | |
| package:browser_launcher | 1.1.3 | already published at pub.dev | |
| package:cli_config | 0.2.1-wip | WIP (no publish necessary) | |
| package:cli_util | 0.4.2 | already published at pub.dev | |
| package:clock | 1.1.3-wip | WIP (no publish necessary) | |
| package:code_builder | 4.10.2-wip | WIP (no publish necessary) | |
| package:coverage | 1.14.1 | already published at pub.dev | |
| package:csslib | 1.0.2 | already published at pub.dev | |
| package:extension_discovery | 2.1.0 | already published at pub.dev | |
| package:file | 7.0.2-wip | WIP (no publish necessary) | |
| package:file_testing | 3.1.0-wip | WIP (no publish necessary) | |
| package:glob | 2.1.3 | already published at pub.dev | |
| package:graphs | 2.3.3-wip | WIP (no publish necessary) | |
| package:html | 0.15.6 | already published at pub.dev | |
| package:io | 1.1.0-wip | WIP (no publish necessary) | |
| package:json_rpc_2 | 4.0.0 | already published at pub.dev | |
| package:markdown | 7.3.1-wip | WIP (no publish necessary) | |
| package:mime | 2.0.0 | already published at pub.dev | |
| package:oauth2 | 2.0.4-wip | WIP (no publish necessary) | |
| package:package_config | 2.3.0-wip | WIP (no publish necessary) | |
| package:pool | 1.5.2-wip | WIP (no publish necessary) | |
| package:process | 5.0.4 | already published at pub.dev | |
| package:pub_semver | 2.2.0 | already published at pub.dev | |
| package:pubspec_parse | 1.5.0 | already published at pub.dev | |
| package:source_map_stack_trace | 2.1.3-wip | WIP (no publish necessary) | |
| package:source_maps | 0.10.14-wip | WIP (no publish necessary) | |
| package:source_span | 1.10.1 | already published at pub.dev | |
| package:sse | 4.1.8 | already published at pub.dev | |
| package:stack_trace | 1.12.1 | already published at pub.dev | |
| package:stream_channel | 2.1.4 | already published at pub.dev | |
| package:stream_transform | 2.1.2-wip | WIP (no publish necessary) | |
| package:string_scanner | 1.4.1 | already published at pub.dev | |
| package:term_glyph | 1.2.3-wip | WIP (no publish necessary) | |
| package:test_reflective_loader | 0.3.0 | ready to publish | test_reflective_loader-v0.3.0 |
| package:timing | 1.0.2 | already published at pub.dev | |
| package:unified_analytics | 8.0.3 | ready to publish | unified_analytics-v8.0.3 |
| package:watcher | 1.1.2 | already published at pub.dev | |
| package:yaml | 3.1.3 | already published at pub.dev | |
| package:yaml_edit | 2.2.2 | already published at pub.dev | |

Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation.

github-actions bot commented Jul 2, 2025

PR Health

Breaking changes ⚠️
| Package | Change | Current Version | New Version | Needed Version | Looking good? |
| --- | --- | --- | --- | --- | --- |
| markdown | Breaking | 7.3.0 | 7.3.1-wip | 8.0.0 (got "7.3.1-wip", expected >= "8.0.0"; breaking changes) | ⚠️ |

This check can be disabled by tagging the PR with skip-breaking-check.

Changelog Entry ✔️
Package Changed Files

Changes to files need to be accounted for in their respective changelogs.

Coverage ✔️
| File | Coverage |
| --- | --- |
| pkgs/markdown/lib/src/inline_syntaxes/inline_html_syntax.dart | 💚 100 % |
| pkgs/markdown/lib/src/patterns.dart | 💚 100 % |

This check for test coverage is informational (issues shown here will not fail the PR).

API leaks ✔️

The following packages contain symbols visible in the public API, but not exported by the library. Export these symbols or remove them from your publicly visible API.

Package Leaked API symbols
License Headers ✔️
// Copyright (c) 2025, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
Files
no missing headers

All source files should start with a license header.

Unrelated files missing license headers
Files
pkgs/bazel_worker/benchmark/benchmark.dart
pkgs/bazel_worker/example/client.dart
pkgs/bazel_worker/example/worker.dart
pkgs/benchmark_harness/integration_test/perf_benchmark_test.dart
pkgs/boolean_selector/example/example.dart
pkgs/clock/lib/clock.dart
pkgs/clock/lib/src/clock.dart
pkgs/clock/lib/src/default.dart
pkgs/clock/lib/src/stopwatch.dart
pkgs/clock/lib/src/utils.dart
pkgs/clock/test/clock_test.dart
pkgs/clock/test/default_test.dart
pkgs/clock/test/stopwatch_test.dart
pkgs/clock/test/utils.dart
pkgs/coverage/lib/src/coverage_options.dart
pkgs/html/example/main.dart
pkgs/html/lib/dom.dart
pkgs/html/lib/dom_parsing.dart
pkgs/html/lib/html_escape.dart
pkgs/html/lib/parser.dart
pkgs/html/lib/src/constants.dart
pkgs/html/lib/src/encoding_parser.dart
pkgs/html/lib/src/html_input_stream.dart
pkgs/html/lib/src/list_proxy.dart
pkgs/html/lib/src/query_selector.dart
pkgs/html/lib/src/token.dart
pkgs/html/lib/src/tokenizer.dart
pkgs/html/lib/src/treebuilder.dart
pkgs/html/lib/src/utils.dart
pkgs/html/test/dom_test.dart
pkgs/html/test/parser_feature_test.dart
pkgs/html/test/parser_test.dart
pkgs/html/test/query_selector_test.dart
pkgs/html/test/selectors/level1_baseline_test.dart
pkgs/html/test/selectors/level1_lib.dart
pkgs/html/test/selectors/selectors.dart
pkgs/html/test/support.dart
pkgs/html/test/tokenizer_test.dart
pkgs/html/test/trie_test.dart
pkgs/html/tool/generate_trie.dart
pkgs/pubspec_parse/test/git_uri_test.dart
pkgs/stack_trace/example/example.dart
pkgs/watcher/test/custom_watcher_factory_test.dart
pkgs/yaml_edit/example/example.dart

@coveralls

Pull Request Test Coverage Report for Build 16026605421

Details

  • 1 of 1 (100.0%) changed or added relevant line in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+2.3%) to 96.114%

Totals Coverage Status
Change from base Build 15949744654: 2.3%
Covered Lines: 1558
Relevant Lines: 1621

💛 - Coveralls


  InlineHtmlSyntax()
-     : super(_pattern, startCharacter: $lt, caseSensitive: false);
+     : super(_pattern, startCharacter: $lt, caseSensitive: true);
lrhn (Member, Author) commented:

The `<![CDATA[` tag is case sensitive, so the RegExp should be too.
Added `A-Z` to all the `a-z`s used here and in `namedTagDefinition`.
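
A small sketch of why the character classes need `A-Z` once the RegExp becomes case sensitive. The tag-name patterns and their names (`lowerOnly`, `bothCases`) are hypothetical simplifications; the real `namedTagDefinition` is more involved.

```dart
// Hypothetical, simplified tag patterns; not the real namedTagDefinition.
const lowerOnly = r'</?[a-z][a-z0-9-]*>';
const bothCases = r'</?[a-zA-Z][a-zA-Z0-9-]*>';

void main() {
  // With case-insensitive matching, [a-z] also accepts upper-case letters.
  print(RegExp(lowerOnly, caseSensitive: false).hasMatch('<DIV>')); // true

  // Once the RegExp is case sensitive, upper-case tags stop matching...
  print(RegExp(lowerOnly).hasMatch('<DIV>')); // false

  // ...unless the classes list both cases explicitly.
  print(RegExp(bothCases).hasMatch('<DIV>')); // true
}
```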

@lrhn lrhn merged commit 6282b35 into main Jul 3, 2025
16 checks passed
@lrhn lrhn deleted the md-html-multiline-comment branch July 3, 2025 08:55
final html = markdownToHtml(input); // Should not hang.
expect(html, isNotNull); // To use the output.
final elapsed = time.elapsedMilliseconds;
expect(elapsed, lessThan(10000));

What's the expected runtime now? Is there a big enough margin for low-powered machines running this test?

Idea: Rather than using a fixed timeout, could we measure 1, 2, 3, and 4 paragraphs and verify that the runtime grows roughly linearly instead of exponentially?
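
A rough sketch of what such a growth check could look like, using only `dart:core` and `dart:math`. The helper names (`growsRoughlyLinearly`, `timeMicros`), the tolerance of 3x, and the sample pattern are all assumptions made for illustration; nothing like this is in the PR.

```dart
import 'dart:math' as math;

/// Hypothetical helper: true if the cost per paragraph stays within a
/// factor of [tolerance] across all measurements.
bool growsRoughlyLinearly(
  List<int> paragraphCounts,
  List<int> elapsedMicros, {
  double tolerance = 3.0,
}) {
  final perParagraph = [
    for (var i = 0; i < paragraphCounts.length; i++)
      // Clamp to 1 microsecond so a zero reading cannot divide by zero later.
      math.max(elapsedMicros[i], 1) / paragraphCounts[i],
  ];
  final fastest = perParagraph.reduce(math.min);
  final slowest = perParagraph.reduce(math.max);
  return slowest / fastest <= tolerance;
}

/// Runs [work] once and returns the elapsed time in microseconds.
int timeMicros(void Function() work) {
  final watch = Stopwatch()..start();
  work();
  watch.stop();
  return watch.elapsedMicroseconds;
}

void main() {
  // Synthetic timings: linear growth passes, exponential growth fails.
  print(growsRoughlyLinearly([1, 2, 3, 4], [100, 210, 290, 405])); // true
  print(growsRoughlyLinearly([1, 2, 3, 4], [10, 100, 1000, 10000])); // false

  // Real timings with a simplified stand-in pattern and paragraph.
  final pattern = RegExp(r'<!--[^]*?-->');
  const paragraph = '- <!-- entry -->\n';
  final counts = [1, 2, 3, 4];
  final timings = [
    for (final n in counts)
      timeMicros(() => pattern.allMatches(paragraph * n).length),
  ];
  print(growsRoughlyLinearly(counts, timings));
}
```

With inputs this small the real timings are likely dominated by noise, so the last result should not be relied on; the tolerance is a guess.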

lrhn (Member, Author) replied:

It should be linear in the size of the input, like all the other RegExps, which means it'll drown in the noise of everything else that is being done.

I inserted a `print(elapsed);` in the test, with four `-` lines, and ran it five times. It took 115-130 ms.
Then I added 16 more entries, quintupling the size, and ran the test again. That took 133-162 ms.
For the heck of it, I quintupled it again, adding 80 more entries, and it then took 142-186 ms.

The time taken by running that RegExp is trivial. I'd guess compiling it at runtime takes longer than running it on any reasonable example.

But why guess when I can check it!
I put a `for (var i = 0; i < 2; i++) {...}` around the code of the test, to run it again after the RegExp has already been compiled.
The first run of each test takes about the same time as before, 137-183 ms.
The second run takes 4-14 ms for the 100-entry sample text.

Running the RegExp is not taking any significant time. Checking that it is linear would take a very large input, 1000+ lines, before it's even measurably above noise level (4-14 ms is a 3.5x variance).

I'm not worried. This test just checks that we don't revert the RegExp accidentally.
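
The run-it-twice measurement described above can be sketched without `package:markdown`. The pattern, the sample text, and the `countComments` helper are simplified stand-ins (assumptions), so the numbers it prints will not match the ones quoted in this thread.

```dart
/// Counts comment-like spans; a hypothetical stand-in for the real work
/// under test.
int countComments(RegExp pattern, String input) =>
    pattern.allMatches(input).length;

void main() {
  final pattern = RegExp(r'<!--[^]*?-->'); // Simplified stand-in pattern.
  final input = '- <!-- entry -->\n' * 100; // 100-entry sample text.

  for (var run = 0; run < 2; run++) {
    final watch = Stopwatch()..start();
    final count = countComments(pattern, input);
    watch.stop();
    // Run 0 may include one-time RegExp compilation; run 1 should not.
    print('run $run: $count matches in ${watch.elapsedMicroseconds} us');
  }
}
```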

copybara-service bot pushed a commit to dart-lang/sdk that referenced this pull request Jul 8, 2025
Revisions updated by `dart tools/rev_sdk_deps.dart`.

ai (https://github.com/dart-lang/ai/compare/64dfa7f..9b007b3):
  9b007b3  2025-07-07  Jacob MacDonald  Add failure reasons to tool call analytics events (dart-lang/ai#219)
  c8dc5da  2025-07-07  Jacob MacDonald  don't bail early when running in multiple roots (dart-lang/ai#218)
  2541b6c  2025-07-02  Kenzie Davisson  Remove VS Code mcp instructions in favor of Dart-Code setting. (dart-lang/ai#206)
  70daa1f  2025-07-02  Jacob MacDonald  release dart_mcp 0.3.0 (dart-lang/ai#216)
  a252a46  2025-07-01  Jacob MacDonald  add retry logic to try and make dtd_test less flaky (dart-lang/ai#214)
  9e0b973  2025-07-01  Jacob MacDonald  add a test that the arg parser library only depends on package:args (dart-lang/ai#213)

http (https://github.com/dart-lang/http/compare/e70a41b..7d2d87e):
  7d2d87e  2025-07-02  Brian Quinlan  Fix `Connection reset by peer` in protocol error tests (dart-lang/http#1786)

i18n (https://github.com/dart-lang/i18n/compare/ab90327..42c4932):
  42c49328  2025-07-07  Googler  No public description
  87fd0156  2025-07-07  Michael Goderbauer  [intl4x] Re-enable Windows (dart-lang/i18n#986)
  912a7720  2025-07-07  Copybara-Service  Merge pull request `#985` from dart-lang:fixConstantEvaluator
  52f5beeb  2025-07-07  Moritz  Small cleanups in intl4x (dart-lang/i18n#988)
  6e8ef245  2025-07-07  Moritz  squash

sync_http (https://github.com/dart-lang/sync_http/compare/dc54465..c07f96f):
  c07f96f  2025-07-03  Kevin Moore  Update to latest lints, required Dart 3.7 (google/sync_http.dart#55)

tools (https://github.com/dart-lang/tools/compare/7bf22c9..6282b35):
  6282b35e  2025-07-03  Lasse R.H. Nielsen  [Markdown] Fix HTML comment parser. (dart-lang/tools#2121)

web (https://github.com/dart-lang/web/compare/3e11172..fb8a149):
  fb8a149  2025-07-07  Nikechukwu  Add Support for Configuration of Dart JS Interop Gen (dart-lang/web#386)

Change-Id: Ib243021ed77846a8451f60fa320e5cf40e85aa27
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/439320
Commit-Queue: Konstantin Shcheglov
Auto-Submit: Devon Carew
Reviewed-by: Konstantin Shcheglov

4 participants