Commit 14ea680

Merge branch 'release/6.0.0-b1'
2 parents 91eea92 + 3a1ae6b commit 14ea680

95 files changed: +2993 -737 lines changed

CMakeLists.txt

+8-6
@@ -8,10 +8,10 @@ cmake_minimum_required (VERSION 2.6)
 set (My_Project_Title "MultiMarkdown")
 set (My_Project_Description "Lightweight markup processor to produce HTML, LaTeX, and more.")
 set (My_Project_Author "Fletcher T. Penney")
-set (My_Project_Revised_Date "2017-03-05")
-set (My_Project_Version_Major 0)
-set (My_Project_Version_Minor 4)
-set (My_Project_Version_Patch 2b)
+set (My_Project_Revised_Date "2017-03-09")
+set (My_Project_Version_Major 6)
+set (My_Project_Version_Minor 0)
+set (My_Project_Version_Patch -b1)
 
 set (My_Project_Version "${My_Project_Version_Major}.${My_Project_Version_Minor}.${My_Project_Version_Patch}")
 
@@ -172,6 +172,7 @@ configure_file (
 
 # src_files are the primary files, and will be included in doxygen documentation
 set(src_files
+Sources/libMultiMarkdown/aho-corasick.c
 Sources/libMultiMarkdown/beamer.c
 Sources/libMultiMarkdown/char.c
 Sources/libMultiMarkdown/d_string.c
@@ -194,6 +195,7 @@ set(src_files
 
 # Primary header files, also for doxygen documentation
 set(header_files
+Sources/libMultiMarkdown/aho-corasick.h
 Sources/libMultiMarkdown/beamer.h
 Sources/libMultiMarkdown/char.h
 Sources/libMultiMarkdown/include/d_string.h
@@ -567,6 +569,6 @@ ADD_MMD_TEST(mmd-6-latex "-t latex" MMD6Tests tex)
 
 ADD_MMD_TEST(mmd-6-odf "-t odf" MMD6Tests fodt)
 
-ADD_MMD_TEST(pathologic "" ../build html)
-
 ADD_MMD_TEST(pathologic-compat "-c" ../build html)
+
+ADD_MMD_TEST(pathologic "" ../build html)

QuickStart.fodt

+545
Large diffs are not rendered by default.

QuickStart.html

+277
@@ -0,0 +1,277 @@
1+
<!DOCTYPE html>
2+
<html>
3+
<head>
4+
<meta charset="utf-8"/>
5+
<title>MultiMarkdown v6 Quick Start Guide</title>
6+
<meta name="author" content="Fletcher T. Penney"/>
7+
<meta name="version" content="6.0-b"/>
8+
</head>
9+
<body>
10+
11+
<div class="TOC">
12+
13+
<ul>
14+
<li><a href="#introduction">Introduction </a></li>
15+
<li><a href="#performance">Performance </a></li>
16+
<li><a href="#parsetree">Parse Tree </a></li>
17+
<li><a href="#features">Features </a>
18+
<ul>
19+
<li><a href="#abbreviationsoracronyms">Abbreviations (Or Acronyms) </a></li>
20+
<li><a href="#citations">Citations </a></li>
21+
<li><a href="#criticmarkup">CriticMarkup </a></li>
22+
<li><a href="#emphandstrong">Emph and Strong </a></li>
23+
<li><a href="#fencedcodeblocks">Fenced Code Blocks </a></li>
24+
<li><a href="#glossaryterms">Glossary Terms </a></li>
25+
<li><a href="#internationalization">Internationalization </a></li>
26+
<li><a href="#metadata">Metadata </a></li>
27+
<li><a href="#tableofcontents">Table of Contents </a></li>
28+
</ul>
29+
</li>
30+
<li><a href="#futuresteps">Future Steps </a></li>
31+
</ul>
32+
</div>
33+
34+
<h3 id="introduction">Introduction </h3>
35+
36+
<p>Version: 6.0-b</p>
37+
38+
<p>This document serves as a description of MultiMarkdown (<abbr title="MultiMarkdown">MMD</abbr>) v6, as well as a sample
39+
document to demonstrate the various features. Specifically, differences from
40+
<abbr title="MultiMarkdown">MMD</abbr> v5 will be pointed out.</p>
41+
42+
<h3 id="performance">Performance </h3>
43+
44+
<p>A big motivating factor leading to the development of <abbr title="MultiMarkdown">MMD</abbr> v6 was
45+
performance. When <abbr title="MultiMarkdown">MMD</abbr> first migrated from Perl to C (based on <a href="https://github.com/jgm/peg-markdown">peg-
46+
markdown</a>), it was among the fastest
47+
Markdown parsers available. That was many years ago, and the &#8220;competition&#8221;
48+
has made a great deal of progress since that time.</p>
49+
50+
<p>When developing <abbr title="MultiMarkdown">MMD</abbr> v6, one of my goals was to keep <abbr title="MultiMarkdown">MMD</abbr> at least in the
51+
ballpark of the fastest processors. Of course, being <em>the</em> fastest would be
52+
fantastic, but I was more concerned with ensuring that the code was easily
53+
understood, and easily updated with new features in the future.</p>
54+
55+
<p><abbr title="MultiMarkdown">MMD</abbr> v3 &#8211; v5 used a <a href="#gn:1" id="gnref:1" title="see glossary" class="glossary">PEG</a> to handle the parsing. This made it easy to
56+
understand the relationship between the <abbr title="MultiMarkdown">MMD</abbr> grammar and the parsing code,
57+
since they were one and the same. However, the parsing code generated by
58+
the parsers was not particularly fast, and was prone to troublesome edge
59+
cases with terrible performance characteristics.</p>
60+
61+
<p>The first step in <abbr title="MultiMarkdown">MMD</abbr> v6 parsing is to break the source text into a series
62+
of tokens, which may consist of plain text, whitespace, or special characters
63+
such as &#8216;*&#8217;, &#8216;[&#8217;, etc. This chain of tokens is then used to perform the
64+
actual parsing.</p>
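<p>As a rough illustration of the tokenizing step described above, the following C sketch splits a string into text, whitespace, and special-character tokens, recording the offset and length of each. The names <code>tok</code>, <code>tok_kind</code>, and <code>lex_sketch</code>, and the particular set of special characters, are assumptions made for this example. They are not taken from the MMD source, whose lexer recognizes many more token types.</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; the real MMD lexer distinguishes many more. */
typedef enum { TOK_TEXT, TOK_WHITESPACE, TOK_SPECIAL } tok_kind;

typedef struct {
    tok_kind kind;
    size_t   start; /* byte offset into the source string */
    size_t   len;   /* length in bytes */
} tok;

/* Assumed set of special characters, chosen only for this sketch. */
static int is_special(char c) {
    return c != '\0' && strchr("*_[]()`", c) != NULL;
}

/* Split src into at most max tokens; returns the number of tokens written. */
size_t lex_sketch(const char *src, tok *out, size_t max) {
    assert(src != NULL && out != NULL);
    size_t n = 0, i = 0;
    while (src[i] != '\0' && n < max) {
        size_t start = i;
        tok_kind kind;
        if (is_special(src[i])) {
            kind = TOK_SPECIAL;
            i++; /* one token per special character */
        } else if (isspace((unsigned char)src[i])) {
            kind = TOK_WHITESPACE;
            while (isspace((unsigned char)src[i])) i++;
        } else {
            kind = TOK_TEXT;
            while (src[i] != '\0' && !is_special(src[i]) &&
                   !isspace((unsigned char)src[i])) i++;
        }
        out[n].kind = kind;
        out[n].start = start;
        out[n].len = i - start;
        n++;
    }
    return n;
}
```

<p>For the input <code>a *b*</code> this sketch produces five tokens: text, whitespace, special, text, special.</p>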
65+
66+
<p><abbr title="MultiMarkdown">MMD</abbr> v6 divides the parsing into two separate phases, which actually fits
67+
more closely with Markdown&#8217;s design philosophy.</p>
68+
69+
<ol>
70+
<li><p>Block parsing consists of identifying the &#8220;type&#8221; of each line of the
71+
source text, and grouping the lines into blocks (e.g. paragraphs, lists,
72+
blockquotes, etc.). Some blocks are a single line (e.g. ATX headers), and
73+
others can be many lines long. The block parsing in <abbr title="MultiMarkdown">MMD</abbr> v6 is handled
74+
by a parser generated by <a href="http://www.hwaci.com/sw/lemon/">lemon</a>. This
75+
parser allows the block structure to be more readily understood by
76+
non-programmers, but the generated parser is still fast.</p></li>
77+
<li><p>Span parsing consists of identifying Markdown/<abbr title="MultiMarkdown">MMD</abbr> structures that occur
78+
inside of blocks, such as links, images, strong, emph, etc. Most of these
79+
structures require matching pairs of tokens to specify where the span starts
80+
and where it ends. Most of these spans allow arbitrary levels of nesting as
81+
well. This made parsing them correctly in the PEG-based code difficult and
82+
slow. <abbr title="MultiMarkdown">MMD</abbr> v6 uses a different approach that is accurate and has good
83+
performance characteristics even with edge cases. Basically, it keeps a stack
84+
of each &#8220;opening&#8221; token as it steps through the token chain. When a &#8220;closing&#8221;
85+
token is found, it is paired with the most recent appropriate opener on the
86+
stack. Any tokens in between the opener and closer are removed from the stack, as they are
87+
not able to be matched any more. To avoid unnecessary searches for non-
88+
existent openers, the parser keeps track of which opening tokens have been
89+
discovered. This allows the parser to continue moving forwards without having
90+
to go backwards and re-parse any previously visited tokens.</p></li>
91+
</ol>
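<p>The stack-based pairing described in the second step can be sketched roughly as follows. This is a simplified illustration, not the MMD implementation: it assumes each token is a single character, that openers and closers are distinct bracket characters (real Markdown delimiters such as <code>*</code> can act as either), and that the input holds at most <code>MAX_TOKENS</code> tokens. The names <code>pair_tokens</code>, <code>mate</code>, and <code>open_count</code> are hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TOKENS 256 /* arbitrary limit for this sketch */

/* Map a closer to its opener, or 0 if c is not a closer. */
static char opener_for(char c) {
    switch (c) {
        case ')': return '(';
        case ']': return '[';
        case '}': return '{';
        default:  return 0;
    }
}

static int is_opener(char c) {
    return c == '(' || c == '[' || c == '{';
}

/* mate[i] receives the index of token i's partner, or -1 if unpaired.
 * Returns the number of pairs found. Every token is pushed and popped
 * at most once, so the scan never revisits earlier tokens. */
size_t pair_tokens(const char *toks, size_t n, int *mate) {
    assert(toks != NULL && mate != NULL && n <= MAX_TOKENS);
    size_t stack[MAX_TOKENS];        /* indices of openers still on the stack */
    size_t depth = 0, pairs = 0;
    unsigned open_count[256] = {0};  /* openers on the stack, per character */

    for (size_t i = 0; i < n; i++) mate[i] = -1;

    for (size_t i = 0; i < n; i++) {
        if (is_opener(toks[i])) {
            stack[depth++] = i;
            open_count[(unsigned char)toks[i]]++;
            continue;
        }
        char want = opener_for(toks[i]);
        /* Skip plain tokens, and closers with no matching opener on the stack. */
        if (want == 0 || open_count[(unsigned char)want] == 0) continue;

        /* Pop until the most recent matching opener is reached; openers
         * popped along the way can no longer be matched. */
        while (depth > 0) {
            size_t j = stack[--depth];
            open_count[(unsigned char)toks[j]]--;
            if (toks[j] == want) {
                mate[j] = (int)i;
                mate[i] = (int)j;
                pairs++;
                break;
            }
        }
    }
    return pairs;
}
```

<p>In this sketch the <code>open_count</code> table plays the role of tracking which opening tokens have been seen, so a closer with no available opener is skipped without searching the stack.</p>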
92+
93+
<p>The result of this redesigned <abbr title="MultiMarkdown">MMD</abbr> parser is that it can parse short
94+
documents more quickly than <a href="http://commonmark.org/">CommonMark</a>, and takes
95+
only 15% &#8211; 20% longer to parse long documents. I have not delved too deeply
96+
into this, but I presume that CommonMark has a bit more &#8220;set-up&#8221; time that
97+
becomes expensive when parsing a short document (e.g. a paragraph or two). But
98+
this cost becomes negligible when parsing longer documents (e.g. file sizes of
99+
1 MB). So depending on your use case, CommonMark may well be faster than
100+
<abbr title="MultiMarkdown">MMD</abbr>, but we&#8217;re talking about splitting hairs here&#8230;. Recent comparisons
101+
show <abbr title="MultiMarkdown">MMD</abbr> v6 taking approximately 4.37 seconds to parse a 108 MB file
102+
(approximately 24.8 MB/second), and CommonMark took 3.72 seconds for the same
103+
file (29.2 MB/second). For comparison, <abbr title="MultiMarkdown">MMD</abbr> v5.4 took approximately 94
104+
seconds for the same file (1.15 MB/second).</p>
105+
106+
<p>For a more realistic file of approximately 28 KB (the source of the Markdown Syntax
107+
web page), both <abbr title="MultiMarkdown">MMD</abbr> and CommonMark parse it too quickly to accurately
108+
measure. In fact, it requires a file consisting of the original file copied
109+
32 times over (0.85 MB) before <code>/usr/bin/env time</code> reports a time over the
110+
minimum threshold of 0.01 seconds for either program.</p>
111+
112+
<p>There is still potentially room for additional optimization in <abbr title="MultiMarkdown">MMD</abbr>.
113+
However, even if I can&#8217;t close the performance gap with CommonMark on longer
114+
files, the additional features of <abbr title="MultiMarkdown">MMD</abbr> compared with Markdown, together with
115+
the increased legibility of the source code of <abbr title="MultiMarkdown">MMD</abbr> (in my biased opinion
116+
anyway) make this project worthwhile.</p>
117+
118+
<h3 id="parsetree">Parse Tree </h3>
119+
120+
<p><abbr title="MultiMarkdown">MMD</abbr> v6 performs its parsing in the following steps:</p>
121+
122+
<ol>
123+
<li><p>Start with a null-terminated string of source text (C style string)</p></li>
124+
<li><p>Lex string into token chain</p></li>
125+
<li><p>Parse token chain into blocks</p></li>
126+
<li><p>Parse tokens within each block into span level structures (e.g. strong,
127+
emph, etc.)</p></li>
128+
<li><p>Export the token tree into the desired output format (e.g. HTML, LaTeX,
129+
etc.) and return the resulting C style string</p>
130+
131+
<p><strong>OR</strong></p></li>
132+
<li><p>Use the resulting token tree for your own purposes.</p></li>
133+
</ol>
134+
135+
<p>The token tree (<a href="#gn:2" id="gnref:2" title="see glossary" class="glossary">AST</a>) includes the starting offset and length of each token,
136+
allowing you to use <abbr title="MultiMarkdown">MMD</abbr> as part of a syntax highlighter. <abbr title="MultiMarkdown">MMD</abbr> v5 did not
137+
have this functionality in the public version, in part because the PEG parsers
138+
used did not provide reliable offset positions, requiring a great deal of
139+
effort when I adapted MMD for use in <a href="http://multimarkdown.com/">MultiMarkdown
140+
Composer</a>.</p>
141+
142+
<p>These steps are managed using the <code>mmd_engine</code> &#8220;object&#8221;. An individual
143+
<code>mmd_engine</code> cannot be used by multiple threads simultaneously, so if
144+
libMultiMarkdown is to be used in a multithreaded program, a separate
145+
<code>mmd_engine</code> should be created for each thread. Alternatively, just use the
146+
slightly more abstracted <code>mmd_convert_string()</code> function that handles creating
147+
and destroying the <code>mmd_engine</code> automatically.</p>
148+
149+
<h3 id="features">Features </h3>
150+
151+
<h4 id="abbreviationsoracronyms">Abbreviations (Or Acronyms) </h4>
152+
153+
<p>This file includes the use of <abbr title="MultiMarkdown">MMD</abbr> as an abbreviation for MultiMarkdown. The
154+
abbreviation will be expanded on the first use, and the shortened form will be
155+
used on subsequent occurrences.</p>
156+
157+
<p>Abbreviations can be specified using inline or reference syntax. The inline
158+
variant requires that the abbreviation be wrapped in parentheses and
159+
immediately follow the <code>&gt;</code>.</p>
160+
161+
<pre><code>[>MMD] is an abbreviation. So is [>(MD) Markdown].
162+
163+
[>MMD]: MultiMarkdown
164+
</code></pre>
165+
166+
<h4 id="citations">Citations </h4>
167+
168+
<p>Citations can be specified using an inline syntax, just like inline footnotes.</p>
169+
170+
<h4 id="criticmarkup">CriticMarkup </h4>
171+
172+
<p><abbr title="MultiMarkdown">MMD</abbr> v6 has improved support for <a href="http://criticmarkup.com/">CriticMarkup</a>, both in terms of parsing, and
173+
in terms of support for each output format. You can <ins>insert text</ins>,
174+
<del>delete text</del>, substitute <del>one thing</del><ins>for another</ins>, <mark>highlight text</mark>,
175+
and <span class="critic comment">leave comments</span> in the text.</p>
176+
177+
<h4 id="emphandstrong">Emph and Strong </h4>
178+
179+
<p>The basics of emphasis and strong emphasis are unchanged, but the parsing
180+
engine has been improved to be more accurate, particularly in various edge
181+
cases where proper parsing can be difficult.</p>
182+
183+
<h4 id="fencedcodeblocks">Fenced Code Blocks </h4>
184+
185+
<p>Fenced code blocks are fundamentally the same as <abbr title="MultiMarkdown">MMD</abbr> v5, except:</p>
186+
187+
<ol>
188+
<li><p>The leading and trailing fences can be 3, 4, or 5 backticks in length. That
189+
should be sufficient to account for complex documents without requiring a more
190+
complex parser.</p></li>
191+
<li><p>If there is no trailing fence, then everything after the leading fence is
192+
considered to be part of the code block.</p></li>
193+
</ol>
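<p>The two rules above can be sketched in C as follows. This is only an illustration under stated assumptions: the function names <code>fence_length</code> and <code>code_block_end</code> are hypothetical, each line is assumed to start at its first backtick, and the sketch assumes that a closing fence must have the same number of backticks as the opening fence, which the text above does not specify.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Returns 3, 4, or 5 if line begins with exactly that many backticks,
 * otherwise 0 (shorter or longer runs are not treated as fences here). */
int fence_length(const char *line) {
    assert(line != NULL);
    int n = 0;
    while (line[n] == '`') n++;
    return (n >= 3 && n <= 5) ? n : 0;
}

/* Given an opening fence at lines[open], returns the index of the closing
 * fence, or count if there is none (the block then runs to the end).
 * ASSUMPTION: the closing fence has the same length as the opening one. */
size_t code_block_end(const char *const *lines, size_t count, size_t open) {
    assert(lines != NULL && open < count);
    int want = fence_length(lines[open]);
    assert(want != 0); /* caller must pass a real opening fence */
    for (size_t i = open + 1; i < count; i++) {
        if (fence_length(lines[i]) == want) return i;
    }
    return count;
}
```

<p>With a four-backtick opening fence, a three-backtick line inside the block is treated as content in this sketch, and a document with no closing fence yields <code>count</code>, meaning the code block extends to the end of the input.</p>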
194+
195+
<h4 id="glossaryterms">Glossary Terms </h4>
196+
197+
<p>If there are terms in your document you wish to define in a <a href="#gn:3" id="gnref:3" title="see glossary" class="glossary">glossary</a> at
198+
the end of your document, you can define them using the glossary syntax.</p>
199+
200+
<p>Glossary terms can be specified using inline or reference syntax. The inline
201+
variant requires that the term be wrapped in parentheses and
202+
immediately follow the <code>?</code>.</p>
203+
204+
<pre><code>[?(glossary) The glossary collects information about important
205+
terms used in your document] is a glossary term.
206+
207+
[?glossary] is also a glossary term.
208+
209+
[?glossary]: The glossary collects information about important
210+
terms used in your document
211+
</code></pre>
212+
213+
<h4 id="internationalization">Internationalization </h4>
214+
215+
<p><abbr title="MultiMarkdown">MMD</abbr> v6 includes support for substituting certain text phrases in other
216+
languages. This only affects the HTML format.</p>
217+
218+
<h4 id="metadata">Metadata </h4>
219+
220+
<p>Metadata in <abbr title="MultiMarkdown">MMD</abbr> v6 includes new support for LaTeX &#8211; the <code>latex config</code> key
221+
allows you to automatically set up multiple <code>latex include</code> files at once.
222+
The default setups that I use would typically consist of one LaTeX file to be
223+
included at the top of the file, one to be included right at the beginning of
224+
the document, and one to be included at the end of the document. If you want
225+
to specify the latex files separately, you can use <code>latex leader</code>, <code>latex
226+
begin</code>, and <code>latex footer</code>.</p>
227+
228+
<h4 id="tableofcontents">Table of Contents </h4>
229+
230+
<p>By placing <code>{{TOC}}</code> in your document, you can insert an automatically
231+
generated Table of Contents in your document. As of <abbr title="MultiMarkdown">MMD</abbr> v6, the native
232+
Table of Contents functionality is used when exporting to LaTeX or
233+
OpenDocument formats.</p>
234+
235+
<h3 id="futuresteps">Future Steps </h3>
236+
237+
<p>Some features I plan to implement at some point:</p>
238+
239+
<ol>
240+
<li><p><abbr title="MultiMarkdown">MMD</abbr> v5 used to automatically identify abbreviated terms throughout the
241+
document and substitute them automatically. I plan to reimplement this
242+
functionality, but will probably improve upon it to include glossary terms,
243+
and possibly even support for indexing documents in LaTeX (and possibly
244+
OpenOffice).</p></li>
245+
<li><p>OPML export support is not available in v6. I plan on adding improved
246+
support for this at some point. I was hoping to be able to re-use the
247+
existing v6 parser but it might be simpler to use the approach from v5 and
248+
earlier, which was to have a separate parser tuned to only identify headers
249+
and &#8220;stuff between headers&#8221;.</p></li>
250+
<li><p>Improved EPUB support. Currently, EPUB support is provided by a separate
251+
<a href="https://github.com/fletcher/MMD-ePub">tool</a>. At some point, I would like to
252+
better integrate this into <abbr title="MultiMarkdown">MMD</abbr> itself.</p></li>
253+
</ol>
254+
255+
<div class="glossary">
256+
<hr />
257+
<ol>
258+
259+
<li id="gn:1">
260+
PEG: <p>Parsing Expression Grammar <a href="https://en.wikipedia.org/wiki/Parsing_expression_grammar">https://en.wikipedia.org/wiki/Parsing_expression_grammar</a> <a href="#gnref:1" title="return to body" class="reverseglossary">&#160;&#8617;</a></p>
261+
</li>
262+
263+
<li id="gn:2">
264+
AST: <p>Abstract Syntax Tree <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">https://en.wikipedia.org/wiki/Abstract_syntax_tree</a> <a href="#gnref:2" title="return to body" class="reverseglossary">&#160;&#8617;</a></p>
265+
</li>
266+
267+
<li id="gn:3">
268+
glossary: <p>The
269+
glossary collects information about important terms used in your document <a href="#gnref:3" title="return to body" class="reverseglossary">&#160;&#8617;</a></p>
270+
</li>
271+
272+
</ol>
273+
</div>
274+
275+
</body>
276+
</html>
277+

QuickStart.pdf

82.5 KB
Binary file not shown.
