Commit 6fdbbbf

Move paragraphs from the 'ocrx_word' section to the right sections
1 parent d8caa3c commit 6fdbbbf

3 files changed

+61
-61
lines changed

1.2/index.bs

+13-14
@@ -609,7 +609,11 @@ single "line" of text.
 
 ### <dfn element>ocr_cinfo</dfn>
 
-If no other layout element applies, the <{ocr_cinfo}> element may be used.
+Issue: ocrx_cinfo?
+
+* If no other layout element applies, the <{ocr_cinfo}> element may be used.
+* <{ocrx_cinfo}> should nest inside <{ocrx_line}>
+* <{ocrx_cinfo}> should contain only 'x_confs', 'x_bboxes', and 'cuts' attributes
 
 ## Properties for Character Information
 
@@ -678,31 +682,26 @@ Issue: [ocr_carea vs ocrx_block](https://github.com/kba/hocr-spec/issues/28)
 * any kind of "block" returned by an OCR system
 * engine-specific because the definition of a "block" depends on the engine
 
+Generators should attempt to ensure the following properties:
+
+* An <{ocrx_block}> should not contain content from multiple <{ocr_carea}>.
+* The union of all <{ocrx_block|ocrx_blocks}> should approximately cover all <{ocr_carea}>.
+* an <{ocrx_block}> should contain either a float or body text, but not both
+* an <{ocrx_block}> should contain either an image or text, but not both
+
 ### <dfn element>ocrx_line</dfn>
 
 Issue: [ocr_line vs ocrx_line](https://github.com/kba/hocr-spec/issues/19)
 
 * any kind of "line" returned by an OCR system that differs from the standard <{ocr_line}> above
 * might be some kind of "logical" line
+* an <{ocrx_line}> should correspond as closely as possible to an <{ocr_line}>
 
 ### <dfn element>ocrx_word</dfn>
 
 * any kind of "word" returned by an OCR system
 * engine specific because the definition of a "word" depends on the engine
 
-The meaning of these tags is OCR engine specific. However, generators should
-attempt to ensure the following properties:
-
-* An <{ocrx_block}> should not contain content from multiple <{ocr_carea}>.
-* The union of all <{ocrx_block|ocrx_blocks}> should approximately cover all <{ocr_carea}>.
-* an <{ocrx_block}> should contain either a float or body text, but not both
-* an <{ocrx_block}> should contain either an image or text, but not both
-* an <{ocrx_line}> should correspond as closely as possible to an <{ocr_line}>
-* <{ocrx_cinfo}> should nest inside <{ocrx_line}>
-* <{ocrx_cinfo}> should contain only 'x_confs', 'x_bboxes', and 'cuts' attributes
-
-Issue: ocrx_cinfo?
-
 ## Properties for engine-specific markup
 
 The following properties are defined:

1.2/index.html

+35-33
@@ -1421,7 +1421,7 @@
 <div class="head">
 <p data-fill-with="logo"></p>
 <h1 class="p-name no-ref" id="title">hOCR - OCR Workflow and Output embedded in HTML</h1>
-<h2 class="no-num no-toc no-ref heading settled" id="subtitle"><span class="content">Living Standard, <time class="dt-updated" datetime="2016-10-18">18 October 2016</time></span></h2>
+<h2 class="no-num no-toc no-ref heading settled" id="subtitle"><span class="content">Living Standard, <time class="dt-updated" datetime="2016-10-19">19 October 2016</time></span></h2>
 <div data-fill-with="spec-metadata">
 <dl>
 <dt>This version:
@@ -1440,7 +1440,7 @@ <h2 class="no-num no-toc no-ref heading settled" id="subtitle"><span class="cont
 <div data-fill-with="warning"></div>
 <p class="copyright" data-fill-with="copyright"><a href="http://creativecommons.org/publicdomain/zero/1.0/" rel="license"><img alt="CC0" src="https://licensebuttons.net/p/zero/1.0/80x15.png"></a> To the extent possible under law, the editors have waived all copyright
 and related or neighboring rights to this work.
-In addition, as of 18 October 2016,
+In addition, as of 19 October 2016,
 the editors have made this specification available under the <a href="http://www.openwebfoundation.org/legal/the-owf-1-0-agreements/owfa-1-0" rel="license">Open Web Foundation Agreement Version 1.0</a>,
 which is available at http://www.openwebfoundation.org/legal/the-owf-1-0-agreements/owfa-1-0.
 Parts of this work may be from another specification document. If so, those parts are instead covered by the license of that specification document. </p>
@@ -2146,15 +2146,23 @@ <h3 class="heading settled" data-level="7.1" id="classes-for-character-informati
 <p>Character-level information may be put on any element that contains only a
 single "line" of text.</p>
 <h4 class="heading settled" data-level="7.1.1" id="ocr_cinfo"><span class="secno">7.1.1. </span><span class="content"><dfn class="dfn-paneled" data-dfn-type="element" data-export="" id="elementdef-ocr_cinfo">ocr_cinfo</dfn></span><a class="self-link" href="#ocr_cinfo"></a></h4>
-<p>If no other layout element applies, the <code><a data-link-type="element" href="#elementdef-ocr_cinfo" id="ref-for-elementdef-ocr_cinfo-1">ocr_cinfo</a></code> element may be used.</p>
+<p class="issue" id="issue-000a0ed5"><a class="self-link" href="#issue-000a0ed5"></a> ocrx_cinfo?</p>
+<ul>
+<li data-md="">
+<p>If no other layout element applies, the <code><a data-link-type="element" href="#elementdef-ocr_cinfo" id="ref-for-elementdef-ocr_cinfo-1">ocr_cinfo</a></code> element may be used.</p>
+<li data-md="">
+<p><code><a data-link-type="element">ocrx_cinfo</a></code> should nest inside <code><a data-link-type="element" href="#elementdef-ocrx_line" id="ref-for-elementdef-ocrx_line-1">ocrx_line</a></code></p>
+<li data-md="">
+<p><code><a data-link-type="element">ocrx_cinfo</a></code> should contain only <a class="property" data-link-type="propdesc" href="#propdef-x_confs" id="ref-for-propdef-x_confs-1">x_confs</a>, <a class="property" data-link-type="propdesc" href="#propdef-x_bboxes" id="ref-for-propdef-x_bboxes-2">x_bboxes</a>, and <a class="property" data-link-type="propdesc" href="#propdef-cuts" id="ref-for-propdef-cuts-1">cuts</a> attributes</p>
+</ul>
 <h3 class="heading settled" data-level="7.2" id="properties-for-character-information"><span class="secno">7.2. </span><span class="content">Properties for Character Information</span><a class="self-link" href="#properties-for-character-information"></a></h3>
 <h4 class="heading settled" data-level="7.2.1" id="cuts"><span class="secno">7.2.1. </span><span class="content"><dfn class="dfn-paneled css" data-dfn-type="property" data-export="" id="propdef-cuts">cuts</dfn></span><a class="self-link" href="#cuts"></a></h4>
 <p><code>cuts c1 c2 c3 ...</code></p>
 <ul>
 <li data-md="">
 <p>character segmentation cuts (see below)</p>
 <li data-md="">
-<p>there must be a <a class="property" data-link-type="propdesc" href="#propdef-bbox" id="ref-for-propdef-bbox-4">bbox</a> property relative to which the <a class="property" data-link-type="propdesc" href="#propdef-cuts" id="ref-for-propdef-cuts-1">cuts</a> can be interpreted</p>
+<p>there must be a <a class="property" data-link-type="propdesc" href="#propdef-bbox" id="ref-for-propdef-bbox-4">bbox</a> property relative to which the <a class="property" data-link-type="propdesc" href="#propdef-cuts" id="ref-for-propdef-cuts-2">cuts</a> can be interpreted</p>
 </ul>
 <h4 class="heading settled" data-level="7.2.2" id="nlp"><span class="secno">7.2.2. </span><span class="content"><dfn class="dfn-paneled css" data-dfn-type="property" data-export="" id="propdef-nlp">nlp</dfn></span><a class="self-link" href="#nlp"></a></h4>
 <p><code>nlp c1 c2 c3 ...</code></p>
@@ -2200,13 +2208,26 @@ <h4 class="heading settled" data-level="8.1.1" id="ocrx_block"><span class="secn
 <li data-md="">
 <p>engine-specific because the definition of a "block" depends on the engine</p>
 </ul>
+<p>Generators should attempt to ensure the following properties:</p>
+<ul>
+<li data-md="">
+<p>An <code><a data-link-type="element" href="#elementdef-ocrx_block" id="ref-for-elementdef-ocrx_block-2">ocrx_block</a></code> should not contain content from multiple <code><a data-link-type="element" href="#elementdef-ocr_carea" id="ref-for-elementdef-ocr_carea-11">ocr_carea</a></code>.</p>
+<li data-md="">
+<p>The union of all <code><a data-link-type="element" href="#elementdef-ocrx_block" id="ref-for-elementdef-ocrx_block-3">ocrx_blocks</a></code> should approximately cover all <code><a data-link-type="element" href="#elementdef-ocr_carea" id="ref-for-elementdef-ocr_carea-12">ocr_carea</a></code>.</p>
+<li data-md="">
+<p>an <code><a data-link-type="element" href="#elementdef-ocrx_block" id="ref-for-elementdef-ocrx_block-4">ocrx_block</a></code> should contain either a float or body text, but not both</p>
+<li data-md="">
+<p>an <code><a data-link-type="element" href="#elementdef-ocrx_block" id="ref-for-elementdef-ocrx_block-5">ocrx_block</a></code> should contain either an image or text, but not both</p>
+</ul>
 <h4 class="heading settled" data-level="8.1.2" id="ocrx_line"><span class="secno">8.1.2. </span><span class="content"><dfn class="dfn-paneled" data-dfn-type="element" data-export="" id="elementdef-ocrx_line">ocrx_line</dfn></span><a class="self-link" href="#ocrx_line"></a></h4>
 <p class="issue" id="issue-8ef34561"><a class="self-link" href="#issue-8ef34561"></a> <a href="https://github.com/kba/hocr-spec/issues/19">ocr_line vs ocrx_line</a></p>
 <ul>
 <li data-md="">
 <p>any kind of "line" returned by an OCR system that differs from the standard <code><a data-link-type="element" href="#elementdef-ocr_line" id="ref-for-elementdef-ocr_line-6">ocr_line</a></code> above</p>
 <li data-md="">
 <p>might be some kind of "logical" line</p>
+<li data-md="">
+<p>an <code><a data-link-type="element" href="#elementdef-ocrx_line" id="ref-for-elementdef-ocrx_line-2">ocrx_line</a></code> should correspond as closely as possible to an <code><a data-link-type="element" href="#elementdef-ocr_line" id="ref-for-elementdef-ocr_line-7">ocr_line</a></code></p>
 </ul>
 <h4 class="heading settled" data-level="8.1.3" id="ocrx_word"><span class="secno">8.1.3. </span><span class="content"><dfn class="dfn-paneled" data-dfn-type="element" data-export="" id="elementdef-ocrx_word">ocrx_word</dfn></span><a class="self-link" href="#ocrx_word"></a></h4>
 <ul>
@@ -2215,25 +2236,6 @@ <h4 class="heading settled" data-level="8.1.3" id="ocrx_word"><span class="secno
 <li data-md="">
 <p>engine specific because the definition of a "word" depends on the engine</p>
 </ul>
-<p>The meaning of these tags is OCR engine specific. However, generators should
-attempt to ensure the following properties:</p>
-<ul>
-<li data-md="">
-<p>An <code><a data-link-type="element" href="#elementdef-ocrx_block" id="ref-for-elementdef-ocrx_block-2">ocrx_block</a></code> should not contain content from multiple <code><a data-link-type="element" href="#elementdef-ocr_carea" id="ref-for-elementdef-ocr_carea-11">ocr_carea</a></code>.</p>
-<li data-md="">
-<p>The union of all <code><a data-link-type="element" href="#elementdef-ocrx_block" id="ref-for-elementdef-ocrx_block-3">ocrx_blocks</a></code> should approximately cover all <code><a data-link-type="element" href="#elementdef-ocr_carea" id="ref-for-elementdef-ocr_carea-12">ocr_carea</a></code>.</p>
-<li data-md="">
-<p>an <code><a data-link-type="element" href="#elementdef-ocrx_block" id="ref-for-elementdef-ocrx_block-4">ocrx_block</a></code> should contain either a float or body text, but not both</p>
-<li data-md="">
-<p>an <code><a data-link-type="element" href="#elementdef-ocrx_block" id="ref-for-elementdef-ocrx_block-5">ocrx_block</a></code> should contain either an image or text, but not both</p>
-<li data-md="">
-<p>an <code><a data-link-type="element" href="#elementdef-ocrx_line" id="ref-for-elementdef-ocrx_line-1">ocrx_line</a></code> should correspond as closely as possible to an <code><a data-link-type="element" href="#elementdef-ocr_line" id="ref-for-elementdef-ocr_line-7">ocr_line</a></code></p>
-<li data-md="">
-<p><code><a data-link-type="element">ocrx_cinfo</a></code> should nest inside <code><a data-link-type="element" href="#elementdef-ocrx_line" id="ref-for-elementdef-ocrx_line-2">ocrx_line</a></code></p>
-<li data-md="">
-<p><code><a data-link-type="element">ocrx_cinfo</a></code> should contain only <a class="property" data-link-type="propdesc" href="#propdef-x_confs" id="ref-for-propdef-x_confs-1">x_confs</a>, <a class="property" data-link-type="propdesc" href="#propdef-x_bboxes" id="ref-for-propdef-x_bboxes-2">x_bboxes</a>, and <a class="property" data-link-type="propdesc" href="#propdef-cuts" id="ref-for-propdef-cuts-2">cuts</a> attributes</p>
-</ul>
-<p class="issue" id="issue-000a0ed5"><a class="self-link" href="#issue-000a0ed5"></a> ocrx_cinfo?</p>
 <h3 class="heading settled" data-level="8.2" id="properties-for-engine-specific-markup"><span class="secno">8.2. </span><span class="content">Properties for engine-specific markup</span><a class="self-link" href="#properties-for-engine-specific-markup"></a></h3>
 <p>The following properties are defined:</p>
 <h4 class="heading settled" data-level="8.2.1" id="x_font"><span class="secno">8.2.1. </span><span class="content"><dfn class="css" data-dfn-type="property" data-export="" id="propdef-x_font">x_font<a class="self-link" href="#propdef-x_font"></a></dfn></span><a class="self-link" href="#x_font"></a></h4>
@@ -2985,9 +2987,9 @@ <h2 class="no-num no-ref heading settled" id="issues-index"><span class="content
 <div class="issue"> <a href="https://github.com/kba/hocr-spec/issues/10">Use of property presence</a><a href="#issue-4c7527e8"></a></div>
 <div class="issue"> There is currently no way of indicating anchoring or flow-around
 properties for floating elements; properties need to be defined for this.<a href="#issue-3f2f70ed"></a></div>
+<div class="issue"> ocrx_cinfo?<a href="#issue-000a0ed5"></a></div>
 <div class="issue"> <a href="https://github.com/kba/hocr-spec/issues/28">ocr_carea vs ocrx_block</a><a href="#issue-66c198d9"></a></div>
 <div class="issue"> <a href="https://github.com/kba/hocr-spec/issues/19">ocr_line vs ocrx_line</a><a href="#issue-8ef34561"></a></div>
-<div class="issue"> ocrx_cinfo?<a href="#issue-000a0ed5"></a></div>
 <div class="issue"> <a href="https://github.com/kba/hocr-spec/issues/9">Delete x_cost</a><a href="#issue-b35297dd"></a></div>
 <div class="issue"> <a href="https://github.com/kba/hocr-spec/issues/2">XML namespace for hOCR HTML?</a><a href="#issue-f6d39356"></a></div>
 <div class="issue"> <a href="https://github.com/kba/hocr-spec/issues/1">What DOCTYPE for hOCR HTML?</a><a href="#issue-a3899b99"></a></div>
@@ -3101,7 +3103,7 @@ <h2 class="no-num no-ref heading settled" id="issues-index"><span class="content
 <li><a href="#ref-for-elementdef-ocr_carea-1">3.2.4. cflow</a> <a href="#ref-for-elementdef-ocr_carea-2">(2)</a> <a href="#ref-for-elementdef-ocr_carea-3">(3)</a>
 <li><a href="#ref-for-elementdef-ocr_carea-4">5.1.2. ocr_column</a>
 <li><a href="#ref-for-elementdef-ocr_carea-5">5.1.3. ocr_carea</a> <a href="#ref-for-elementdef-ocr_carea-6">(2)</a> <a href="#ref-for-elementdef-ocr_carea-7">(3)</a> <a href="#ref-for-elementdef-ocr_carea-8">(4)</a> <a href="#ref-for-elementdef-ocr_carea-9">(5)</a> <a href="#ref-for-elementdef-ocr_carea-10">(6)</a>
-<li><a href="#ref-for-elementdef-ocr_carea-11">8.1.3. ocrx_word</a> <a href="#ref-for-elementdef-ocr_carea-12">(2)</a>
+<li><a href="#ref-for-elementdef-ocr_carea-11">8.1.1. ocrx_block</a> <a href="#ref-for-elementdef-ocr_carea-12">(2)</a>
 </ul>
 </aside>
 <aside class="dfn-panel" data-for="elementdef-ocr_line">
@@ -3111,8 +3113,7 @@ <h2 class="no-num no-ref heading settled" id="issues-index"><span class="content
 <li><a href="#ref-for-elementdef-ocr_line-2">5.1.4. ocr_line</a> <a href="#ref-for-elementdef-ocr_line-3">(2)</a>
 <li><a href="#ref-for-elementdef-ocr_line-4">5.3.3. x_source</a>
 <li><a href="#ref-for-elementdef-ocr_line-5">5.3.4. hardbreak</a>
-<li><a href="#ref-for-elementdef-ocr_line-6">8.1.2. ocrx_line</a>
-<li><a href="#ref-for-elementdef-ocr_line-7">8.1.3. ocrx_word</a>
+<li><a href="#ref-for-elementdef-ocr_line-6">8.1.2. ocrx_line</a> <a href="#ref-for-elementdef-ocr_line-7">(2)</a>
 </ul>
 </aside>
 <aside class="dfn-panel" data-for="elementdef-ocr_float">
@@ -3130,8 +3131,8 @@ <h2 class="no-num no-ref heading settled" id="issues-index"><span class="content
 <aside class="dfn-panel" data-for="propdef-cuts">
 <b><a href="#propdef-cuts">#propdef-cuts</a></b><b>Referenced in:</b>
 <ul>
-<li><a href="#ref-for-propdef-cuts-1">7.2.1. cuts</a>
-<li><a href="#ref-for-propdef-cuts-2">8.1.3. ocrx_word</a>
+<li><a href="#ref-for-propdef-cuts-1">7.1.1. ocr_cinfo</a>
+<li><a href="#ref-for-propdef-cuts-2">7.2.1. cuts</a>
 </ul>
 </aside>
 <aside class="dfn-panel" data-for="propdef-nlp">
@@ -3144,14 +3145,15 @@ <h2 class="no-num no-ref heading settled" id="issues-index"><span class="content
 <b><a href="#elementdef-ocrx_block">#elementdef-ocrx_block</a></b><b>Referenced in:</b>
 <ul>
 <li><a href="#ref-for-elementdef-ocrx_block-1">3.2.4. cflow</a>
-<li><a href="#ref-for-elementdef-ocrx_block-2">8.1.3. ocrx_word</a> <a href="#ref-for-elementdef-ocrx_block-3">(2)</a> <a href="#ref-for-elementdef-ocrx_block-4">(3)</a> <a href="#ref-for-elementdef-ocrx_block-5">(4)</a>
+<li><a href="#ref-for-elementdef-ocrx_block-2">8.1.1. ocrx_block</a> <a href="#ref-for-elementdef-ocrx_block-3">(2)</a> <a href="#ref-for-elementdef-ocrx_block-4">(3)</a> <a href="#ref-for-elementdef-ocrx_block-5">(4)</a>
 <li><a href="#ref-for-elementdef-ocrx_block-6">14. Profiles</a>
 </ul>
 </aside>
 <aside class="dfn-panel" data-for="elementdef-ocrx_line">
 <b><a href="#elementdef-ocrx_line">#elementdef-ocrx_line</a></b><b>Referenced in:</b>
 <ul>
-<li><a href="#ref-for-elementdef-ocrx_line-1">8.1.3. ocrx_word</a> <a href="#ref-for-elementdef-ocrx_line-2">(2)</a>
+<li><a href="#ref-for-elementdef-ocrx_line-1">7.1.1. ocr_cinfo</a>
+<li><a href="#ref-for-elementdef-ocrx_line-2">8.1.2. ocrx_line</a>
 <li><a href="#ref-for-elementdef-ocrx_line-3">14. Profiles</a>
 </ul>
 </aside>
@@ -3165,13 +3167,13 @@ <h2 class="no-num no-ref heading settled" id="issues-index"><span class="content
 <b><a href="#propdef-x_bboxes">#propdef-x_bboxes</a></b><b>Referenced in:</b>
 <ul>
 <li><a href="#ref-for-propdef-x_bboxes-1">3.1.1. bbox</a>
-<li><a href="#ref-for-propdef-x_bboxes-2">8.1.3. ocrx_word</a>
+<li><a href="#ref-for-propdef-x_bboxes-2">7.1.1. ocr_cinfo</a>
 </ul>
 </aside>
 <aside class="dfn-panel" data-for="propdef-x_confs">
 <b><a href="#propdef-x_confs">#propdef-x_confs</a></b><b>Referenced in:</b>
 <ul>
-<li><a href="#ref-for-propdef-x_confs-1">8.1.3. ocrx_word</a>
+<li><a href="#ref-for-propdef-x_confs-1">7.1.1. ocr_cinfo</a>
 </ul>
 </aside>
 <aside class="dfn-panel" data-for="valdef-ocr-capabilities-ocrp_lang">
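
The two ocrx_cinfo constraints that this commit moves into the ocr_cinfo section (nest inside an ocrx_line, and carry only x_confs, x_bboxes, and cuts) could be checked mechanically. The following is a minimal Python sketch, not part of the commit or of the spec. The names `check_cinfo`, `CinfoChecker`, and `title_property_names` are hypothetical, and the sketch assumes that element types are given in the `class` attribute and that properties are stored in the `title` attribute as semicolon-separated `name value ...` entries.

```python
from html.parser import HTMLParser

# Properties the moved paragraph allows on ocrx_cinfo.
ALLOWED_CINFO_PROPS = {"x_confs", "x_bboxes", "cuts"}
# Elements without end tags; they must not be pushed on the open-element stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


def title_property_names(title):
    """Return property names from a title attribute.

    Assumption: properties are laid out as 'name value...; name value...'.
    """
    names = []
    for chunk in (title or "").split(";"):
        parts = chunk.split()
        if parts:
            names.append(parts[0])
    return names


class CinfoChecker(HTMLParser):
    """Collects violations of the two ocrx_cinfo recommendations."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, set of classes) for each open element
        self.problems = []   # human-readable findings

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = set((attr.get("class") or "").split())
        if "ocrx_cinfo" in classes:
            # Recommendation 1: some open ancestor must be an ocrx_line.
            if not any("ocrx_line" in c for _, c in self.stack):
                self.problems.append(
                    "ocrx_cinfo is not nested inside an ocrx_line")
            # Recommendation 2: only x_confs, x_bboxes, and cuts are expected.
            for name in title_property_names(attr.get("title")):
                if name not in ALLOWED_CINFO_PROPS:
                    self.problems.append(
                        "unexpected property on ocrx_cinfo: " + name)
        if tag not in VOID_TAGS:
            self.stack.append((tag, classes))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def check_cinfo(html):
    """Return a list of problems found in an hOCR fragment (empty if none)."""
    checker = CinfoChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```

Because the spec text says "should" rather than "must", the sketch reports findings instead of raising, so a caller can treat them as warnings.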
