 <p class="copyright" data-fill-with="copyright"><a href="http://creativecommons.org/publicdomain/zero/1.0/" rel="license"><img alt="CC0" src="https://licensebuttons.net/p/zero/1.0/80x15.png"></a> To the extent possible under law, the editors have waived all copyright
 and related or neighboring rights to this work.
-In addition, as of 18 October 2016,
+In addition, as of 19 October 2016,
 the editors have made this specification available under the <a href="http://www.openwebfoundation.org/legal/the-owf-1-0-agreements/owfa-1-0" rel="license">Open Web Foundation Agreement Version 1.0</a>,
 which is available at http://www.openwebfoundation.org/legal/the-owf-1-0-agreements/owfa-1-0.
 Parts of this work may be from another specification document. If so, those parts are instead covered by the license of that specification document. </p>
 <p>If no other layout element applies, the <code><a data-link-type="element" href="#elementdef-ocr_cinfo" id="ref-for-elementdef-ocr_cinfo-1">ocr_cinfo</a></code> element may be used.</p>
+<li data-md="">
+<p><code><a data-link-type="element">ocrx_cinfo</a></code> should nest inside <code><a data-link-type="element" href="#elementdef-ocrx_line" id="ref-for-elementdef-ocrx_line-1">ocrx_line</a></code></p>
+<li data-md="">
+<p><code><a data-link-type="element">ocrx_cinfo</a></code> should contain only <a class="property" data-link-type="propdesc" href="#propdef-x_confs" id="ref-for-propdef-x_confs-1">x_confs</a>, <a class="property" data-link-type="propdesc" href="#propdef-x_bboxes" id="ref-for-propdef-x_bboxes-2">x_bboxes</a>, and <a class="property" data-link-type="propdesc" href="#propdef-cuts" id="ref-for-propdef-cuts-1">cuts</a> attributes</p>
+</ul>
 <h3 class="heading settled" data-level="7.2" id="properties-for-character-information"><span class="secno">7.2. </span><span class="content">Properties for Character Information</span><a class="self-link" href="#properties-for-character-information"></a></h3>
 <p>there must be a <a class="property" data-link-type="propdesc" href="#propdef-bbox" id="ref-for-propdef-bbox-4">bbox</a> property relative to which the <a class="property" data-link-type="propdesc" href="#propdef-cuts" id="ref-for-propdef-cuts-1">cuts</a> can be interpreted</p>
+<p>there must be a <a class="property" data-link-type="propdesc" href="#propdef-bbox" id="ref-for-propdef-bbox-4">bbox</a> property relative to which the <a class="property" data-link-type="propdesc" href="#propdef-cuts" id="ref-for-propdef-cuts-2">cuts</a> can be interpreted</p>
 <p>engine-specific because the definition of a "block" depends on the engine</p>
 </ul>
+<p>Generators should attempt to ensure the following properties:</p>
+<ul>
+<li data-md="">
+<p>An <code><a data-link-type="element" href="#elementdef-ocrx_block" id="ref-for-elementdef-ocrx_block-2">ocrx_block</a></code> should not contain content from multiple <code><a data-link-type="element" href="#elementdef-ocr_carea" id="ref-for-elementdef-ocr_carea-11">ocr_carea</a></code>.</p>
+<li data-md="">
+<p>The union of all <code><a data-link-type="element" href="#elementdef-ocrx_block" id="ref-for-elementdef-ocrx_block-3">ocrx_blocks</a></code> should approximately cover all <code><a data-link-type="element" href="#elementdef-ocr_carea" id="ref-for-elementdef-ocr_carea-12">ocr_carea</a></code>.</p>
+<li data-md="">
+<p>an <code><a data-link-type="element" href="#elementdef-ocrx_block" id="ref-for-elementdef-ocrx_block-4">ocrx_block</a></code> should contain either a float or body text, but not both</p>
+<li data-md="">
+<p>an <code><a data-link-type="element" href="#elementdef-ocrx_block" id="ref-for-elementdef-ocrx_block-5">ocrx_block</a></code> should contain either an image or text, but not both</p>
 <p class="issue" id="issue-8ef34561"><a class="self-link" href="#issue-8ef34561"></a><a href="https://github.com/kba/hocr-spec/issues/19">ocr_line vs ocrx_line</a></p>
 <ul>
 <li data-md="">
 <p>any kind of "line" returned by an OCR system that differs from the standard <code><a data-link-type="element" href="#elementdef-ocr_line" id="ref-for-elementdef-ocr_line-6">ocr_line</a></code> above</p>
 <li data-md="">
 <p>might be some kind of "logical" line</p>
+<li data-md="">
+<p>an <code><a data-link-type="element" href="#elementdef-ocrx_line" id="ref-for-elementdef-ocrx_line-2">ocrx_line</a></code> should correspond as closely as possible to an <code><a data-link-type="element" href="#elementdef-ocr_line" id="ref-for-elementdef-ocr_line-7">ocr_line</a></code></p>
 <p>engine specific because the definition of a "word" depends on the engine</p>
 </ul>
-<p>The meaning of these tags is OCR engine specific. However, generators should
-attempt to ensure the following properties:</p>
-<ul>
-<li data-md="">
-<p>An <code><a data-link-type="element" href="#elementdef-ocrx_block" id="ref-for-elementdef-ocrx_block-2">ocrx_block</a></code> should not contain content from multiple <code><a data-link-type="element" href="#elementdef-ocr_carea" id="ref-for-elementdef-ocr_carea-11">ocr_carea</a></code>.</p>
-<li data-md="">
-<p>The union of all <code><a data-link-type="element" href="#elementdef-ocrx_block" id="ref-for-elementdef-ocrx_block-3">ocrx_blocks</a></code> should approximately cover all <code><a data-link-type="element" href="#elementdef-ocr_carea" id="ref-for-elementdef-ocr_carea-12">ocr_carea</a></code>.</p>
-<li data-md="">
-<p>an <code><a data-link-type="element" href="#elementdef-ocrx_block" id="ref-for-elementdef-ocrx_block-4">ocrx_block</a></code> should contain either a float or body text, but not both</p>
-<li data-md="">
-<p>an <code><a data-link-type="element" href="#elementdef-ocrx_block" id="ref-for-elementdef-ocrx_block-5">ocrx_block</a></code> should contain either an image or text, but not both</p>
-<li data-md="">
-<p>an <code><a data-link-type="element" href="#elementdef-ocrx_line" id="ref-for-elementdef-ocrx_line-1">ocrx_line</a></code> should correspond as closely as possible to an <code><a data-link-type="element" href="#elementdef-ocr_line" id="ref-for-elementdef-ocr_line-7">ocr_line</a></code></p>
-<li data-md="">
-<p><code><a data-link-type="element">ocrx_cinfo</a></code> should nest inside <code><a data-link-type="element" href="#elementdef-ocrx_line" id="ref-for-elementdef-ocrx_line-2">ocrx_line</a></code></p>
-<li data-md="">
-<p><code><a data-link-type="element">ocrx_cinfo</a></code> should contain only <a class="property" data-link-type="propdesc" href="#propdef-x_confs" id="ref-for-propdef-x_confs-1">x_confs</a>, <a class="property" data-link-type="propdesc" href="#propdef-x_bboxes" id="ref-for-propdef-x_bboxes-2">x_bboxes</a>, and <a class="property" data-link-type="propdesc" href="#propdef-cuts" id="ref-for-propdef-cuts-2">cuts</a> attributes</p>
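
For illustration only, the two `ocrx_cinfo` rules that this change relocates (nest inside `ocrx_line`; carry only `x_confs`, `x_bboxes`, and `cuts`) might look like the hOCR fragment below. This is a sketch and not text from the specification: the coordinates, confidence values, and the recognized text "cat" are invented, and the fragment assumes the usual hOCR convention of naming the element in `class` and listing properties in a semicolon-separated `title`.

```html
<!-- Hypothetical example: all numbers and the text "cat" are made up. -->
<!-- The line carries the bbox; character details sit in a nested ocrx_cinfo. -->
<span class="ocrx_line" title="bbox 10 20 70 50">
  <!-- x_bboxes: four numbers per character; x_confs: one value per character -->
  <span class="ocrx_cinfo"
        title="x_bboxes 10 20 30 50 30 20 50 50 50 20 70 50; x_confs 97 92 88">cat</span>
</span>
```

The `ocrx_cinfo` span here holds three character boxes and three confidences for three characters, and no properties other than those the rule permits.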