This command will write out a json file with the detected layout and reading order.

```shell
surya_layout DATA_PATH
```
The `results.json` file will contain a json dictionary where the keys are the input filenames without extensions. Each value will be a list of dictionaries, one per page of the input document. Each page dictionary contains:

- `bboxes` - detected bounding boxes for text
- `bbox` - the axis-aligned rectangle for the text line in (x1, y1, x2, y2) format. (x1, y1) is the top left corner, and (x2, y2) is the bottom right corner.
- `polygon` - the polygon for the text line in (x1, y1), (x2, y2), (x3, y3), (x4, y4) format. The points are in clockwise order from the top left.
- `position` - the reading order of the box.
- `label` - the label for the bbox. One of `Caption`, `Footnote`, `Formula`, `List-item`, `Page-footer`, `Page-header`, `Picture`, `Figure`, `Section-header`, `Table`, `Form`, `Table-of-contents`, `Handwriting`, `Text`, `Text-inline-math`.
- `page` - the page number in the file
- `image_bbox` - the bbox for the image in (x1, y1, x2, y2) format. (x1, y1) is the top left corner, and (x2, y2) is the bottom right corner. All line bboxes will be contained within this bbox.
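As an illustration of the schema above, here is a minimal sketch in plain Python that recovers the reading order by sorting detected boxes on `position`. The page dictionary is hand-built to mirror the documented fields, not real model output:

```python
# Hypothetical page dictionary following the documented output schema
page = {
    "bboxes": [
        {"bbox": [50, 300, 500, 340], "position": 2, "label": "Text"},
        {"bbox": [50, 40, 500, 90], "position": 0, "label": "Section-header"},
        {"bbox": [50, 120, 500, 280], "position": 1, "label": "Table"},
    ],
    "page": 1,
    "image_bbox": [0, 0, 612, 792],
}

# Sort boxes by their reading-order position to get document flow
ordered = sorted(page["bboxes"], key=lambda b: b["position"])
labels_in_order = [b["label"] for b in ordered]
print(labels_in_order)  # ['Section-header', 'Table', 'Text']
```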
**Performance tips**

Setting the `LAYOUT_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `220MB` of VRAM, so very high batch sizes are possible. The default is a batch size of `32`, which will use about 7GB of VRAM. Depending on your CPU core count, raising the batch size may help on CPU as well - the default CPU batch size is `4`.
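For example (the batch size here is illustrative - at roughly `220MB` per item, a batch size of 64 needs about 14GB of VRAM, so tune it to your GPU):

```shell
LAYOUT_BATCH_SIZE=64 surya_layout DATA_PATH
```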
### From python

```python
from PIL import Image
from surya.detection import batch_text_detection
from surya.layout import batch_layout_detection
from surya.model.layout.model import load_model, load_processor
```
This command will write out a json file with the detected reading order and layout.

```shell
surya_order DATA_PATH
```
- `DATA_PATH` can be an image, pdf, or folder of images/pdfs
- `--images` will save images of the pages and detected text lines (optional)
- `--max` specifies the maximum number of pages to process if you don't want to process everything
- `--results_dir` specifies the directory to save results to instead of the default
The `results.json` file will contain a json dictionary where the keys are the input filenames without extensions. Each value will be a list of dictionaries, one per page of the input document. Each page dictionary contains:

- `bboxes` - detected bounding boxes for text
- `bbox` - the axis-aligned rectangle for the text line in (x1, y1, x2, y2) format. (x1, y1) is the top left corner, and (x2, y2) is the bottom right corner.
- `position` - the position in the reading order of the bbox, starting from 0.
- `label` - the label for the bbox. See the layout section of the documentation for a list of potential labels.
- `page` - the page number in the file
- `image_bbox` - the bbox for the image in (x1, y1, x2, y2) format. (x1, y1) is the top left corner, and (x2, y2) is the bottom right corner. All line bboxes will be contained within this bbox.
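Since positions start from 0 and index the reading order, they should form a permutation of `0..N-1` for the boxes on a page. A small sanity-check sketch in plain Python, using a hand-built page dictionary that mirrors the documented schema:

```python
# Hypothetical page dict following the documented surya_order output schema
page = {
    "bboxes": [
        {"bbox": [10, 200, 300, 250], "position": 1, "label": "Text"},
        {"bbox": [10, 20, 300, 60], "position": 0, "label": "Title"},
    ],
    "page": 1,
    "image_bbox": [0, 0, 612, 792],
}

# Positions should be a permutation of 0..N-1
positions = sorted(b["position"] for b in page["bboxes"])
assert positions == list(range(len(page["bboxes"])))

# Build a position -> bbox lookup to walk the page in reading order
by_position = {b["position"]: b["bbox"] for b in page["bboxes"]}
print(by_position[0])  # [10, 20, 300, 60]
```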
**Performance tips**

Setting the `ORDER_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `360MB` of VRAM, so very high batch sizes are possible. The default is a batch size of `32`, which will use about 11GB of VRAM. Depending on your CPU core count, raising the batch size may help on CPU as well - the default CPU batch size is `4`.
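The VRAM arithmetic above can be run in reverse to pick a batch size for a given card. A back-of-the-envelope sketch (the `360MB` per-item figure comes from the paragraph above; the helper name and headroom value are illustrative):

```python
PER_ITEM_MB = 360  # approximate VRAM per batch item for the ordering model

def max_order_batch_size(vram_gb: float, reserve_gb: float = 1.0) -> int:
    """Largest batch size that fits in the given VRAM, keeping some headroom."""
    usable_mb = (vram_gb - reserve_gb) * 1024
    return max(1, int(usable_mb // PER_ITEM_MB))

print(max_order_batch_size(12))  # e.g. a 12GB card -> 31
```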
### From python

```python
from PIL import Image
from surya.ordering import batch_ordering
from surya.model.ordering.processor import load_processor
from surya.model.ordering.model import load_model

image = Image.open(IMAGE_PATH)
# bboxes should be a list of lists with layout bboxes for the image in [x1, y1, x2, y2] format
# You can get this from the layout model, see above for usage
bboxes = [bbox1, bbox2, ...]

model = load_model()
processor = load_processor()

# order_predictions will be a list of dicts, one per image
order_predictions = batch_ordering([image], [bboxes], model, processor)
```
This command will write out a json file with the detected table cells and row/column ids, along with row/column bounding boxes. If you want to get a formatted markdown table, check out the [tabled](https://www.github.com/VikParuchuri/tabled) repo.

**Performance tips**

Setting the `TABLE_REC_BATCH_SIZE` env var properly will make a big difference when using a GPU. Each batch item will use `150MB` of VRAM, so very high batch sizes are possible. The default is a batch size of `64`, which will use about 10GB of VRAM. Depending on your CPU core count, raising the batch size may help on CPU as well - the default CPU batch size is `8`.

### From python

See `table_recognition.py` for a code sample. Table recognition depends on extracting cells, so it is a little more involved to set up than other model types.
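Because each detected cell carries row/column ids, a table grid can be rebuilt with plain Python once the results are loaded. A sketch assuming a simplified cell schema (`row_id`/`col_id`/`text` are assumed field names for illustration - check the actual output for the exact keys):

```python
# Hypothetical cells with row/column ids, as described above (field names assumed)
cells = [
    {"row_id": 0, "col_id": 0, "text": "Name"},
    {"row_id": 0, "col_id": 1, "text": "Score"},
    {"row_id": 1, "col_id": 0, "text": "Alice"},
    {"row_id": 1, "col_id": 1, "text": "91"},
]

n_rows = max(c["row_id"] for c in cells) + 1
n_cols = max(c["col_id"] for c in cells) + 1

# Place each cell into a row-major grid
grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
for c in cells:
    grid[c["row_id"]][c["col_id"]] = c["text"]

# Render the header row as markdown, like the tabled repo does at a higher level
header = "| " + " | ".join(grid[0]) + " |"
print(header)  # | Name | Score |
```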
# Limitations
## Reading Order

88% mean accuracy, and .4 seconds per image on an A10 GPU. See methodology for notes - this benchmark is not a perfect measure of accuracy, and is more useful as a sanity check.
```python
# Table-recognition options exposed in the interactive streamlit app sidebar
use_pdf_boxes = st.sidebar.checkbox("PDF table boxes", value=True, help="Table recognition only: Use the bounding boxes from the PDF file vs text detection model.")
skip_table_detection = st.sidebar.checkbox("Skip table detection", value=False, help="Table recognition only: Skip table detection and treat the whole image/page as a table.")
```