Commit 91c9ad9

Minor tweaks
1 parent 82000f6 commit 91c9ad9

3 files changed, +16 -6 lines changed

.github/workflows/scripts.yml

Lines changed: 5 additions & 3 deletions
@@ -25,14 +25,16 @@ jobs:
     - name: Test detection
       run: poetry run surya_detect benchmark_data/pdfs/switch_trans.pdf --page_range 0
     - name: Test OCR
+      env:
+        RECOGNITION_MAX_TOKENS: 25
       run: poetry run surya_ocr benchmark_data/pdfs/switch_trans.pdf --page_range 0
     - name: Test layout
       run: poetry run surya_layout benchmark_data/pdfs/switch_trans.pdf --page_range 0
     - name: Test table
       run: poetry run surya_table benchmark_data/pdfs/switch_trans.pdf --page_range 0
-    - name: Test detection folder
-      run: poetry run surya_detect benchmark_data/pdfs --page_range 0
     - name: Test texify
       env:
         TEXIFY_MAX_TOKENS: 25
-      run: poetry run surya_latex_ocr benchmark_data/pdfs --page_range 0
+      run: poetry run surya_latex_ocr benchmark_data/pdfs/switch_trans.pdf --page_range 0
+    - name: Test detection folder
+      run: poetry run surya_detect benchmark_data/pdfs --page_range 0

README.md

Lines changed: 8 additions & 0 deletions
@@ -452,6 +452,14 @@ Higher is better for intersection, which the percentage of the actual row/column
 
 The benchmark uses a subset of [Fintabnet](https://developer.ibm.com/exchanges/data/all/fintabnet/) from IBM. It has labeled rows and columns. After table recognition is run, the predicted rows and columns are compared to the ground truth. There is an additional penalty for predicting too many or too few rows/columns.
 
+## LaTeX OCR
+
+| Method | edit ⬇ | time taken (s) ⬇ |
+|--------|----------|------------------|
+| texify | 0.122617 | 35.6345 |
+
+This inferences texify on a ground truth set of LaTeX, then does edit distance. This is a bit noisy, since 2 LaTeX strings that render the same can have different symbols in them.
+
 ## Running your own benchmarks
 
 You can benchmark the performance of surya on your machine.
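
The README text added in this commit describes scoring texify by edit distance against ground-truth LaTeX, and notes that the score is noisy because strings that render the same can differ in symbols. The sketch below illustrates that effect. It assumes a character-level Levenshtein distance normalized by the longer string's length; the diff does not show the benchmark's actual metric or normalization, and the names `levenshtein` and `normalized_edit_distance` are hypothetical, not part of surya.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute), via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute (or match)
                )
            )
        previous = current
    return previous[-1]


def normalized_edit_distance(pred: str, truth: str) -> float:
    """Edit distance divided by the longer length (an assumed normalization)."""
    longest = max(len(pred), len(truth))
    if longest == 0:
        return 0.0
    return levenshtein(pred, truth) / longest


# Two LaTeX strings that render the same fraction but use different symbols.
frac_style = r"\frac{1}{2}"
over_style = r"{1 \over 2}"
print(normalized_edit_distance(frac_style, over_style))
```

The two inputs typeset identically, yet the score is nonzero, which is the kind of noise the README note refers to.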

poetry.lock

Lines changed: 3 additions & 3 deletions
