After labeling my custom OCR dataset via docTR-Labeler I have annotation files like this: {
"myimage.png": {
"img_dimensions": [
160,
3147
],
"img_hash": "3b287306dc2e3e4cae1a514efab545e7561bba1d5688ff22f90e0a2860c6ac69",
"polygons": [
[
[
293.0,
67.0
],
[
478.0,
64.0
],
[
479.0,
142.0
],
[
294.0,
143.0
]
],
[
[
657.0,
139.0
],
[
658.0,
69.0
],
[
839.0,
72.0
],
[
839.0,
142.0
]
]
],
"labels": [
"TEST",
"7314"
],
"types": [
"words",
"words"
]
}
}

As you can see, there are two labels on this image. However, your recognition readme only refers to one label per image. So how should I convert the output from docTR-Labeler to the correct format for training a custom docTR recognition model in this case?
Answered by felixT2K, Mar 17, 2025
... or do I have to cut out (crop) each of these image boxes as specified in the JSON from docTR-Labeler, save each as its own image, and put them all into one large JSON file?
Correct, one image == one word for recognition, so yes, from the detection labels you have to crop :)
https://github.com/mindee/doctr/tree/main/references/recognition#data-format
An images folder with all the word-crop images plus one labels.json. In the end you have this twice: one images folder + labels.json for train, and one images folder + labels.json for val :)
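For reference, labels.json in that format simply maps each word-crop filename to its transcription, so for the two words above it would end up looking something like this (the crop filenames here are made up):

```json
{
  "myimage_0.png": "TEST",
  "myimage_1.png": "7314"
}
```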
Reference datasets can be found here: #1654
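Putting it together, here is a minimal conversion sketch, assuming axis-aligned crops are good enough (the polygons above are nearly rectangular) and that Pillow is installed; all paths and the crop-naming scheme are illustrative, not part of docTR:

```python
import json
from pathlib import Path

from PIL import Image

# Illustrative paths; adjust to your setup.
ANNOTATIONS = Path("labels_from_doctr_labeler.json")  # the file shown above
IMAGES_DIR = Path("original_images")                  # the full source images
OUT_DIR = Path("train")                               # target: train/images + train/labels.json

out_images = OUT_DIR / "images"
out_images.mkdir(parents=True, exist_ok=True)

with ANNOTATIONS.open() as f:
    annotations = json.load(f)

labels = {}
for img_name, entry in annotations.items():
    page = Image.open(IMAGES_DIR / img_name)
    for i, (polygon, label) in enumerate(zip(entry["polygons"], entry["labels"])):
        # Crop the axis-aligned bounding box of the polygon.
        xs = [pt[0] for pt in polygon]
        ys = [pt[1] for pt in polygon]
        box = (int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys)))
        crop_name = f"{Path(img_name).stem}_{i}.png"
        page.crop(box).save(out_images / crop_name)
        labels[crop_name] = label

with (OUT_DIR / "labels.json").open("w") as f:
    json.dump(labels, f, indent=2, ensure_ascii=False)
```

Run the same script a second time over a held-out subset of your annotations to produce the val folder.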