After labeling my custom OCR dataset via docTR-Labeler I have annotation files like this: {
"myimage.png": {
"img_dimensions": [
160,
3147
],
"img_hash": "3b287306dc2e3e4cae1a514efab545e7561bba1d5688ff22f90e0a2860c6ac69",
"polygons": [
[
[
293.0,
67.0
],
[
478.0,
64.0
],
[
479.0,
142.0
],
[
294.0,
143.0
]
],
[
[
657.0,
139.0
],
[
658.0,
69.0
],
[
839.0,
72.0
],
[
839.0,
142.0
]
]
],
"labels": [
"TEST",
"7314"
],
"types": [
"words",
"words"
]
}
}

As you can see, there are two labels on this image. However, your recognition readme only refers to one label per image. So how should I convert the output from docTR-Labeler to the correct format for training a custom docTR recognition model in this case?
Answered by felixT2K, Mar 17, 2025
... or do I have to cut out (crop) each of these image boxes as specified in the JSON from docTR-Labeler, save each as its own image, and put them all into one large JSON file?
Correct, one image == one word for recognition, so yes, from the detection labels you have to crop :)
https://github.com/mindee/doctr/tree/main/references/recognition#data-format
An images folder with all the word-crop images plus one labels.json. In the end you have this twice: one images folder + labels.json for train, and one images folder + labels.json for val :)
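For reference, labels.json in that format simply maps each word-crop filename to its transcription, so for the two words above it would end up looking something like this (the crop filenames here are made up):

```json
{
  "myimage_0.png": "TEST",
  "myimage_1.png": "7314"
}
```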
Reference datasets can be found here: #1654
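Putting it together, here is a minimal conversion sketch, assuming axis-aligned crops are good enough (the polygons above are nearly rectangular) and that Pillow is installed; all paths and the crop-naming scheme are illustrative, not part of docTR:

```python
import json
from pathlib import Path

from PIL import Image

# Illustrative paths; adjust to your setup.
ANNOTATIONS = Path("labels_from_doctr_labeler.json")  # the file shown above
IMAGES_DIR = Path("original_images")                  # the full source images
OUT_DIR = Path("train")                               # target: train/images + train/labels.json

out_images = OUT_DIR / "images"
out_images.mkdir(parents=True, exist_ok=True)

with ANNOTATIONS.open() as f:
    annotations = json.load(f)

labels = {}
for img_name, entry in annotations.items():
    page = Image.open(IMAGES_DIR / img_name)
    for i, (polygon, label) in enumerate(zip(entry["polygons"], entry["labels"])):
        # Crop the axis-aligned bounding box of the polygon.
        xs = [pt[0] for pt in polygon]
        ys = [pt[1] for pt in polygon]
        box = (int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys)))
        crop_name = f"{Path(img_name).stem}_{i}.png"
        page.crop(box).save(out_images / crop_name)
        labels[crop_name] = label

with (OUT_DIR / "labels.json").open("w") as f:
    json.dump(labels, f, indent=2, ensure_ascii=False)
```

Run the same script a second time over a held-out subset of your annotations to produce the val folder.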