
Results format of mask might be incorrect, please advise #6487


Closed
yastrazo opened this issue Jul 17, 2023 · 12 comments
Labels
question Further information is requested

Comments

@yastrazo

According to the XML annotation format document, CVAT encodes mask annotations with RLE. However, the actual annotation data probably cannot be decoded as RLE, because the sum of the RLE values does not equal the area of the mask bounding box (width*height). A snapshot of the annotation result is below:
image

Please advise how to read it correctly, or provide an ETA for a fix.
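
The check described above can be sketched as follows. This is a minimal illustration, not CVAT code: the function name `rle_matches_box` is hypothetical, and it assumes the `rle` attribute is a comma-separated list of run lengths covering the mask bounding box row by row.

```python
def rle_matches_box(rle_text: str, width: int, height: int) -> bool:
    """Return True if the run lengths add up to the bounding box area."""
    # Parse the comma-separated run lengths from the XML attribute value.
    counts = [int(value) for value in rle_text.split(",")]
    # A decodable mask needs exactly width * height pixels in total.
    return sum(counts) == width * height
```

If this returns False for a mask, the stored `width` and `height` do not describe the same region as the `rle` values.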

@zhiltsov-max
Contributor

Hi, the format should correspond to COCO RLE, which is described here. Here you can find export conversion details.

@zhiltsov-max zhiltsov-max added the question Further information is requested label Jul 17, 2023
@zhengnanc

Screenshot from 2023-07-18 09-16-31

I used the code provided at https://github.com/opencv/cvat/blob/develop/cvat/apps/dataset_manager/formats/transformations.py#L41 to decode the RLE mask above; however, the result does not look right. Is there a bug in CVAT, or is the decode method wrong? Please help check.

@FreshLucas-git

@zhiltsov-max I have the same question. I converted a binary mask to an RLE mask (COCO format), and it doesn't match the CVAT RLE mask format. It seems that the CVAT RLE format is not the same as COCO RLE. Do you know how to convert a binary mask to a CVAT RLE mask?

@kiwifig

kiwifig commented Jul 18, 2023

@FreshLucas-git what do you mean by binary? Is it the RLE string in the XML file generated by CVAT?

@zhiltsov-max
Contributor

Hi, here you can find an example. Feel free to modify it and try with your values from XML.

from functools import reduce
import cv2
import numpy as np
from pycocotools import mask as mask_utils

# take these values from the CVAT XML file
serialized_cvat_image = dict(
    width="1280",
    height="720",
)

serialized_cvat_rle = dict(
    rle="141, 6, 235, 8, 233, 10, 232, 11, 231, 12, 230, 13, 229, 15, 227, 16, 227, 16, 227, 16, 227, 16, 227, 16, 227, 16, 227, 16, 227, 16, 227, 16, 227, 15, 228, 16, 227, 16, 227, 16, 228, 15, 228, 15, 227, 17, 226, 17, 226, 17, 226, 17, 226, 17, 226, 17, 227, 16, 227, 16, 138, 7, 82, 16, 131, 15, 81, 15, 130, 18, 80, 15, 128, 21, 80, 14, 126, 24, 78, 15, 125, 26, 77, 15, 123, 31, 74, 15, 121, 34, 73, 15, 119, 38, 71, 15, 117, 41, 70, 15, 116, 44, 68, 15, 114, 14, 4, 28, 68, 15, 112, 15, 6, 31, 64, 15, 110, 15, 10, 30, 63, 15, 108, 15, 13, 31, 61, 14, 107, 15, 15, 32, 60, 14, 105, 15, 19, 31, 59, 14, 104, 14, 24, 29, 57, 15, 102, 15, 26, 28, 57, 15, 100, 15, 29, 28, 56, 15, 99, 14, 33, 26, 56, 15, 97, 14, 38, 26, 53, 15, 96, 13, 40, 28, 51, 15, 94, 13, 45, 26, 50, 15, 92, 14, 47, 27, 48, 15, 91, 13, 50, 28, 46, 15, 89, 13, 53, 27, 46, 15, 88, 12, 56, 28, 44, 15, 86, 13, 57, 30, 42, 15, 85, 12, 59, 32, 40, 15, 83, 13, 60, 34, 38, 14, 82, 14, 65, 30, 38, 14, 81, 14, 67, 29, 38, 14, 79, 14, 71, 27, 38, 14, 78, 13, 75, 24, 38, 14, 77, 13, 79, 21, 39, 14, 75, 13, 83, 18, 40, 14, 74, 13, 84, 17, 41, 14, 72, 14, 87, 13, 43, 13, 72, 13, 91, 9, 44, 14, 70, 13, 94, 6, 46, 14, 69, 13, 147, 14, 67, 13, 149, 13, 66, 13, 151, 13, 64, 14, 151, 14, 62, 14, 153, 14, 60, 15, 153, 15, 59, 15, 154, 14, 58, 15, 156, 14, 56, 15, 158, 13, 56, 15, 159, 13, 54, 15, 161, 13, 53, 14, 162, 13, 53, 13, 164, 13, 52, 11, 166, 14, 50, 12, 167, 13, 49, 13, 168, 13, 48, 13, 169, 13, 46, 13, 170, 14, 45, 12, 172, 13, 44, 13, 173, 13, 42, 14, 173, 14, 41, 13, 175, 13, 40, 13, 176, 14, 39, 12, 178, 14, 38, 12, 179, 14, 36, 12, 181, 14, 35, 12, 181, 15, 33, 13, 182, 14, 32, 13, 92, 30, 62, 14, 31, 13, 88, 37, 60, 14, 29, 14, 81, 51, 54, 13, 29, 13, 67, 69, 52, 13, 28, 12, 62, 77, 51, 13, 26, 13, 57, 84, 49, 13, 26, 12, 54, 92, 46, 13, 25, 11, 50, 101, 42, 13, 24, 12, 49, 104, 41, 12, 24, 12, 48, 106, 41, 11, 23, 13, 47, 69, 1, 40, 38, 12, 22, 12, 48, 50, 27, 37, 34, 12, 22, 12, 48, 48, 31, 37, 33, 
12, 20, 13, 48, 40, 47, 33, 29, 12, 20, 12, 49, 24, 66, 71, 20, 11, 50, 19, 72, 70, 20, 11, 51, 13, 80, 68, 19, 11, 51, 11, 87, 63, 18, 12, 51, 11, 89, 61, 18, 12, 51, 11, 91, 59, 18, 11, 52, 11, 92, 58, 17, 12, 52, 11, 97, 53, 17, 12, 52, 11, 98, 51, 17, 13, 52, 11, 99, 49, 18, 13, 52, 11, 107, 5, 2, 11, 2, 5, 4, 11, 19, 12, 53, 11, 165, 13, 53, 11, 165, 12, 54, 11, 165, 11, 55, 11, 164, 12, 55, 11, 163, 13, 55, 12, 162, 13, 55, 12, 162, 12, 57, 11, 161, 12, 58, 11, 161, 12, 58, 11, 160, 12, 59, 12, 159, 12, 59, 13, 158, 12, 60, 13, 156, 13, 61, 12, 156, 12, 62, 13, 155, 11, 63, 14, 154, 11, 64, 14, 152, 12, 65, 13, 152, 12, 65, 14, 151, 12, 66, 13, 151, 11, 68, 13, 150, 11, 69, 13, 148, 11, 70, 14, 147, 11, 71, 14, 145, 12, 72, 14, 144, 12, 72, 16, 142, 12, 73, 16, 140, 12, 75, 16, 138, 13, 76, 16, 137, 12, 78, 16, 135, 12, 80, 16, 134, 12, 81, 16, 132, 13, 82, 16, 130, 13, 84, 17, 128, 13, 86, 16, 127, 12, 88, 16, 125, 13, 89, 16, 123, 13, 91, 16, 122, 13, 92, 16, 121, 13, 93, 16, 120, 12, 95, 16, 119, 14, 94, 15, 119, 16, 93, 15, 119, 17, 92, 15, 119, 19, 90, 15, 119, 20, 89, 15, 119, 21, 88, 15, 118, 22, 88, 15, 118, 22, 88, 15, 118, 22, 88, 15, 118, 23, 87, 14, 120, 23, 86, 14, 121, 23, 84, 14, 123, 23, 83, 14, 125, 21, 83, 15, 126, 21, 81, 15, 126, 21, 81, 15, 126, 23, 78, 16, 126, 24, 77, 16, 126, 25, 76, 16, 128, 24, 75, 16, 129, 25, 73, 16, 130, 24, 73, 16, 130, 26, 71, 16, 132, 25, 71, 15, 132, 25, 71, 15, 134, 24, 70, 14, 135, 24, 70, 14, 137, 23, 68, 15, 138, 23, 67, 14, 140, 23, 66, 14, 141, 23, 65, 14, 143, 21, 65, 14, 143, 24, 62, 14, 144, 24, 60, 14, 145, 24, 60, 14, 146, 23, 60, 14, 146, 23, 60, 14, 148, 21, 59, 15, 149, 19, 60, 15, 149, 18, 61, 14, 151, 16, 62, 14, 154, 12, 63, 13, 155, 10, 65, 14, 154, 8, 67, 14, 154, 6, 68, 15, 228, 15, 227, 16, 227, 16, 227, 16, 227, 17, 226, 18, 225, 18, 225, 18, 225, 18, 225, 18, 225, 18, 225, 19, 224, 19, 224, 19, 226, 17, 226, 17, 226, 17, 226, 17, 226, 17, 226, 16, 227, 16, 228, 15, 228, 15, 209, 34, 198, 
44, 192, 50, 187, 56, 181, 62, 175, 68, 170, 72, 167, 75, 164, 78, 160, 82, 105, 6, 45, 85, 105, 8, 37, 57, 24, 10, 105, 10, 29, 55, 36, 6, 105, 12, 22, 53, 155, 20, 5, 57, 160, 78, 164, 75, 167, 70, 173, 66, 176, 65, 178, 57, 186, 54, 189, 42, 201, 37, 206, 31, 212, 20, 223, 14, 229, 14, 229, 14, 228, 15, 228, 15, 228, 16, 227, 16, 227, 16, 227, 17, 226, 17, 226, 18, 225, 18, 225, 18, 225, 19, 224, 19, 224, 19, 225, 18, 225, 19, 225, 18, 225, 18, 226, 17, 226, 18, 225, 18, 226, 17, 226, 18, 225, 19, 224, 20, 224, 20, 223, 21, 222, 21, 222, 21, 223, 20, 223, 20, 223, 20, 224, 19, 225, 18, 226, 17, 227, 15, 228, 15, 229, 14, 229, 14, 228, 15, 228, 15, 228, 14, 228, 15, 228, 16, 227, 16, 227, 16, 227, 16, 227, 16, 227, 16, 227, 16, 227, 16, 227, 17, 226, 17, 226, 18, 226, 17, 226, 17, 226, 18, 225, 18, 225, 18, 225, 18, 225, 18, 226, 17, 226, 17, 227, 16, 227, 16, 227, 15, 228, 15, 229, 13, 230, 13, 229, 14, 229, 14, 229, 14, 228, 15, 228, 15, 228, 15, 228, 15, 228, 15, 228, 15, 228, 15, 228, 15, 228, 15, 228, 15, 228, 15, 228, 15, 228, 15, 228, 15, 228, 15, 228, 14, 228, 15, 228, 15, 228, 14, 229, 14, 229, 14, 229, 13, 230, 13, 230, 12, 230, 13, 230, 13, 229, 14, 229, 13, 230, 13, 229, 14, 229, 14, 229, 14, 228, 15, 228, 14, 229, 13, 230, 12, 231, 11, 231, 11, 232, 9, 234, 7, 236, 5, 60" ,
    left="400",
    top="169",
    width="242",
    height="374",
)

def cvat_rle_to_binary_image_mask(cvat_rle: dict, img_h: int, img_w: int) -> np.ndarray:
    # convert CVAT tight object RLE to COCO-style whole image mask
    rle = cvat_rle['rle']
    left = cvat_rle['left']
    top = cvat_rle['top']
    width = cvat_rle['width']

    mask = np.zeros((img_h, img_w), dtype=np.uint8)
    value = 0
    offset = 0
    for rle_count in rle:
        while rle_count > 0:
            y, x = divmod(offset, width)
            mask[y + top][x + left] = value
            rle_count -= 1
            offset += 1
        value = 1 - value

    return mask

def binary_image_mask_to_cvat_rle(image: np.ndarray) -> dict:
    # convert COCO-style whole image mask to CVAT tight object RLE

    istrue = np.argwhere(image == 1).transpose()
    top = int(istrue[0].min())
    left = int(istrue[1].min())
    bottom = int(istrue[0].max())
    right = int(istrue[1].max())
    roi_mask = image[top:bottom + 1, left:right + 1]

    # compute RLE values
    def reduce_fn(acc, v):
        if v == acc['val']:
            acc['res'][-1] += 1
        else:
            acc['val'] = v
            acc['res'].append(1)
        return acc
    roi_rle = reduce(
        reduce_fn,
        roi_mask.flat,
        { 'res': [0], 'val': False }
    )['res']

    cvat_rle = {
        'rle': roi_rle,
        'top': top,
        'left': left,
        'width': right - left + 1,
        'height': bottom - top + 1,
    }

    return cvat_rle

def cvat_rle_to_coco_rle(cvat_rle: dict, img_h: int, img_w: int) -> dict:
    # convert CVAT tight object RLE to COCO whole image mask RLE
    binary_image_mask = cvat_rle_to_binary_image_mask(cvat_rle, img_h=img_h, img_w=img_w)
    return mask_utils.encode(np.asfortranarray(binary_image_mask))

def deserialize_cvat_rle(serialized_cvat_rle: dict) -> dict:
    return {
        'rle': list(map(int, serialized_cvat_rle['rle'].split(','))),
        'top': int(serialized_cvat_rle['top']),
        'left': int(serialized_cvat_rle['left']),
        'width': int(serialized_cvat_rle['width']),
        'height': int(serialized_cvat_rle['height']),
    }

def serialize_cvat_rle(cvat_rle: dict) -> dict:
    return {
        'rle': ', '.join(map(str, cvat_rle['rle'])),
        'top': str(cvat_rle['top']),
        'left': str(cvat_rle['left']),
        'width': str(cvat_rle['width']),
        'height': str(cvat_rle['height']),
    }


def test(serialized_cvat_image: dict, serialized_cvat_rle: dict):
    img_w = int(serialized_cvat_image['width'])
    img_h = int(serialized_cvat_image['height'])

    # HWC BGR [0, 1] image for OpenCV, you can use cv2.imread() instead
    image = np.zeros((img_h, img_w, 3), np.float32)

    cvat_rle = deserialize_cvat_rle(serialized_cvat_rle)
    mask = cvat_rle_to_binary_image_mask(cvat_rle, img_h=img_h, img_w=img_w)
    assert mask.shape == (img_h, img_w)

    cvat_rle2 = binary_image_mask_to_cvat_rle(mask)
    assert cvat_rle == cvat_rle2

    coco_rle = cvat_rle_to_coco_rle(cvat_rle, img_h=img_h, img_w=img_w)
    assert np.array_equal(mask_utils.decode(coco_rle), mask)

    image_with_mask = image.copy()
    image_with_mask[mask == 1] = 1

    # Add rectangle around the mask
    where_roi = np.argwhere(mask).transpose()
    top = int(where_roi[0].min())
    left = int(where_roi[1].min())
    bottom = int(where_roi[0].max())
    right = int(where_roi[1].max())
    image_with_mask = cv2.rectangle(image_with_mask, (left, top), (right, bottom), (0, 1, 0), thickness=2)

    cv2.imshow("demo", image_with_mask)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

test(serialized_cvat_image, serialized_cvat_rle)
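
The values at the top of the example are meant to be copied from the XML file. A minimal sketch of reading them programmatically is below. It assumes that masks are stored as `<mask>` elements nested inside `<image>` elements and that the attributes are named as in the example above (`rle`, `left`, `top`, `width`, `height`). The element names and the helper `read_cvat_masks` are assumptions for illustration, so check them against your own export.

```python
import xml.etree.ElementTree as ET

MASK_KEYS = ("rle", "left", "top", "width", "height")

def read_cvat_masks(xml_text: str) -> list:
    """Collect serialized image sizes and mask attributes from annotation XML."""
    root = ET.fromstring(xml_text)
    results = []
    # Assumed layout: <image width=".." height=".."> containing <mask .../> elements.
    for image in root.iter("image"):
        image_size = {"width": image.get("width"), "height": image.get("height")}
        for mask in image.iter("mask"):
            results.append({
                "image": image_size,
                "rle": {key: mask.get(key) for key in MASK_KEYS},
            })
    return results
```

Each returned entry holds string values, so its two dictionaries can be passed as `serialized_cvat_image` and `serialized_cvat_rle` to `test()` above.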

@FreshLucas-git

@kiwifig The binary mask is generated from the inference output of a neural network model. I then encoded the binary mask as an RLE mask. The CVAT XML file stores the mask in this encoded RLE format.

@zhiltsov-max Thank you so much for the detailed explanation. I'll try this out.

@zhengnanc

zhengnanc commented Jul 19, 2023

@zhiltsov-max That is the code I used to decode the RLE from the CVAT XML file, but unfortunately the resulting mask is not correct.
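
When a decoded mask looks wrong, a stricter decoder can help narrow down the cause. The sketch below is an alternative to the loop in the example above, not the CVAT implementation. It assumes the same deserialized fields, expands the runs with `np.repeat`, and raises an error when the run lengths do not fill the stated bounding box instead of drawing a distorted mask. The function name `decode_cvat_rle_strict` is hypothetical.

```python
import numpy as np

def decode_cvat_rle_strict(rle, left, top, width, height, img_h, img_w):
    """Decode tight-box run lengths into a whole-image binary mask."""
    counts = np.asarray(rle, dtype=np.int64)
    # Runs alternate between background (0) and foreground (1), starting with 0.
    values = np.arange(len(counts)) % 2
    flat = np.repeat(values, counts).astype(np.uint8)
    if flat.size != width * height:
        raise ValueError(
            f"RLE covers {flat.size} pixels, but the box is {width}x{height}"
        )
    mask = np.zeros((img_h, img_w), dtype=np.uint8)
    # Paste the row-major box contents at the box position in the image.
    mask[top:top + height, left:left + width] = flat.reshape(height, width)
    return mask
```

A `ValueError` here points at the stored `width` and `height` rather than at the decoding logic.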

@zhengnanc

annotation.zip
Here are the image and XML file I got from the CVAT labeling tool. However, the decoded mask does not look right (the sum of the RLE values does not equal the bounding box area). Please help check; maybe there is a bug in the labeling tool. @zhiltsov-max

@bsekachev
Member

@zhengnanc

The width and height probably just have incorrect values. The actual values are each 1 larger.

37 * 76 = 2812

And in your case:

image
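
The off-by-one diagnosis above can be checked mechanically. This is a hedged sketch, with `infer_mask_size` as a hypothetical helper name: it compares the sum of the run lengths with the stored size, then with the size increased by 1 in both dimensions, and returns whichever matches.

```python
def infer_mask_size(rle, width, height):
    """Return the (width, height) that matches the total run length."""
    total = sum(rle)
    # Try the stored size first, then the size with both dimensions + 1.
    for w, h in ((width, height), (width + 1, height + 1)):
        if w * h == total:
            return w, h
    raise ValueError(
        f"{total} pixels match neither {width}x{height} "
        f"nor {width + 1}x{height + 1}"
    )
```

For example, run lengths that sum to 2812 with a stored size of 36 x 75 (illustrative values) would yield (37, 76), consistent with 37 * 76 = 2812.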

@bsekachev
Member

The related issue is #5828, and I believe we fixed it in #5905.

So, what is your CVAT version?

@zhengnanc

@bsekachev Thanks a lot, both w and h should be increased by 1. Also, I don't really know the CVAT version, since I am not the person who labeled the image.

@yastrazo
Author

yastrazo commented Aug 1, 2023

@bsekachev the version we're using is the latest CVAT DEV release, since 30.5.23. Can you please give us a concrete version that we can deploy to solve the issue?
@zhengnanc FYI


6 participants