Releases: open-edge-platform/datumaro

Release v0.2: Dataset versioning

14 Oct 15:42
7e8615c

This release adds dataset versioning capabilities and significantly changes the command line.
It also improves CLI and API documentation, and extends the transformations library.

A Datumaro project can contain and manage multiple datasets instead of a single one.
CLI operations can be applied to the whole project, or to separate datasets.
Datasets are now modified in place by default. The project layout has changed. To update
an old project to the new version, use datum project migrate.

Added

  • A new installation target: pip install datumaro[default], which should be
    used in most cases. The plain datumaro target is intended for library users (#238)
  • Dataset and project versioning capabilities (Git-like) (#238)
  • [CLI] "dataset revpath" concept in CLI, which allows passing a dataset path together
    with the dataset format to the diff, merge, explain and info CLI commands (#238)
  • [CLI] import, remove, commit, checkout, log, status, info CLI commands (#238)
  • [CLI] patch CLI command to patch one dataset from another (#401)
  • [CLI, API] ProjectLabels transform to change dataset labels for merging etc. (#401, #478)
  • [API] Type annotations and docs for Annotation classes (#493)
  • [formats] Support for custom labels in the KITTI detection format (#481)
  • [formats] Coco*Extractor classes now have an option to preserve label IDs from the
    original annotation file (#453)
  • [formats] Options to control label loading behavior in imagenet_txt import (#434, #489)
  • Telemetry data collection. See this notice for details (#495)

Changed

  • A project can contain and manage multiple datasets instead of a single one.
    CLI operations can be applied to the whole project, or to separate datasets.
    Datasets are modified in place by default (#328)
  • [CLI] The import command copies datasets by default. Use add to add datasets without copying (#508)
  • [CLI] Projects use a new file layout, incompatible with old projects.
    An old project can be updated with datum project migrate (#238)
  • [CLI] diff and ediff are combined into a single diff CLI command (#238)
  • [CLI] CLI help for builtin plugins doesn't require a project (#328)
  • [API] The Project class from datumaro.components is changed completely (#238)
  • [API] Inheriting CliPlugin is not required in plugin classes (#238)
  • [API] Importers do not create Projects anymore and just return a list of
    extractor configurations (#238)
  • [API] Annotation-related classes were moved into a new module,
    datumaro.components.annotation (#439)
  • [API] Rollback utilities replaced with Scope utilities (#444)
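
Code that has to run against releases both before and after the move of the annotation classes (#439) can resolve the module at import time. The following is a minimal sketch, not part of Datumaro: import_first is a hypothetical helper, and the older location datumaro.components.extractor is an assumption about pre-0.2 releases that should be checked against the installed version.

```python
import importlib


def import_first(candidates):
    """Return the first module from `candidates` that can be imported."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # Remember why this candidate failed and try the next one.
            errors.append(f"{name}: {exc}")
    raise ImportError("no candidate module found: " + "; ".join(errors))


# New location first (stated in this changelog), then the assumed old one.
ANNOTATION_MODULES = (
    "datumaro.components.annotation",
    "datumaro.components.extractor",  # assumption: pre-0.2 location
)

# Usage, with Datumaro installed:
#   annotation = import_first(ANNOTATION_MODULES)
#   Bbox = annotation.Bbox
```

Listing the new module first means the fallback is only used on older installations.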

Removed

  • [CLI] project merge CLI command (#238)
  • Support for project hierarchies. A project cannot be a source anymore (#238)
  • A project cannot have an independent internal dataset anymore. All the project
    data must be stored in the project data sources (#238)
  • datumaro_project format (#238)
  • [API] Unused path field of DatasetItem (#455)

Fixed

  • Deprecation warning in open_images_format.py (#440)
  • lazy_image returning unrelated data sometimes (#409)
  • Invalid call to pycocotools.mask.iou (#450)
  • Importing of Open Images datasets without image data (#463)
  • Return value type in Dataset.is_modified (#401)
  • Incorrect remapping of secondary categories in RemapLabels (#401)
  • VOC dataset patching for classification and segmentation tasks (#478)
  • Exported mask label ids in KITTI segmentation (#481)
  • Missing label for Points read in the LFW format (#494)

Release v0.1.11

24 Aug 13:16
4714810

Added

Changed

  • Datumaro no longer depends on scikit-image (#379)
  • Dataset remembers export options on saving / exporting for the first time (#386)

Fixed

  • Application of remap_labels to dataset categories of different length (#314)
  • Patching of datasets in formats (#348)
  • Improved Cityscapes export performance (#367)
  • Incorrect format of *_labelIds.png in Cityscapes export (#325, #342)
  • Item id in ImageNet format (#371)
  • Double quotes for ICDAR Word Recognition (#375)
  • Wrong display of builtin formats in CLI (#332)
  • Non-UTF-8 encoding of annotation files in Market-1501 export (#392)
  • Import of ICDAR, PASCAL VOC and VGGFace2 images from subdirectories on Windows (#392)
  • Saving of images with Unicode paths on Windows (#392)
  • Calling ProjectDataset.transform() with a string argument (#402)
  • Attributes casting for CVAT format (#403)
  • Loading of custom project plugins (#404)

Security

  • Fixed unsafe unpickling in CIFAR import (#362)
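
This changelog does not describe how #362 was fixed. As a general illustration of the problem class, the sketch below follows the restricted-globals pattern from the Python pickle documentation: the unpickler refuses to resolve any global that is not on an explicit allow-list. RestrictedUnpickler and restricted_loads are hypothetical names, not Datumaro APIs, and the default empty allow-list is an assumption chosen for the example.

```python
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that only resolves globals from an explicit allow-list."""

    def __init__(self, file, allowed=()):
        super().__init__(file)
        # Set of (module, name) pairs that may be looked up while loading.
        self._allowed = frozenset(allowed)

    def find_class(self, module, name):
        if (module, name) in self._allowed:
            return super().find_class(module, name)
        # Anything else (os.system, builtins.eval, ...) is rejected.
        raise pickle.UnpicklingError(
            f"global '{module}.{name}' is forbidden")


def restricted_loads(data, allowed=()):
    """Like pickle.loads(), but restricted to the allowed globals."""
    return RestrictedUnpickler(io.BytesIO(data), allowed).load()
```

Plain containers, numbers, strings and bytes need no globals, so they load with an empty allow-list. Data that contains NumPy arrays would need the corresponding reconstruction globals added to the list.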

Release v0.1.10

15 Jul 14:05
a7b712e

Added

  • Support for import/export zip archives with images (#273)
  • Subformat importers for VOC and COCO (#281)
  • Support for KITTI dataset segmentation and detection format (#282)
  • Updated YOLO format user manual (#295)
  • ItemTransform class, which describes item-wise dataset Transforms (#297)
  • keep-empty export parameter in VOC format (#297)
  • A base class for dataset validation plugins (#299)
  • Partial support for the Open Images format; only images and image-level labels can be read/written (#291, #315)
  • Support for Supervisely Point Cloud dataset format (#245, #353)
  • Support for KITTI Raw / Velodyne Points dataset format (#245)
  • Support for CIFAR-100 and documentation for CIFAR-10/100 (#301)

Changed

  • TensorFlow AVX check is made optional in API and disabled by default (#305)
  • Extensions for images in ImageNet_txt are now mandatory (#302)
  • Several dependencies now have lower bounds (#308)

Fixed

  • Incorrect image layout on saving and a problem with encoding on loading (#284)
  • An error when an XPath filter is applied to the dataset or its subset (#259)
  • Tracking of Dataset changes done by transforms (#297)
  • Improved CLI startup time in several cases (#306)

Security

  • Known issue: loading CIFAR can result in arbitrary code execution (#327)

Release v0.1.9

03 Jun 17:50
057740e

Added

  • Support for escaping in attribute values in LabelMe format (#49)
  • Support for Segmentation Splitting (#223)
  • Support for CIFAR-10/100 dataset format (#225, #243)
  • Support for COCO panoptic and stuff format (#210)
  • Documentation file and integration tests for Pascal VOC format (#228)
  • Support for MNIST and MNIST in CSV dataset formats (#234)
  • Documentation file for COCO format (#241)
  • Documentation file and integration tests for YOLO format (#246)
  • Support for Cityscapes dataset format (#249)
  • Support for Validator configurable threshold (#250)

Changed

  • LabelMe format saves dataset items with their relative paths by subsets without changing names (#200)
  • Allowed arbitrary subset count and names in classification and detection splitters (#207)
  • Annotation-less dataset elements now participate in subset splitting (#211)
  • Classification task in LFW dataset format (#222)
  • Testing is now performed with pytest instead of unittest (#248)

Fixed

  • Added support for auto-merging (joining) of datasets with no labels and having labels (#200)
  • Allowed explicit label removal in remap_labels transform (#203)
  • Image extension in CVAT format export (#214)
  • Added a label "face" for bounding boxes in Wider Face (#215)
  • Allowed adding "difficult", "truncated", "occluded" attributes when converting to Pascal VOC if these attributes are not present (#216)
  • Empty lines in YOLO annotations are ignored (#221)
  • Export in VOC format when no image info is available (#239)
  • Fixed saving attribute in WiderFace extractor (#251)
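
To illustrate the empty-line fix (#221): a YOLO label file holds one "class_id x_center y_center width height" row per box, with coordinates normalized to the image size. The sketch below shows a tolerant reader that skips blank lines. It is a simplified stand-in, not Datumaro's extractor, and parse_yolo_lines is a hypothetical name.

```python
def parse_yolo_lines(lines):
    """Parse YOLO label rows, ignoring empty and whitespace-only lines."""
    boxes = []
    for line in lines:
        line = line.strip()
        if not line:
            # Blank lines carry no annotation, so they are skipped.
            continue
        label, cx, cy, w, h = line.split()
        boxes.append((int(label), float(cx), float(cy), float(w), float(h)))
    return boxes
```

A row with the wrong number of fields still raises ValueError, so malformed annotations are not silently dropped.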

Release v0.1.8

31 Mar 15:06
25d459e

Changed

  • Added an option to allow undeclared annotation attributes in CVAT format export (#192)
  • COCO exports images in separate directories by subsets. Added an option to control this (#195)

Fixed

  • Instance masks of the background class no longer introduce an instance (#188)
  • Added support for label attributes in Datumaro format (#192)

Release v0.1.7

24 Mar 13:28
9580d5d

Added

  • OpenVINO plugin examples (#159)
  • Dataset validation for classification and detection datasets (#160)
  • Arbitrary image extensions in formats (import and export) (#166)
  • Ability to set a custom subset name for an imported dataset (#166)
  • CLI support for NDR (#178)

Changed

  • Common ICDAR format is split into 3 sub-formats (#174)

Fixed

  • The ability to work with file names containing Cyrillic and spaces (#148)
  • Image reading and saving in ICDAR formats (#174)
  • Unnecessary image loading on dataset saving (#176)
  • Allowed spaces in ICDAR captions (#182)
  • Saving of masks in VOC when masks are not requested (#184)

Release v0.1.6 hotfix

02 Mar 12:24
ff50a77

Fixed

  • Images with no annotations are exported again in VOC formats (#123)
  • Inference result for only one output layer in OpenVINO launcher (#125)

Release v0.1.6

28 Feb 09:33
48731fb

Added

  • ICDAR13/15 dataset format (#96)
  • Laziness, source caching, tracking of changes and partial updating for Dataset (#102)
  • Market-1501 dataset format (#108)
  • LFW dataset format (#110)
  • Support for confusion matrices of polygons and masks, and for mismatching classes, in the diff command (#117)
  • Near-duplicate image removal plugin (#113)

Changed

  • OpenVINO model launcher is updated for OpenVINO r2021.1 (#100)

Fixed

  • High memory consumption and low performance of mask import/export, #53 (#101)
  • Masks covered by class 0 (background) masks are now exported with holes inside (#104)
  • diff command invocation problem with missing class methods (#117)

Release v0.1.5

23 Jan 09:18
48731fb

Added

  • WiderFace dataset format (#65, #90)
  • Function to transform annotations to labels (#66)
  • Dataset splits for classification, detection and re-id tasks (#68, #81)
  • VGGFace2 dataset format (#69, #82)
  • Unique image count statistic (#87)
  • Installation with pip: pip install datumaro

Changed

  • Dataset class extended with new operations: save, load, export, import_from, detect, run_model (#71)
  • Allowed importing Extractor-only defined formats (in Project.import_from, dataset.import_from and CLI/project import) (#71)
  • datum project ... commands replaced with datum ... commands (#84)
  • Supported more image formats in ImageNet extractors (#85)
  • Allowed adding Importer-defined formats as project sources (source add) (#86)
  • Added max search depth in ImageDir format and importers (#86)

Deprecated

  • datum project ... CLI context (#84)
  • Dataset format Importers will be joined with Extractors in the next release

Fixed

  • Allow plugins inherited from Extractor (instead of only SourceExtractor) (#70)
  • Windows installation with pip for pycocotools (#73)
  • YOLO extractor path matching on Windows (#73)
  • Fixed in-place file copying when saving images (#76)
  • Fixed labelmap parameter type checking in VOC converter (#76)
  • Fixed model copying on addition in CLI (#94)

Release v0.1.4

11 Dec 07:12
7407d12

Added

  • CamVid dataset format (#57)
  • Ability to install opencv-python-headless dependency with DATUMARO_HEADLESS=1 environment variable instead of opencv-python (#62)

Changed

  • Allow empty supercategory in COCO (#54)
  • Allow Pascal VOC to search in subdirectories (#50)