Releases: open-edge-platform/datumaro
Release v0.2: Dataset versioning
This release adds dataset versioning capabilities and significantly changes the command line.
It also improves CLI and API documentation, and extends the transformations library.
A Datumaro project can contain and manage multiple datasets instead of a single one.
CLI operations can be applied to the whole project, or to separate datasets.
Datasets are now modified in place by default. The project layout is updated. To update
an old project to the new version, use `datum project migrate`.
Added
- A new installation target: `pip install datumaro[default]`, which should be
used in most cases. The plain `datumaro` target is intended for library users (#238)
- Dataset and project versioning capabilities (Git-like) (#238)
- [CLI] "dataset revpath" concept in CLI, allowing to pass a dataset path with
the dataset format in `diff`, `merge`, `explain` and `info` CLI commands (#238)
- [CLI] `import`, `remove`, `commit`, `checkout`, `log`, `status`, `info` CLI commands (#238)
- [CLI] `patch` CLI command to patch one dataset from another (#401)
- [CLI, API] `ProjectLabels` transform to change dataset labels for merging etc. (#401, #478)
- [API] Type annotations and docs for `Annotation` classes (#493)
- [formats] Support for custom labels in the KITTI detection format (#481)
- [formats] `Coco*Extractor` classes now have an option to preserve label IDs from the
original annotation file (#453)
- [formats] Options to control label loading behavior in `imagenet_txt` import (#434, #489)
- Data collection by telemetry. Check this notice about the details (#495)
Changed
- A project can contain and manage multiple datasets instead of a single one.
CLI operations can be applied to the whole project, or to separate datasets.
Datasets are modified in place by default (#328)
- [CLI] The `import` command copies datasets by default. Use `add` to add datasets without copying (#508)
- [CLI] Projects use a new file layout, incompatible with old projects.
An old project can be updated with `datum project migrate` (#238)
- [CLI] `diff` and `ediff` are joined into a single `diff` CLI command (#238)
- [CLI] CLI help for builtin plugins doesn't require a project (#328)
- [API] The `Project` class from `datumaro.components` is changed completely (#238)
- [API] Inheriting `CliPlugin` is not required in plugin classes (#238)
- [API] `Importer`s do not create `Project`s anymore and just return a list of
extractor configurations (#238)
- [API] Annotation-related classes were moved into a new module, `datumaro.components.annotation` (#439)
- [API] Rollback utilities replaced with Scope utilities (#444)
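Code that has to run against both older releases and v0.2 may need to cope with the annotation module move noted above. The following is a minimal sketch, not part of Datumaro: the helper name `load_annotation_class` is hypothetical, and it assumes that older releases exposed the annotation classes from `datumaro.components.extractor`.

```python
import importlib

# Candidate modules, newest location first. The second entry is an
# assumption about where annotation classes lived before the move.
_CANDIDATE_MODULES = (
    "datumaro.components.annotation",
    "datumaro.components.extractor",
)


def load_annotation_class(name):
    """Return the class called `name` from the first module that has it.

    Returns None when Datumaro is not installed or the class is unknown.
    """
    for module_name in _CANDIDATE_MODULES:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module is absent in this installation, try the next
        if hasattr(module, name):
            return getattr(module, name)
    return None


# Usage: look the class up once and check the result before using it.
Bbox = load_annotation_class("Bbox")
```

Resolving the class at one place keeps version-specific import logic out of the rest of the code base.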
Removed
- [CLI] `project merge` CLI command (#238)
- Support for project hierarchies. A project cannot be a source anymore (#238)
- A project cannot have an independent internal dataset anymore. All the project
data must be stored in the project data sources (#238)
- `datumaro_project` format (#238)
- [API] Unused `path` field of `DatasetItem` (#455)
Fixed
- Deprecation warning in `open_images_format.py` (#440)
- `lazy_image` returning unrelated data sometimes (#409)
- Invalid call to `pycocotools.mask.iou` (#450)
- Importing of Open Images datasets without image data (#463)
- Return value type in `Dataset.is_modified` (#401)
- Incorrect remapping of secondary categories in `RemapLabels` (#401)
- VOC dataset patching for classification and segmentation tasks (#478)
- Exported mask label ids in KITTI segmentation (#481)
- Missing `label` for `Points` read in the LFW format (#494)
Release v0.1.11
Added
- The Open Images format now supports bounding box and segmentation mask annotations (#352, #388).
- Bounding boxes values decrement transform (#366)
- Improved error reporting in `Dataset` (#386)
- Support ADE20K format (import only) (#400)
- Documentation website at https://openvinotoolkit.github.io/datumaro (#420)
Changed
- Datumaro no longer depends on `scikit-image` (#379)
- `Dataset` remembers export options on saving / exporting for the first time (#386)
Fixed
- Application of `remap_labels` to dataset categories of different length (#314)
- Patching of datasets in formats (#348)
- Improved Cityscapes export performance (#367)
- Incorrect format of `*_labelIds.png` in Cityscapes export (#325, #342)
- Item id in ImageNet format (#371)
- Double quotes for ICDAR Word Recognition (#375)
- Wrong display of builtin formats in CLI (#332)
- Non utf-8 encoding of annotation files in Market-1501 export (#392)
- Import of ICDAR, PASCAL VOC and VGGFace2 images from subdirectories on Windows (#392)
- Saving of images with Unicode paths on Windows (#392)
- Calling `ProjectDataset.transform()` with a string argument (#402)
- Attributes casting for CVAT format (#403)
- Loading of custom project plugins (#404)
Security
- Fixed unsafe unpickling in CIFAR import (#362)
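For context on why unpickling untrusted dataset files is unsafe: `pickle` can import and call arbitrary globals while loading. The sketch below shows the "restricting globals" pattern from the Python standard library documentation. It is an illustration only and not necessarily how the fix in #362 is implemented; the class name, the helper `restricted_loads`, and the empty allow-list are assumptions made for this example.

```python
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    # Hypothetical allow-list of (module, name) pairs. It is empty here, so
    # only plain built-in values (dict, list, str, int, bytes, ...) load.
    # Real array-based batches would need a carefully chosen allow-list.
    ALLOWED = frozenset()

    def find_class(self, module, name):
        # Called whenever the pickle stream references a global object.
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            f"global '{module}.{name}' is forbidden"
        )


def restricted_loads(data: bytes):
    """Deserialize `data`, refusing any global not on the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Exploit:
    # A payload that would call print() on load with plain pickle.loads().
    def __reduce__(self):
        return (print, ("arbitrary code ran",))


# Plain data round-trips as usual.
batch = restricted_loads(pickle.dumps({"labels": [0, 1], "filenames": ["a.png", "b.png"]}))

# The malicious payload is rejected before anything is called.
try:
    restricted_loads(pickle.dumps(Exploit()))
    rejected = False
except pickle.UnpicklingError:
    rejected = True
```

An allow-list is preferable to a deny-list here, because any global that is not explicitly trusted is refused by default.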
Release v0.1.10
Added
- Support for import/export zip archives with images (#273)
- Subformat importers for VOC and COCO (#281)
- Support for KITTI dataset segmentation and detection format (#282)
- Updated YOLO format user manual (#295)
- `ItemTransform` class, which describes item-wise dataset `Transform`s (#297)
- `keep-empty` export parameter in VOC format (#297)
- A base class for dataset validation plugins (#299)
- Partial support for the Open Images format; only images and image-level labels can be read/written (#291, #315).
- Support for Supervisely Point Cloud dataset format (#245, #353)
- Support for KITTI Raw / Velodyne Points dataset format (#245)
- Support for CIFAR-100 and documentation for CIFAR-10/100 (#301)
Changed
- Tensorflow AVX check is made optional in API and disabled by default (#305)
- Extensions for images in ImageNet_txt are now mandatory (#302)
- Several dependencies now have lower bounds (#308)
Fixed
- Incorrect image layout on saving and a problem with encoding on loading (#284)
- An error when an XPath filter is applied to the dataset or its subset (#259)
- Tracking of `Dataset` changes done by transforms (#297)
- Improved CLI startup time in several cases (#306)
Security
- Known issue: loading CIFAR can result in arbitrary code execution (#327)
Release v0.1.9
Added
- Support for escaping in attribute values in LabelMe format (#49)
- Support for Segmentation Splitting (#223)
- Support for CIFAR-10/100 dataset format (#225, #243)
- Support for COCO panoptic and stuff format (#210)
- Documentation file and integration tests for Pascal VOC format (#228)
- Support for MNIST and MNIST in CSV dataset formats (#234)
- Documentation file for COCO format (#241)
- Documentation file and integration tests for YOLO format (#246)
- Support for Cityscapes dataset format (#249)
- Support for Validator configurable threshold (#250)
Changed
- LabelMe format saves dataset items with their relative paths by subsets without changing names (#200)
- Allowed arbitrary subset count and names in classification and detection splitters (#207)
- Annotation-less dataset elements now participate in subset splitting (#211)
- Classification task in LFW dataset format (#222)
- Testing is now performed with pytest instead of unittest (#248)
Fixed
- Added support for auto-merging (joining) of datasets with no labels and having labels (#200)
- Allowed explicit label removal in `remap_labels` transform (#203)
- Image extension in CVAT format export (#214)
- Added a label "face" for bounding boxes in Wider Face (#215)
- Allowed adding "difficult", "truncated", "occluded" attributes when converting to Pascal VOC if these attributes are not present (#216)
- Empty lines in YOLO annotations are ignored (#221)
- Export in VOC format when no image info is available (#239)
- Fixed saving attribute in WiderFace extractor (#251)
Release v0.1.8
Release v0.1.7
Added
- OpenVINO plugin examples (#159)
- Dataset validation for classification and detection datasets (#160)
- Arbitrary image extensions in formats (import and export) (#166)
- Ability to set a custom subset name for an imported dataset (#166)
- CLI support for NDR (#178)
Changed
- Common ICDAR format is split into 3 sub-formats (#174)
Fixed
Release v0.1.6 hotfix
Release v0.1.6
Added
- `Icdar13/15` dataset format (#96)
- Laziness, source caching, tracking of changes and partial updating for `Dataset` (#102)
- `Market-1501` dataset format (#108)
- `LFW` dataset format (#110)
- Support of polygons' and masks' confusion matrices and mismatching classes in `diff` command (#117)
- Add near duplicate image removal plugin (#113)
Changed
- OpenVINO model launcher is updated for OpenVINO r2021.1 (#100)
Fixed
Release v0.1.5
Added
- `WiderFace` dataset format (#65, #90)
- Function to transform annotations to labels (#66)
- Dataset splits for classification, detection and re-id tasks (#68, #81)
- `VGGFace2` dataset format (#69, #82)
- Unique image count statistic (#87)
- Installation with pip: `pip install datumaro`
Changed
- `Dataset` class extended with new operations: `save`, `load`, `export`, `import_from`, `detect`, `run_model` (#71)
- Allowed importing `Extractor`-only defined formats (in `Project.import_from`, `dataset.import_from` and CLI/`project import`) (#71)
- `datum project ...` commands replaced with `datum ...` commands (#84)
- Supported more image formats in `ImageNet` extractors (#85)
- Allowed adding `Importer`-defined formats as project sources (`source add`) (#86)
- Added max search depth in `ImageDir` format and importers (#86)
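As an illustration of the `Dataset` operations listed above, the following sketch converts a dataset between formats with `import_from` and `export`. It is a hedged example rather than an excerpt from the project: the function name `convert_dataset` and the paths and format names in the usage comment are hypothetical, and the import path `datumaro.components.dataset` is assumed to match the installed version.

```python
def convert_dataset(src_path, src_format, dst_path, dst_format):
    """Load a dataset in one format and write it out in another.

    Datumaro is imported lazily so that this module can be imported
    even where the package is not installed.
    """
    # Assumption: the Dataset class lives in this module in your version.
    from datumaro.components.dataset import Dataset

    # Read the source dataset; the format name selects the importer.
    dataset = Dataset.import_from(src_path, src_format)

    # Write the dataset in the target format into dst_path.
    dataset.export(dst_path, dst_format)

    # Report how many items were processed.
    return len(dataset)


# Usage (hypothetical paths and format names):
# convert_dataset("path/to/voc", "voc", "path/to/output", "coco")
```

Format-specific export options, where a format supports them, would be passed as extra keyword arguments to `export`; check the documentation of the installed version for the accepted names.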
Deprecated
- `datum project ...` CLI context (#84)
- Dataset format `Importer`s will be joined with `Extractor`s in the next release
Fixed
- Allow plugins inherited from `Extractor` (instead of only `SourceExtractor`) (#70)
- Windows installation with `pip` for `pycocotools` (#73)
- `YOLO` extractor path matching on Windows (#73)
- Fixed in-place file copying when saving images (#76)
- Fixed `labelmap` parameter type checking in `VOC` converter (#76)
- Fixed model copying on addition in CLI (#94)