Commit e1ed5f6

Author: Maxim Zhiltsov
Update changelog and docs (cvat-ai#98)
* Update changelog
* Update docs
1 parent 30c0648 commit e1ed5f6

5 files changed, +66 -41 lines changed

CHANGELOG.md

+4 -4
@@ -6,20 +6,20 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).


-## 01/19/2021 - Release v0.1.5
+## 01/23/2021 - Release v0.1.5
 ### Added
 - `WiderFace` dataset format (<https://github.com/openvinotoolkit/datumaro/pull/65>, <https://github.com/openvinotoolkit/datumaro/pull/90>)
 - Function to transform annotations to labels (<https://github.com/openvinotoolkit/datumaro/pull/66>)
-- Task-specific Splitter (<https://github.com/openvinotoolkit/datumaro/pull/68>, <https://github.com/openvinotoolkit/datumaro/pull/81>)
+- Dataset splits for classification, detection and re-id tasks (<https://github.com/openvinotoolkit/datumaro/pull/68>, <https://github.com/openvinotoolkit/datumaro/pull/81>)
 - `VGGFace2` dataset format (<https://github.com/openvinotoolkit/datumaro/pull/69>, <https://github.com/openvinotoolkit/datumaro/pull/82>)
 - Unique image count statistic (<https://github.com/openvinotoolkit/datumaro/pull/87>)
+- Installation with pip by name `datumaro`

 ### Changed
 - `Dataset` class extended with new operations: `save`, `load`, `export`, `import_from`, `detect`, `run_model` (<https://github.com/openvinotoolkit/datumaro/pull/71>)
-- `Dataset` operations return `Dataset` instances, allowing to chain operations (<https://github.com/openvinotoolkit/datumaro/pull/71>)
 - Allowed importing `Extractor`-only defined formats (in `Project.import_from`, `dataset.import_from` and CLI/`project import`) (<https://github.com/openvinotoolkit/datumaro/pull/71>)
 - `datum project ...` commands replaced with `datum ...` commands (<https://github.com/openvinotoolkit/datumaro/pull/84>)
-- Supported more image formats in `ImageNet` extractor (<https://github.com/openvinotoolkit/datumaro/pull/85>)
+- Supported more image formats in `ImageNet` extractors (<https://github.com/openvinotoolkit/datumaro/pull/85>)
 - Allowed adding `Importer`-defined formats as project sources (`source add`) (<https://github.com/openvinotoolkit/datumaro/pull/86>)
 - Added max search depth in `ImageDir` format and importers (<https://github.com/openvinotoolkit/datumaro/pull/86>)

README.md

+5 -4
@@ -122,7 +122,7 @@ CVAT annotations ---> Publication, statistics etc.

 [(Back to top)](#table-of-contents)

-- Dataset reading, writing, conversion in any direction. Supported formats:
+- Dataset reading, writing, conversion in any direction. [Supported formats](docs/user_manual.md#supported-formats):
 - [COCO](http://cocodataset.org/#format-data) (`image_info`, `instances`, `person_keypoints`, `captions`, `labels`*)
 - [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/htmldoc/index.html) (`classification`, `detection`, `segmentation`, `action_classification`, `person_layout`)
 - [YOLO](https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data) (`bboxes`)
@@ -188,7 +188,7 @@ python -m virtualenv venv
 Install Datumaro package:

 ``` bash
-pip install 'git+https://github.com/openvinotoolkit/datumaro'
+pip install datumaro
 ```

 ## Usage
@@ -234,13 +234,14 @@ dataset = dataset.transform(project.env.transforms.get('remap_labels'),
     {'cat': 'dog', # rename cat to dog
         'truck': 'car', # rename truck to car
         'person': '', # remove this label
-    }, default='delete')
+    }, default='delete') # remove everything else

+# iterate over dataset elements
 for item in dataset:
     print(item.id, item.annotations)

 # export the resulting dataset in COCO format
-project.env.converters.get('coco').convert(dataset, save_dir='dst/dir')
+dataset.export('dst/dir', 'coco')
 ```

 > Check our [developer guide](docs/developer_guide.md) for additional information.
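
The `remap_labels` call in the last README hunk renames some labels, removes one, and (per the new comment) removes everything not listed. The following is a minimal, library-independent sketch of those semantics in plain Python. It is not Datumaro's implementation: the helper name `remap_label` is hypothetical, and the behaviour for any `default` other than `'delete'` is an assumption.

```python
from typing import Dict, Optional


def remap_label(name: str, mapping: Dict[str, str],
                default: str = 'delete') -> Optional[str]:
    """Return the new label name, or None if the label is removed."""
    if name in mapping:
        # An empty target string means "remove this label".
        return mapping[name] or None
    if default == 'delete':
        # Labels missing from the mapping are removed.
        return None
    # Assumption: any other default keeps unmapped labels unchanged.
    return name


mapping = {'cat': 'dog', 'truck': 'car', 'person': ''}
labels = ['cat', 'truck', 'person', 'bus']
remapped = [remap_label(name, mapping) for name in labels]
print(remapped)  # ['dog', 'car', None, None]
```

Here `'bus'` is dropped only because `default='delete'` is in effect; with a keeping default it would pass through unchanged.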

docs/design.md

+10 -7
@@ -73,9 +73,11 @@ Datumaro is:

 ## RC 1 vision

-In the first version Datumaro should be a project manager for CVAT.
-It should only consume data from CVAT. The collected dataset
-can be downloaded by user to be operated on with Datumaro CLI.
+*CVAT integration*
+
+Datumaro needs to be integrated with [CVAT](https://github.com/openvinotoolkit/cvat),
+extending CVAT UI capabilities regarding task and project operations.
+It should be capable of downloading and processing data from CVAT.

 <!--lint disable fenced-code-flag-->
 ```
@@ -94,6 +96,7 @@ can be downloaded by user to be operated on with Datumaro CLI.

 - [x] Python API for user code
 - [x] Installation as a package
+- [x] Installation with `pip` by name
 - [x] A command-line tool for dataset manipulations

 ### Features
@@ -106,7 +109,7 @@ can be downloaded by user to be operated on with Datumaro CLI.
 - [x] YOLO
 - [x] TF Detection API
 - [ ] Cityscapes
-- [ ] ImageNet
+- [x] ImageNet

 - Dataset visualization (`show`)
 - [ ] Ability to visualize a dataset
@@ -117,7 +120,7 @@ can be downloaded by user to be operated on with Datumaro CLI.
 - [x] Object counts (detection scenario)
 - [x] Image-Class distribution (classification scenario)
 - [x] Pixel-Class distribution (segmentation scenario)
-- [ ] Image similarity clusters
+- [x] Image similarity clusters
 - [ ] Custom statistics

 - Dataset building
@@ -164,7 +167,7 @@ can be downloaded by user to be operated on with Datumaro CLI.
 ### Optional features

 - Dataset publishing
-- [ ] Versioning (for annotations, subsets, sources, etc.)
+- [x] Versioning (for annotations, subsets, sources, etc.)
 - [ ] Blur sensitive areas on images
 - [ ] Tracking of legal information
 - [ ] Documentation generation
@@ -175,7 +178,7 @@ can be downloaded by user to be operated on with Datumaro CLI.

 - Dataset and model debugging
 - [ ] Training visualization
-- [ ] Inference explanation (`explain`)
+- [x] Inference explanation (`explain`)
 - [ ] White-box approach

 ### Properties

docs/developer_guide.md

+42 -25
@@ -38,28 +38,27 @@ Datumaro has a number of dataset and annotation features:
 - various annotation operations

 ```python
-from datumaro.components.project import Environment, Dataset
+from datumaro.components.dataset import Dataset
 from datumaro.components.extractor import Bbox, Polygon, DatasetItem

-# Import and save a dataset
-env = Environment()
-dataset = env.make_importer('voc')('src/dir').make_dataset()
-env.converters.get('coco').convert(dataset, save_dir='dst/dir')
+# Import and export a dataset
+dataset = Dataset.import_from('src/dir', 'voc')
+dataset.export('dst/dir', 'coco')

 # Create a dataset, convert polygons to masks, save in PASCAL VOC format
 dataset = Dataset.from_iterable([
-    DatasetItem(id='image1', annotations=[
-        Bbox(x=1, y=2, w=3, h=4, label=1),
-        Polygon([1, 2, 3, 2, 4, 4], label=2, attributes={'occluded': True}),
-    ]),
+  DatasetItem(id='image1', annotations=[
+    Bbox(x=1, y=2, w=3, h=4, label=1),
+    Polygon([1, 2, 3, 2, 4, 4], label=2, attributes={'occluded': True}),
+  ]),
 ], categories=['cat', 'dog', 'person'])
-dataset = dataset.transform(env.transforms.get('polygons_to_masks'))
-env.converters.get('voc').convert(dataset, save_dir='dst/dir')
+dataset = dataset.transform('polygons_to_masks')
+dataset.export('dst/dir', 'voc')
 ```

 ### The Dataset class

-The `Dataset` class from the `datumaro.components.project` module represents
+The `Dataset` class from the `datumaro.components.dataset` module represents
 a dataset, consisting of multiple `DatasetItem`s. Annotations are
 represented by members of the `datumaro.components.extractor` module,
 such as `Label`, `Mask` or `Polygon`. A dataset can contain items from one or
@@ -80,16 +79,19 @@ The main operation for a dataset is iteration over its elements.
 An item corresponds to a single image, a video sequence, etc. There are also
 few other operations available, such as filtration (`dataset.select`) and
 transformations (`dataset.transform`). A dataset can be created from extractors
-or other datasets with `dataset.from_extractors` and directly from items with
-`dataset.from_iterable`. A dataset is an extractor itself. If it is created from
-multiple extractors, their categories must match, and their contents will be
-merged.
+or other datasets with `Dataset.from_extractors()` and directly from items with
+`Dataset.from_iterable()`. A dataset is an extractor itself. If it is created
+from multiple extractors, their categories must match, and their contents
+will be merged.

 A dataset item is an element of a dataset. Its `id` is a name of a
 corresponding image. There can be some image `attributes`,
 an `image` and `annotations`.

 ```python
+from datumaro.components.dataset import Dataset
+from datumaro.components.extractor import Bbox, Polygon, DatasetItem
+
 # create a dataset from other datasets
 dataset = Dataset.from_extractors(dataset1, dataset2)

@@ -105,7 +107,7 @@ dataset = Dataset.from_iterable([
 dataset = dataset.select(lambda item: len(item.annotations) != 0)

 # change dataset labels
-dataset = dataset.transform(project.env.transforms.get('remap_labels'),
+dataset = dataset.transform('remap_labels',
     {'cat': 'dog', # rename cat to dog
         'truck': 'car', # rename truck to car
         'person': '', # remove this label
@@ -116,8 +118,7 @@ for item in dataset:
     print(item.id, item.annotations)

 # iterate over subsets
-for subset_name in dataset.subsets():
-    subset = dataset.get_subset(subset_name) # a dataset, again
+for subset_name, subset in dataset.subsets().items():
     for item in subset:
         print(item.id, item.annotations)
 ```
@@ -129,6 +130,7 @@ persistence, of extending, and CLI operation for Datasets. A project can
 be converted to a Dataset with `project.make_dataset`. Project datasets
 can have multiple data sources, which are merged on dataset creation. They
 can have a hierarchy. Project configuration is available in `project.config`.
+A dataset can be saved in `datumaro_project` format.

 The `Environment` class is responsible for accessing built-in and
 project-specific plugins. For a project, there is an instance of
@@ -204,11 +206,12 @@ YoloConverter.convert(dataset, save_dir=dst_dir)

 ### Writing a plugin

-A plugin is a Python module with any name, which exports some symbols.
-To export a symbol, inherit it from one of special classes:
+A plugin is a Python module with any name, which exports some symbols. Symbols,
+starting with `_` are not exported by default. To export a symbol,
+inherit it from one of the special classes:

 ```python
-from datumaro.components.extractor import Importer, SourceExtractor, Transform
+from datumaro.components.extractor import Importer, Extractor, Transform
 from datumaro.components.launcher import Launcher
 from datumaro.components.converter import Converter
 ```
@@ -224,6 +227,19 @@ There is also an additional class to modify plugin appearance in command line:

 ```python
 from datumaro.components.cli_plugin import CliPlugin
+
+class MyPlugin(Converter, CliPlugin):
+    """
+    Optional documentation text, which will appear in command-line help
+    """
+
+    NAME = 'optional_custom_plugin_name'
+
+    def build_cmdline_parser(self, **kwargs):
+        parser = super().build_cmdline_parser(**kwargs)
+        # set up argparse.ArgumentParser instance
+        # the parsed args are supposed to be used as invocation options
+        return parser
 ```

 #### Plugin example
@@ -269,13 +285,14 @@ class MyTransform(Transform, CliPlugin):
 `my_plugin2.py` contents:

 ```python
-from datumaro.components.extractor import SourceExtractor
+from datumaro.components.extractor import Extractor

 class MyFormat: ...
-class MyFormatExtractor(SourceExtractor): ...
+class _MyFormatConverter(Converter): ...
+class MyFormatExtractor(Extractor): ...

 exports = [MyFormat] # explicit exports declaration
-# MyFormatExtractor won't be exported
+# MyFormatExtractor and _MyFormatConverter won't be exported
 ```

 ## Command-line
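
The plugin export rules touched in the `docs/developer_guide.md` hunks above (subclasses of the special classes are exported, names starting with `_` are skipped by default, and an explicit `exports` list takes precedence) can be illustrated with a small standard-library sketch. This is only an illustration under stated assumptions, not how Datumaro discovers plugins: `Plugin`, `find_exports` and `make_module` are hypothetical names, and `Plugin` stands in for `Importer`, `Extractor`, `Transform`, `Launcher` and `Converter`.

```python
import types


class Plugin:
    """Hypothetical stand-in for the special plugin base classes."""


def find_exports(module, base=Plugin):
    """Collect the symbols a plugin module exports (illustrative rules only)."""
    explicit = getattr(module, 'exports', None)
    if explicit is not None:
        # An explicit declaration overrides the default discovery.
        return list(explicit)
    return [
        obj for name, obj in vars(module).items()
        if not name.startswith('_')          # private names are skipped
        and isinstance(obj, type)
        and issubclass(obj, base) and obj is not base
    ]


def make_module(name, **symbols):
    """Build an in-memory module so the sketch needs no files on disk."""
    module = types.ModuleType(name)
    for key, value in symbols.items():
        setattr(module, key, value)
    return module


class MyFormat: ...
class _MyFormatConverter(Plugin): ...
class MyFormatExtractor(Plugin): ...


implicit = make_module('my_plugin1',
    _MyFormatConverter=_MyFormatConverter,
    MyFormatExtractor=MyFormatExtractor)
explicit = make_module('my_plugin2',
    MyFormat=MyFormat,
    _MyFormatConverter=_MyFormatConverter,
    MyFormatExtractor=MyFormatExtractor,
    exports=[MyFormat])

print([cls.__name__ for cls in find_exports(implicit)])
print([cls.__name__ for cls in find_exports(explicit)])
```

In the second module only `MyFormat` is returned, mirroring the comment in `my_plugin2.py` that `MyFormatExtractor` and `_MyFormatConverter` are not exported once `exports` is declared.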

docs/user_manual.md

+5 -1
@@ -45,11 +45,15 @@ python -m virtualenv venv

 Install:
 ``` bash
+# From PyPI:
+pip install datumaro
+
+# From the GitHub repository:
 pip install 'git+https://github.com/openvinotoolkit/datumaro'
 ```

 > You can change the installation branch with `...@<branch_name>`
-> Also note `--force-reinstall` parameter in this case.
+> Also use `--force-reinstall` parameter in this case.

 ## Interfaces
