Commit d7a13fb

Author: Maxim Zhiltsov
Merge branch 'develop' into zm/yolo-custom-subset-name
2 parents 52c7112 + be7bbcc commit d7a13fb


61 files changed (+1589, -284 lines)

.github/workflows/github_pages.yml

Lines changed: 8 additions & 1 deletion
@@ -30,10 +30,17 @@ jobs:
       run: |
         npm ci
 
+    # The pip upgrade must be in a separate step, because otherwise bash will
+    # remember where the system-installed pip was, and will use it in any following
+    # commands instead of the newly-installed pip.
+    - name: Upgrade pip
+      run: |
+        pip install --upgrade pip
+
     - name: Build docs
       run: |
         pip install gitpython packaging toml Sphinx==4.2.0 sphinx-rtd-theme==1.0.0 sphinx-copybutton==0.4.0 \
-          tensorflow openvino-dev[accuracy_check] sphinxcontrib-mermaid
+          tensorflow openvino-dev sphinxcontrib-mermaid
         pip install -r requirements.txt
         pip install git+https://github.com/pytorch-ignite/sphinxcontrib-versioning.git@a1a1a94ca80a0233f0df3eaf9876812484901e76
         sphinx-versioning -l site/source/conf.py build -r develop -w develop site/source site/static/api

.github/workflows/linter.yml

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ jobs:
       run: |
         pip install --user -r <(grep "^pylint" ./requirements.txt)
         echo "Pylint version: "`pylint --version | head -1`
-        git ls-files -z '*.py' | xargs -0 pylint -j 0 -r n
+        git ls-files -z '*.py' | xargs -0 pylint
   remark:
     runs-on: ubuntu-latest
     steps:

.pylintrc

Lines changed: 2 additions & 2 deletions
@@ -20,7 +20,7 @@ persistent=yes
 load-plugins=
 
 # Use multiple processes to speed up Pylint.
-jobs=1
+jobs=0
 
 # Allow loading of arbitrary C extensions. Extensions are imported into the
 # active Python interpreter and may run arbitrary code.
@@ -173,7 +173,7 @@ enable=
 output-format=text
 
 # Tells whether to display a full report or only the messages
-reports=yes
+reports=no
 
 # Python expression which should return a note less than 10 (10 is the highest
 # note). You have access to the variables errors warning, statement which
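Setting `jobs=0` tells Pylint to auto-detect a worker count (roughly one process per available CPU), which is why the explicit `-j 0` flag became redundant in the linter workflow change above. A rough illustration of that resolution rule — `effective_jobs` is a hypothetical helper for exposition, not Pylint's actual implementation:

```python
import multiprocessing


def effective_jobs(jobs: int) -> int:
    # jobs=0 means "auto-detect": use one worker per available CPU.
    # Any positive value is taken literally.
    if jobs == 0:
        return multiprocessing.cpu_count()
    return jobs
```

With `jobs=0` in `.pylintrc`, the same parallelism applies to every Pylint invocation, not just the one in CI.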

CHANGELOG.md

Lines changed: 23 additions & 1 deletion
@@ -15,8 +15,29 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   (<https://github.com/openvinotoolkit/datumaro/pull/539>)
 - BraTS format (import-only) (.npy and .nii.gz), new `MultiframeImage`
   media type (<https://github.com/openvinotoolkit/datumaro/pull/628>)
+- Common Semantic Segmentation dataset format (import-only)
+  (<https://github.com/openvinotoolkit/datumaro/pull/685>)
+- An option to disable `data/` prefix inclusion in YOLO export
+  (<https://github.com/openvinotoolkit/datumaro/pull/689>)
+- New command `describe-downloads` to print information about downloadable datasets
+  (<https://github.com/openvinotoolkit/datumaro/pull/678>)
+- Detection for Cityscapes format
+  (<https://github.com/openvinotoolkit/datumaro/pull/680>)
+- Maximum recursion `--depth` parameter for `detect-dataset` CLI command
+  (<https://github.com/openvinotoolkit/datumaro/pull/680>)
+- An option to save a single subset in the `download` command
+  (<https://github.com/openvinotoolkit/datumaro/pull/697>)
 
 ### Changed
+- `env.detect_dataset()` now returns a list of detected formats at all recursion levels
+  instead of just the lowest one
+  (<https://github.com/openvinotoolkit/datumaro/pull/680>)
+- Open Images: allowed to store annotations file in root path as well
+  (<https://github.com/openvinotoolkit/datumaro/pull/680>)
+- Improved parsing error messages in COCO, VOC and YOLO formats
+  (<https://github.com/openvinotoolkit/datumaro/pull/684>,
+  <https://github.com/openvinotoolkit/datumaro/pull/686>,
+  <https://github.com/openvinotoolkit/datumaro/pull/687>)
 - YOLO format now supports almost any subset names, except of
   just `train` and `valid`
   (<https://github.com/openvinotoolkit/datumaro/pull/688>)
@@ -32,7 +53,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - TBD
 
 ### Fixed
-- TBD
+- Detection for LFW format
+  (<https://github.com/openvinotoolkit/datumaro/pull/680>)
 
 ### Security
 - TBD

README.md

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ CVAT annotations ---> Publication, statistics etc.
 - [Examples](https://openvinotoolkit.github.io/datumaro/docs/getting_started/#examples)
 - [Features](#features)
 - [User manual](https://openvinotoolkit.github.io/datumaro/docs/user-manual)
+- [Developer manual](https://openvinotoolkit.github.io/datumaro/api)
 - [Contributing](#contributing)
 
 ## Features

datumaro/cli/__main__.py

Lines changed: 5 additions & 0 deletions
@@ -93,6 +93,11 @@ def _get_known_commands():
         ("", None, ""),
         ("Dataset operations:", None, ""),
         ("convert", commands.convert, "Convert dataset between formats"),
+        (
+            "describe-downloads",
+            commands.describe_downloads,
+            "Print information about downloadable datasets",
+        ),
         ("detect-format", commands.detect_format, "Detect the format of a dataset"),
         ("diff", commands.diff, "Compare datasets"),
         ("download", commands.download, "Download a publicly available dataset"),

datumaro/cli/commands/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@
     commit,
     convert,
     create,
+    describe_downloads,
     detect_format,
     diff,
     download,
Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
+# Copyright (C) 2022 Intel Corporation
+#
+# SPDX-License-Identifier: MIT
+
+import argparse
+import contextlib
+import sys
+from typing import Dict, Type
+
+from datumaro.components.extractor_tfds import (
+    AVAILABLE_TFDS_DATASETS,
+    TFDS_EXTRACTOR_AVAILABLE,
+    TfdsDatasetRemoteMetadata,
+)
+from datumaro.util import dump_json
+
+from ..util import MultilineFormatter
+
+
+def build_parser(
+    parser_ctor: Type[argparse.ArgumentParser] = argparse.ArgumentParser,
+):
+    parser = parser_ctor(
+        help="Print information about downloadable datasets",
+        description="""
+        Reports information about datasets that can be downloaded with the
+        "datum download" command. The information is reported either as
+        human-readable text (the default) or as a JSON object.
+        """,
+        formatter_class=MultilineFormatter,
+    )
+
+    parser.add_argument(
+        "--report-format",
+        choices=("text", "json"),
+        default="text",
+        help="Format in which to report the information (default: %(default)s)",
+    )
+    parser.add_argument(
+        "--report-file", help="File to which to write the report (default: standard output)"
+    )
+    parser.set_defaults(command=describe_downloads_command)
+
+    return parser
+
+
+def get_sensitive_args():
+    return {
+        describe_downloads_command: ["report-file"],
+    }
+
+
+def describe_downloads_command(args):
+    dataset_metas: Dict[str, TfdsDatasetRemoteMetadata] = {}
+
+    if TFDS_EXTRACTOR_AVAILABLE:
+        for dataset_name, dataset in AVAILABLE_TFDS_DATASETS.items():
+            dataset_metas[f"tfds:{dataset_name}"] = dataset.query_remote_metadata()
+
+    if args.report_format == "text":
+        with (
+            open(args.report_file, "w") if args.report_file else contextlib.nullcontext(sys.stdout)
+        ) as report_file:
+            if dataset_metas:
+                print("Available datasets:", file=report_file)
+
+                for name, meta in sorted(dataset_metas.items()):
+                    print(file=report_file)
+                    print(f"{name} ({meta.human_name}):", file=report_file)
+                    print(
+                        f"  default output format: {meta.default_output_format}",
+                        file=report_file,
+                    )
+
+                    print("  description:", file=report_file)
+                    for line in meta.description.rstrip("\n").split("\n"):
+                        print(f"    {line}", file=report_file)
+
+                    print(f"  download size: {meta.download_size} bytes", file=report_file)
+                    print(f"  home URL: {meta.home_url or 'N/A'}", file=report_file)
+                    print(f"  number of classes: {meta.num_classes}", file=report_file)
+                    print("  subsets:", file=report_file)
+                    for subset_name, subset_meta in sorted(meta.subsets.items()):
+                        print(f"    {subset_name}: {subset_meta.num_items} items", file=report_file)
+                    print(f"  version: {meta.version}", file=report_file)
+            else:
+                print("No datasets available.", file=report_file)
+                print(file=report_file)
+                print(
+                    "You can enable TFDS datasets by installing "
+                    "TensorFlow and TensorFlow Datasets:",
+                    file=report_file,
+                )
+                print("    pip install datumaro[tf,tfds]", file=report_file)
+
+    elif args.report_format == "json":
+
+        def meta_to_raw(meta: TfdsDatasetRemoteMetadata):
+            raw = {}
+
+            # We omit the media type from the output, because there is currently no mechanism
+            # for mapping media types to strings. The media type could be useful information
+            # for users, though, so we might want to implement such a mechanism eventually.
+
+            for attribute in (
+                "default_output_format",
+                "description",
+                "download_size",
+                "home_url",
+                "human_name",
+                "num_classes",
+                "version",
+            ):
+                raw[attribute] = getattr(meta, attribute)
+
+            raw["subsets"] = {
+                name: {"num_items": subset.num_items} for name, subset in meta.subsets.items()
+            }
+
+            return raw
+
+        with (
+            open(args.report_file, "wb")
+            if args.report_file
+            else contextlib.nullcontext(sys.stdout.buffer)
+        ) as report_file:
+            report_file.write(
+                dump_json(
+                    {name: meta_to_raw(meta) for name, meta in dataset_metas.items()},
+                    indent=True,
+                    append_newline=True,
+                )
+            )
+    else:
+        assert False, "unreachable code"
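The text branch of the new command uses `contextlib.nullcontext` so a single `with` block can manage either a real file (closed on exit) or `sys.stdout` (left open). The same idiom in isolation — `open_report` is a hypothetical helper name, not part of Datumaro's API:

```python
import contextlib
import sys


def open_report(path=None):
    # With a path: open a real file, which the with-block closes on exit.
    # Without one: wrap sys.stdout in nullcontext, so the with-block
    # does NOT close the interpreter's stdout when it exits.
    return open(path, "w") if path else contextlib.nullcontext(sys.stdout)


with open_report() as report_file:
    print("Available datasets:", file=report_file)
```

This avoids duplicating the report-writing code for the file and stdout cases, and avoids the classic bug of accidentally closing `sys.stdout`.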

datumaro/cli/commands/detect_format.py

Lines changed: 5 additions & 5 deletions
@@ -8,7 +8,7 @@
 from datumaro.cli.util.project import load_project
 from datumaro.components.environment import Environment
 from datumaro.components.errors import ProjectNotFoundError
-from datumaro.components.format_detection import RejectionReason, detect_dataset_format
+from datumaro.components.format_detection import RejectionReason
 from datumaro.util import dump_json_file
 from datumaro.util.scope import scope_add, scoped
 
@@ -53,6 +53,7 @@ def build_parser(parser_ctor=argparse.ArgumentParser):
         help="Path to which to save a JSON report describing detected "
         "and rejected formats. By default, no report is saved.",
     )
+    parser.add_argument("--depth", help="The maximum depth for recursive search (default: 2)")
    parser.set_defaults(command=detect_format_command)
 
     return parser
@@ -90,10 +91,9 @@ def rejection_callback(
             "message": human_message,
         }
 
-    detected_formats = detect_dataset_format(
-        ((format_name, importer.detect) for format_name, importer in env.importers.items.items()),
-        args.url,
-        rejection_callback=rejection_callback,
+    depth = 2 if not args.depth else int(args.depth)
+    detected_formats = env.detect_dataset(
+        args.url, rejection_callback=rejection_callback, depth=depth
     )
     report["detected_formats"] = detected_formats
 
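The merged code defaults `--depth` manually (`depth = 2 if not args.depth else int(args.depth)`). An equivalent, arguably simpler formulation would let argparse handle both the type coercion and the default — a sketch, not the code actually merged:

```python
import argparse

parser = argparse.ArgumentParser()
# type=int makes argparse coerce (and validate) the value itself;
# default=2 replaces the manual fallback in the command handler.
parser.add_argument(
    "--depth",
    type=int,
    default=2,
    help="The maximum depth for recursive search (default: %(default)s)",
)

args = parser.parse_args([])               # no flag given -> default applies
deep = parser.parse_args(["--depth", "5"])
```

A side benefit: `type=int` rejects non-numeric input with a proper usage error instead of an unhandled `ValueError` from `int(args.depth)`.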
datumaro/cli/commands/download.py

Lines changed: 16 additions & 12 deletions
@@ -1,4 +1,4 @@
-# Copyright (C) 2021 Intel Corporation
+# Copyright (C) 2021-2022 Intel Corporation
 #
 # SPDX-License-Identifier: MIT
 
@@ -7,11 +7,7 @@
 import os
 import os.path as osp
 
-from datumaro.components.extractor_tfds import (
-    AVAILABLE_TFDS_DATASETS,
-    TFDS_EXTRACTOR_AVAILABLE,
-    make_tfds_extractor,
-)
+from datumaro.components.extractor_tfds import AVAILABLE_TFDS_DATASETS, TFDS_EXTRACTOR_AVAILABLE
 from datumaro.components.project import Environment
 from datumaro.util.os_util import make_file_name
 
@@ -40,7 +36,8 @@ def build_parser(parser_ctor=argparse.ArgumentParser):
        |n
        Supported datasets: {}|n
        |n
-        For information about the datasets, see the TFDS Catalog:
+        For information about the datasets, run "datum describe-downloads".
+        More detailed information can be found in the TFDS Catalog:
        <https://www.tensorflow.org/datasets/catalog/overview>.|n
        |n
        Supported output formats: {}|n
@@ -71,6 +68,7 @@ def build_parser(parser_ctor=argparse.ArgumentParser):
     parser.add_argument(
         "--overwrite", action="store_true", help="Overwrite existing files in the save directory"
     )
+    parser.add_argument("-s", "--subset", help="Save only the specified subset")
     parser.add_argument(
         "extra_args",
         nargs=argparse.REMAINDER,
@@ -94,10 +92,10 @@ def download_command(args):
     if args.dataset_id.startswith("tfds:"):
         if TFDS_EXTRACTOR_AVAILABLE:
             tfds_ds_name = args.dataset_id[5:]
-            tfds_ds_metadata = AVAILABLE_TFDS_DATASETS.get(tfds_ds_name)
-            if tfds_ds_metadata:
-                default_converter_name = tfds_ds_metadata.default_converter_name
-                extractor_factory = lambda: make_tfds_extractor(tfds_ds_name)
+            tfds_ds = AVAILABLE_TFDS_DATASETS.get(tfds_ds_name)
+            if tfds_ds:
+                default_output_format = tfds_ds.metadata.default_output_format
+                extractor_factory = tfds_ds.make_extractor
             else:
                 raise CliException(f"Unsupported TFDS dataset '{tfds_ds_name}'")
         else:
@@ -109,7 +107,7 @@ def download_command(args):
     else:
         raise CliException(f"Unknown dataset ID '{args.dataset_id}'")
 
-    output_format = args.output_format or default_converter_name
+    output_format = args.output_format or default_output_format
 
     try:
         converter = env.converters[output_format]
@@ -136,6 +134,12 @@ def download_command(args):
     log.info("Downloading the dataset")
     extractor = extractor_factory()
 
+    if args.subset:
+        try:
+            extractor = extractor.subsets()[args.subset]
+        except KeyError:
+            raise CliException("Subset '%s' is not present in the dataset" % args.subset)
+
     log.info("Exporting the dataset")
     converter.convert(extractor, dst_dir, default_image_ext=".png", **extra_args)
 
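The new `--subset` handling looks the requested name up in the extractor's subset mapping and converts the resulting `KeyError` into a user-facing error. The same pattern in isolation — a sketch using `ValueError` in place of Datumaro's `CliException`, with a plain dict standing in for `extractor.subsets()`:

```python
def select_subset(subsets, name):
    # subsets stands in for extractor.subsets():
    # a mapping of subset name -> subset data.
    try:
        return subsets[name]
    except KeyError:
        # Re-raise as a user-facing error; "from None" suppresses the
        # internal KeyError traceback in the message shown to the user.
        raise ValueError(f"Subset '{name}' is not present in the dataset") from None


subsets = {"train": [1, 2, 3], "val": [4]}
```

Catching `KeyError` at the lookup site keeps the error message in the CLI's vocabulary ("subset") rather than leaking a raw dictionary exception to the user.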