
Commit 8562368

vertex-sdk-bot authored and copybara-github committed
docs: Update the documentation for the image_dataset class
PiperOrigin-RevId: 583157369
1 parent 03f787c commit 8562368

File tree

2 files changed: +95 −64 lines changed


google/cloud/aiplatform/datasets/image_dataset.py (+88 −57)
@@ -27,7 +27,34 @@
 
 
 class ImageDataset(datasets._Dataset):
-    """Managed image dataset resource for Vertex AI."""
+    """A managed image dataset resource for Vertex AI.
+
+    Use this class to work with a managed image dataset. To create a managed
+    image dataset, you need a datasource file in CSV format and a schema file
+    in YAML format. A schema is optional for a custom model. You put the CSV
+    file and the schema into Cloud Storage buckets.
+
+    Use image data for the following objectives:
+
+    * Single-label classification. For more information, see
+      [Prepare image training data for single-label classification](https://cloud.google.com/vertex-ai/docs/image-data/classification/prepare-data#single-label-classification).
+    * Multi-label classification. For more information, see
+      [Prepare image training data for multi-label classification](https://cloud.google.com/vertex-ai/docs/image-data/classification/prepare-data#multi-label-classification).
+    * Object detection. For more information, see
+      [Prepare image training data for object detection](https://cloud.google.com/vertex-ai/docs/image-data/object-detection/prepare-data).
+
+    The following code shows you how to create an image dataset by importing
+    data from a CSV datasource file and a YAML schema file. The schema file
+    you use depends on whether your image dataset is used for single-label
+    classification, multi-label classification, or object detection.
+
+    ```py
+    my_dataset = aiplatform.ImageDataset.create(
+        display_name="my-image-dataset",
+        gcs_source=['gs://path/to/my/image-dataset.csv'],
+        import_schema_uri='gs://path/to/my/schema.yaml',
+    )
+    ```
+    """
 
     _supported_metadata_schema_uris: Optional[Tuple[str]] = (
         schema.dataset.metadata.image,
@@ -49,84 +76,88 @@ def create(
         sync: bool = True,
         create_request_timeout: Optional[float] = None,
     ) -> "ImageDataset":
-        """Creates a new image dataset and optionally imports data into dataset
-        when source and import_schema_uri are passed.
+        """Creates a new image dataset.
+
+        Optionally imports data into the dataset when a source and
+        `import_schema_uri` are passed in.
 
         Args:
             display_name (str):
-                Optional. The user-defined name of the Dataset.
-                The name can be up to 128 characters long and can be consist
-                of any UTF-8 characters.
+                Optional. The user-defined name of the dataset. The name must
+                contain 128 or fewer UTF-8 characters.
             gcs_source (Union[str, Sequence[str]]):
-                Google Cloud Storage URI(-s) to the
-                input file(s).
-
-                Examples:
-                    str: "gs://bucket/file.csv"
-                    Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]
+                Optional. The URI to one or more Google Cloud Storage buckets
+                that contain your datasets. For example, `str:
+                "gs://bucket/file.csv"` or `Sequence[str]:
+                ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]`.
             import_schema_uri (str):
-                Points to a YAML file stored on Google Cloud
-                Storage describing the import format. Validation will be
-                done against the schema. The schema is defined as an
-                `OpenAPI 3.0.2 Schema
-                Object <https://tinyurl.com/y538mdwt>`__.
+                Optional. A URI for a YAML file stored in Cloud Storage that
+                describes the import schema used to validate the
+                dataset. The schema is an
+                [OpenAPI 3.0.2 Schema](https://tinyurl.com/y538mdwt) object.
             data_item_labels (Dict):
-                Labels that will be applied to newly imported DataItems. If
-                an identical DataItem as one being imported already exists
-                in the Dataset, then these labels will be appended to these
-                of the already existing one, and if labels with identical
-                key is imported before, the old label value will be
-                overwritten. If two DataItems are identical in the same
-                import data operation, the labels will be combined and if
-                key collision happens in this case, one of the values will
-                be picked randomly. Two DataItems are considered identical
-                if their content bytes are identical (e.g. image bytes or
-                pdf bytes). These labels will be overridden by Annotation
-                labels specified inside index file referenced by
-                ``import_schema_uri``,
-                e.g. jsonl file.
+                Optional. A dictionary of label information. Each dictionary
+                item contains a label and a label key. Each image in the dataset
+                includes one dictionary of label information. If a data item is
+                added or merged into a dataset, and that data item contains an
+                image that's identical to an image that's already in the
+                dataset, then the data items are merged. If two identical labels
+                are detected during the merge, each with a different label key,
+                then one of the label and label key dictionary items is randomly
+                chosen for the merged data item. Images and documents are
+                compared using their binary data (bytes), not their content. If
+                annotation labels are referenced in a schema specified by the
+                `import_schema_uri` parameter, then the labels in the
+                `data_item_labels` dictionary are overridden by the annotations.
             project (str):
-                Project to upload this dataset to. Overrides project set in
-                aiplatform.init.
+                Optional. The name of the Google Cloud project to which this
+                `ImageDataset` is uploaded. This overrides the project that
+                was set by `aiplatform.init`.
             location (str):
-                Location to upload this dataset to. Overrides location set in
-                aiplatform.init.
+                Optional. The Google Cloud region where this dataset is uploaded. This
+                region overrides the region that was set by `aiplatform.init`.
             credentials (auth_credentials.Credentials):
-                Custom credentials to use to upload this dataset. Overrides
-                credentials set in aiplatform.init.
+                Optional. The credentials that are used to upload the
+                `ImageDataset`. These credentials override the credentials set
+                by `aiplatform.init`.
             request_metadata (Sequence[Tuple[str, str]]):
-                Strings which should be sent along with the request as metadata.
+                Optional. Strings that contain metadata that's sent with the request.
             labels (Dict[str, str]):
-                Optional. Labels with user-defined metadata to organize your Tensorboards.
-                Label keys and values can be no longer than 64 characters
-                (Unicode codepoints), can only contain lowercase letters, numeric
-                characters, underscores and dashes. International characters are allowed.
-                No more than 64 user labels can be associated with one Tensorboard
-                (System labels are excluded).
-                See https://goo.gl/xmQnxf for more information and examples of labels.
-                System reserved label keys are prefixed with "aiplatform.googleapis.com/"
-                and are immutable.
+                Optional. Labels with user-defined metadata to organize your
+                Vertex AI Tensorboards. The maximum length of a key and of a
+                value is 64 unicode characters. Labels and keys can contain only
+                lowercase letters, numeric characters, underscores, and dashes.
+                International characters are allowed. No more than 64 user
+                labels can be associated with one Tensorboard (system labels are
+                excluded). For more information and examples of using labels, see
+                [Using labels to organize Google Cloud Platform resources](https://goo.gl/xmQnxf).
+                System reserved label keys are prefixed with
+                `aiplatform.googleapis.com/` and are immutable.
             encryption_spec_key_name (Optional[str]):
                 Optional. The Cloud KMS resource identifier of the customer
-                managed encryption key used to protect the dataset. Has the
-                form:
-                ``projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key``.
+                managed encryption key that's used to protect the dataset. The
+                format of the key is
+                `projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key`.
                 The key needs to be in the same region as where the compute
                 resource is created.
 
-                If set, this Dataset and all sub-resources of this Dataset will be secured by this key.
+                If `encryption_spec_key_name` is set, this image dataset and
+                all of its sub-resources are secured by this key.
 
-                Overrides encryption_spec_key_name set in aiplatform.init.
+                This `encryption_spec_key_name` overrides the
+                `encryption_spec_key_name` set by `aiplatform.init`.
             sync (bool):
-                Whether to execute this method synchronously. If False, this method
-                will be executed in concurrent Future and any downstream object will
-                be immediately returned and synced when the Future has completed.
+                If `true`, the `create` method creates an image dataset
+                synchronously. If `false`, the `create` method creates an image
+                dataset asynchronously.
             create_request_timeout (float):
-                Optional. The timeout for the create request in seconds.
+                Optional. The number of seconds for the timeout of the create
+                request.
 
         Returns:
             image_dataset (ImageDataset):
-                Instantiated representation of the managed image dataset resource.
+                An instantiated representation of the managed `ImageDataset`
+                resource.
         """
         if not display_name:
             display_name = cls._generate_display_name()
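
The docstring above documents several optional parameters that interact with one another. The following sketch (not part of this commit's diff) shows one way they might be combined; the project ID, bucket paths, and label values are hypothetical placeholders, and it assumes the SDK's usual `aiplatform.init` and `wait` workflow.

```py
# A minimal usage sketch, assuming a hypothetical project and bucket.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

my_dataset = aiplatform.ImageDataset.create(
    display_name="my-image-dataset",
    gcs_source=["gs://my-bucket/image-dataset.csv"],
    import_schema_uri="gs://my-bucket/schema.yaml",
    # Applied to newly imported data items; overridden by any annotation
    # labels referenced through the schema in `import_schema_uri`.
    data_item_labels={"source": "batch-import"},
    sync=False,  # create asynchronously; downstream objects sync later
)

my_dataset.wait()  # block until the asynchronous creation completes
```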

google/cloud/aiplatform/datasets/tabular_dataset.py (+7 −7)
@@ -103,26 +103,26 @@ def create(
                 Optional. The user-defined name of the dataset. The name must
                 contain 128 or fewer UTF-8 characters.
             gcs_source (Union[str, Sequence[str]]):
-                The URI to one or more Google Cloud Storage buckets that contain
+                Optional. The URI to one or more Google Cloud Storage buckets that contain
                 your datasets. For example, `str: "gs://bucket/file.csv"` or
                 `Sequence[str]: ["gs://bucket/file1.csv",
                 "gs://bucket/file2.csv"]`.
             bq_source (str):
-                The URI to a BigQuery table that's used as an input source. For
+                Optional. The URI to a BigQuery table that's used as an input source. For
                 example, `bq://project.dataset.table_name`.
             project (str):
-                The name of the Google Cloud project to which this
+                Optional. The name of the Google Cloud project to which this
                 `TabularDataset` is uploaded. This overrides the project that
                 was set by `aiplatform.init`.
             location (str):
-                The Google Cloud region where this dataset is uploaded. This
+                Optional. The Google Cloud region where this dataset is uploaded. This
                 region overrides the region that was set by `aiplatform.init`.
             credentials (auth_credentials.Credentials):
-                The credentials that are used to upload the `TabularDataset`.
+                Optional. The credentials that are used to upload the `TabularDataset`.
                 These credentials override the credentials set by
                 `aiplatform.init`.
             request_metadata (Sequence[Tuple[str, str]]):
-                Strings that contain metadata that's sent with the request.
+                Optional. Strings that contain metadata that's sent with the request.
             labels (Dict[str, str]):
                 Optional. Labels with user-defined metadata to organize your
                 Vertex AI Tensorboards. The maximum length of a key and of a
@@ -149,7 +149,7 @@ def create(
                 `encryption_spec_key_name` set by `aiplatform.init`.
             sync (bool):
                 If `true`, the `create` method creates a tabular dataset
-                synchronously. If false, the `create` mdthod creates a tabular
+                synchronously. If `false`, the `create` method creates a tabular
                 dataset asynchronously.
             create_request_timeout (float):
                 Optional. The number of seconds for the timeout of the create
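
The tabular changes above only prefix existing parameter descriptions with "Optional." and correct a typo. For reference, here is a minimal sketch (not part of this commit's diff) of the documented `TabularDataset.create` call; the project, dataset, and table names are hypothetical placeholders.

```py
# A minimal usage sketch, assuming a hypothetical project and BigQuery table.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

tabular_dataset = aiplatform.TabularDataset.create(
    display_name="my-tabular-dataset",
    bq_source="bq://my-project.my_dataset.my_table",
    create_request_timeout=600.0,  # seconds before the create request times out
)
```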
