 class ImageDataset(datasets._Dataset):
-    """Managed image dataset resource for Vertex AI."""
+    """A managed image dataset resource for Vertex AI.
+
+    Use this class to work with a managed image dataset. To create a managed
+    image dataset, you need a datasource file in CSV format and a schema file in
+    YAML format. A schema is optional for a custom model. Put the CSV file and
+    the schema file in Cloud Storage buckets.
+
+    Use image data for the following objectives:
+
+    * Single-label classification. For more information, see
+      [Prepare image training data for single-label classification](https://cloud.google.com/vertex-ai/docs/image-data/classification/prepare-data#single-label-classification).
+    * Multi-label classification. For more information, see
+      [Prepare image training data for multi-label classification](https://cloud.google.com/vertex-ai/docs/image-data/classification/prepare-data#multi-label-classification).
+    * Object detection. For more information, see
+      [Prepare image training data for object detection](https://cloud.google.com/vertex-ai/docs/image-data/object-detection/prepare-data).
+
+    The following code shows you how to create an image dataset by importing
+    data from a CSV datasource file and a YAML schema file. The schema file you
+    use depends on whether your image dataset is used for single-label
+    classification, multi-label classification, or object detection.
+
+    ```py
+    from google.cloud import aiplatform
+
+    my_dataset = aiplatform.ImageDataset.create(
+        display_name="my-image-dataset",
+        gcs_source=["gs://path/to/my/image-dataset.csv"],
+        import_schema_uri="gs://path/to/my/schema.yaml",
+    )
+    ```
+    """

     _supported_metadata_schema_uris: Optional[Tuple[str]] = (
         schema.dataset.metadata.image,
@@ -49,84 +76,88 @@ def create(
         sync: bool = True,
         create_request_timeout: Optional[float] = None,
     ) -> "ImageDataset":
-        """Creates a new image dataset and optionally imports data into dataset
-        when source and import_schema_uri are passed.
+        """Creates a new image dataset.
+
+        Optionally imports data into the dataset when `gcs_source` and
+        `import_schema_uri` are passed in.

         Args:
             display_name (str):
-                Optional. The user-defined name of the Dataset.
-                The name can be up to 128 characters long and can be consist
-                of any UTF-8 characters.
+                Optional. The user-defined name of the dataset. The name must
+                contain 128 or fewer UTF-8 characters.
             gcs_source (Union[str, Sequence[str]]):
-                Google Cloud Storage URI(-s) to the
-                input file(s).
-
-                Examples:
-                    str: "gs://bucket/file.csv"
-                    Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]
+                Optional. One or more Cloud Storage URIs of the input files
+                that contain your data. For example, a single `str` such as
+                `"gs://bucket/file.csv"`, or a `Sequence[str]` such as
+                `["gs://bucket/file1.csv", "gs://bucket/file2.csv"]`.
             import_schema_uri (str):
-                Points to a YAML file stored on Google Cloud
-                Storage describing the import format. Validation will be
-                done against the schema. The schema is defined as an
-                `OpenAPI 3.0.2 Schema
-                Object <https://tinyurl.com/y538mdwt>`__.
+                Optional. The Cloud Storage URI of a YAML file that describes
+                the import format. Imported data is validated against this
+                schema. The schema is an
+                [OpenAPI 3.0.2 Schema](https://tinyurl.com/y538mdwt) object.
             data_item_labels (Dict):
-                Labels that will be applied to newly imported DataItems. If
-                an identical DataItem as one being imported already exists
-                in the Dataset, then these labels will be appended to these
-                of the already existing one, and if labels with identical
-                key is imported before, the old label value will be
-                overwritten. If two DataItems are identical in the same
-                import data operation, the labels will be combined and if
-                key collision happens in this case, one of the values will
-                be picked randomly. Two DataItems are considered identical
-                if their content bytes are identical (e.g. image bytes or
-                pdf bytes). These labels will be overridden by Annotation
-                labels specified inside index file referenced by
-                ``import_schema_uri``,
-                e.g. jsonl file.
+                Optional. A dictionary of labels, as key-value pairs, that are
+                applied to newly imported data items. If a data item that's
+                identical to one being imported already exists in the
+                dataset, these labels are appended to the existing item's
+                labels; if a label key was imported before, its old value is
+                overwritten. If two identical data items are imported in the
+                same operation, their labels are combined, and if the same key
+                appears with different values, one of the values is chosen at
+                random. Two data items are identical if their content bytes
+                are identical (for example, image bytes or PDF bytes). Labels
+                in `data_item_labels` are overridden by annotation labels
+                specified in the index file (for example, a JSONL file)
+                referenced by the `import_schema_uri` parameter.
             project (str):
-                Project to upload this dataset to. Overrides project set in
-                aiplatform.init.
+                Optional. The name of the Google Cloud project to which this
+                `ImageDataset` is uploaded. This overrides the project that
+                was set by `aiplatform.init`.
             location (str):
-                Location to upload this dataset to. Overrides location set in
-                aiplatform.init.
+                Optional. The Google Cloud region where this dataset is
+                uploaded. This region overrides the region that was set by
+                `aiplatform.init`.
             credentials (auth_credentials.Credentials):
-                Custom credentials to use to upload this dataset. Overrides
-                credentials set in aiplatform.init.
+                Optional. The credentials that are used to upload the
+                `ImageDataset`. These credentials override the credentials set
+                by `aiplatform.init`.
             request_metadata (Sequence[Tuple[str, str]]):
-                Strings which should be sent along with the request as metadata.
+                Optional. Metadata, as a sequence of `(key, value)` string
+                pairs, that's sent with the request.
             labels (Dict[str, str]):
-                Optional. Labels with user-defined metadata to organize your Tensorboards.
-                Label keys and values can be no longer than 64 characters
-                (Unicode codepoints), can only contain lowercase letters, numeric
-                characters, underscores and dashes. International characters are allowed.
-                No more than 64 user labels can be associated with one Tensorboard
-                (System labels are excluded).
-                See https://goo.gl/xmQnxf for more information and examples of labels.
-                System reserved label keys are prefixed with "aiplatform.googleapis.com/"
-                and are immutable.
+                Optional. Labels with user-defined metadata to organize your
+                datasets. Label keys and values can be no longer than 64
+                Unicode code points and can contain only lowercase letters,
+                numeric characters, underscores, and dashes. International
+                characters are allowed. No more than 64 user labels can be
+                associated with one dataset (system labels are excluded). For
+                more information and examples of using labels, see
+                [Using labels to organize Google Cloud Platform resources](https://goo.gl/xmQnxf).
+                System reserved label keys are prefixed with
+                `aiplatform.googleapis.com/` and are immutable.
             encryption_spec_key_name (Optional[str]):
                 Optional. The Cloud KMS resource identifier of the customer
-                managed encryption key used to protect the dataset. Has the
-                form:
-                ``projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key``.
+                managed encryption key that's used to protect the dataset. The
+                format of the key is
+                `projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key`.
                 The key needs to be in the same region as where the compute
                 resource is created.

-                If set, this Dataset and all sub-resources of this Dataset will be secured by this key.
+                If `encryption_spec_key_name` is set, this image dataset and
+                all of its sub-resources are secured by this key.

-                Overrides encryption_spec_key_name set in aiplatform.init.
+                This `encryption_spec_key_name` overrides the
+                `encryption_spec_key_name` set by `aiplatform.init`.
             sync (bool):
-                Whether to execute this method synchronously. If False, this method
-                will be executed in concurrent Future and any downstream object will
-                be immediately returned and synced when the Future has completed.
+                If `True`, the `create` method creates the image dataset
+                synchronously. If `False`, the `create` method runs in a
+                concurrent `Future`, immediately returns the `ImageDataset`
+                object, and syncs that object when the `Future` completes.
             create_request_timeout (float):
-                Optional. The timeout for the create request in seconds.
+                Optional. The timeout, in seconds, for the create request.

         Returns:
             image_dataset (ImageDataset):
-                Instantiated representation of the managed image dataset resource.
+                An instantiated representation of the managed `ImageDataset`
+                resource.
         """
         if not display_name:
             display_name = cls._generate_display_name()
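The `data_item_labels` merge rules described in the docstring are easier to follow with a concrete model. The following Python sketch simulates them locally, assuming that data item identity is a hash of the content bytes and that per-item labels from the index file take precedence over `data_item_labels` for the same key. `content_id` and `merge_import` are hypothetical helpers for illustration only; they are not part of the Vertex AI SDK, and the real merge happens server-side.

```python
import hashlib
import random
from typing import Dict, Iterable, List, Optional, Tuple

Labels = Dict[str, str]


def content_id(content: bytes) -> str:
    """Identify a data item by its content bytes, not by its file name."""
    return hashlib.sha256(content).hexdigest()


def merge_import(
    dataset: Dict[str, Labels],
    batch: Iterable[Tuple[bytes, Labels]],
    data_item_labels: Labels,
    rng: Optional[random.Random] = None,
) -> Dict[str, Labels]:
    """Return a copy of `dataset` after importing `batch` (hypothetical model).

    Each batch entry is (content_bytes, index_file_labels); the second element
    stands in for labels given in the index file, which win over
    `data_item_labels` for the same key.
    """
    rng = rng or random.Random()

    # Collect every candidate value per key for each distinct content.
    candidates: Dict[str, Dict[str, List[str]]] = {}
    for content, file_labels in batch:
        effective = {**data_item_labels, **file_labels}
        slots = candidates.setdefault(content_id(content), {})
        for key, value in effective.items():
            slots.setdefault(key, []).append(value)

    # Copy the existing dataset so the caller's dict is not mutated.
    result = {item_id: dict(labels) for item_id, labels in dataset.items()}
    for item_id, slots in candidates.items():
        # Identical items in one import: on a key collision, pick one value at random.
        chosen = {key: rng.choice(values) for key, values in slots.items()}
        # Existing identical item: labels are appended; an existing key gets the new value.
        result.setdefault(item_id, {}).update(chosen)
    return result


if __name__ == "__main__":
    existing = {content_id(b"cat-bytes"): {"source": "v1", "split": "train"}}
    updated = merge_import(
        existing,
        [
            (b"cat-bytes", {}),
            (b"dog-bytes", {"label": "dog"}),
            (b"dog-bytes", {"label": "puppy"}),
        ],
        data_item_labels={"source": "v2"},
        rng=random.Random(0),
    )
    print(updated[content_id(b"cat-bytes")])
    print(updated[content_id(b"dog-bytes")]["label"])  # either dog or puppy
```

In this model the existing cat item keeps its `split` label and gets the new `source` value, while the two identical dog images collapse into one item whose `label` is one of the two imported values.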