 class VideoDataset(datasets._Dataset):
-    """Managed video dataset resource for Vertex AI."""
+    """A managed video dataset resource for Vertex AI.
+
+    Use this class to work with a managed video dataset. To create a video
+    dataset, you need a data source in CSV format and a schema in YAML format.
+    Both the CSV file and the schema file are read from Cloud Storage.
+
+    Use video data for the following objectives:
+
+    * Classification. For more information, see Classification schema files.
+    * Action recognition. For more information, see Action recognition
+      schema files.
+    * Object tracking. For more information, see Object tracking schema
+      files.
+
+    The following code shows you how to create and import a dataset to
+    train a video classification model. The schema file you use depends on
+    whether you use your video dataset for classification, action
+    recognition, or object tracking.
+
+    ```py
+    my_dataset = aiplatform.VideoDataset.create(
+        gcs_source=['gs://path/to/my/dataset.csv'],
+        import_schema_uri=aiplatform.schema.dataset.ioformat.video.classification
+    )
+    ```
+    """

     _supported_metadata_schema_uris: Optional[Tuple[str]] = (
         schema.dataset.metadata.video,
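The class docstring says the schema file depends on the objective. A minimal sketch of that lookup follows. The helper `import_schema_for` and the table `VIDEO_IMPORT_SCHEMAS` are hypothetical names, not part of the SDK. The URIs are an assumption about what the `aiplatform.schema.dataset.ioformat.video` constants resolve to. In real code, prefer the SDK constants and verify the paths.

```python
# Hypothetical lookup table: objective name -> import schema URI.
# The URIs are assumed values; verify them against the SDK constants in
# aiplatform.schema.dataset.ioformat.video before relying on them.
_SCHEMA_ROOT = "gs://google-cloud-aiplatform/schema/dataset/ioformat/"

VIDEO_IMPORT_SCHEMAS = {
    "classification": _SCHEMA_ROOT + "video_classification_io_format_1.0.0.yaml",
    "action_recognition": _SCHEMA_ROOT + "video_action_recognition_io_format_1.0.0.yaml",
    "object_tracking": _SCHEMA_ROOT + "video_object_tracking_io_format_1.0.0.yaml",
}


def import_schema_for(objective: str) -> str:
    """Return the import schema URI for a video objective (illustrative helper)."""
    # Normalize "Action recognition" to "action_recognition".
    key = objective.strip().lower().replace(" ", "_")
    if key not in VIDEO_IMPORT_SCHEMAS:
        supported = ", ".join(sorted(VIDEO_IMPORT_SCHEMAS))
        raise ValueError(f"Unsupported objective {objective!r}; expected one of: {supported}")
    return VIDEO_IMPORT_SCHEMAS[key]
```

The returned string is a single URI, which matches the `str` type documented for `import_schema_uri`.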
@@ -49,84 +71,95 @@ def create(
         sync: bool = True,
         create_request_timeout: Optional[float] = None,
     ) -> "VideoDataset":
-        """Creates a new video dataset and optionally imports data into dataset
-        when source and import_schema_uri are passed.
+        """Creates a new video dataset.
+
+        Optionally imports data into the dataset when both `gcs_source` and
+        `import_schema_uri` are passed in. The following example shows how to
+        use this method:
+
+        ```py
+        my_dataset = aiplatform.VideoDataset.create(
+            gcs_source=['gs://path/to/my/dataset.csv'],
+            import_schema_uri=aiplatform.schema.dataset.ioformat.video.classification
+        )
+        ```

         Args:
             display_name (str):
-                Optional. The user-defined name of the Dataset.
-                The name can be up to 128 characters long and can be consist
-                of any UTF-8 characters.
+                Optional. The user-defined name of the dataset. The name must
+                contain 128 or fewer UTF-8 characters.
             gcs_source (Union[str, Sequence[str]]):
-                Google Cloud Storage URI(-s) to the
-                input file(s).
-
-                Examples:
-                    str: "gs://bucket/file.csv"
-                    Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]
+                The Cloud Storage URI, or a sequence of URIs, of the input
+                files. For example, a `str` such as `"gs://bucket/file.csv"`
+                or a `Sequence[str]` such as `["gs://bucket/file1.csv",
+                "gs://bucket/file2.csv"]`.
             import_schema_uri (str):
-                Points to a YAML file stored on Google Cloud
-                Storage describing the import format. Validation will be
-                done against the schema. The schema is defined as an
-                `OpenAPI 3.0.2 Schema
-                Object <https://tinyurl.com/y538mdwt>`__.
+                A URI for a YAML file stored in Cloud Storage that
+                describes the import schema used to validate the
+                dataset. The schema is an
+                [OpenAPI 3.0.2 Schema](https://tinyurl.com/y538mdwt) object.
             data_item_labels (Dict):
-                Labels that will be applied to newly imported DataItems. If
-                an identical DataItem as one being imported already exists
-                in the Dataset, then these labels will be appended to these
-                of the already existing one, and if labels with identical
-                key is imported before, the old label value will be
-                overwritten. If two DataItems are identical in the same
-                import data operation, the labels will be combined and if
-                key collision happens in this case, one of the values will
-                be picked randomly. Two DataItems are considered identical
-                if their content bytes are identical (e.g. image bytes or
-                pdf bytes). These labels will be overridden by Annotation
-                labels specified inside index file referenced by
-                ``import_schema_uri``,
-                e.g. jsonl file.
+                Optional. A dictionary of labels that are applied to newly
+                imported data items. If a data item that's identical to an
+                imported data item already exists in the dataset, then these
+                labels are appended to the labels of the existing data item.
+                If a label with the same key was imported earlier, then its
+                value is overwritten. If two data items in the same import
+                operation are identical, then their labels are combined. If
+                their label keys collide, then one of the values is chosen
+                at random. Two data items are identical if their content
+                bytes are identical. These labels are overridden by
+                annotation labels that are specified in the index file (for
+                example, a JSONL file) referenced by the `import_schema_uri`
+                parameter.
             project (str):
-                Project to upload this dataset to. Overrides project set in
-                aiplatform.init.
+                The name of the Google Cloud project to which this
+                `VideoDataset` is uploaded. This overrides the project that
+                was set by `aiplatform.init`.
             location (str):
-                Location to upload this dataset to. Overrides location set in
-                aiplatform.init.
+                The Google Cloud region where this dataset is uploaded. This
+                region overrides the region that was set by `aiplatform.init`.
             credentials (auth_credentials.Credentials):
-                Custom credentials to use to upload this dataset. Overrides
-                credentials set in aiplatform.init.
+                The credentials that are used to upload the `VideoDataset`.
+                These credentials override the credentials set by
+                `aiplatform.init`.
             request_metadata (Sequence[Tuple[str, str]]):
-                Strings which should be sent along with the request as metadata.
+                Strings that contain metadata that's sent with the request.
             labels (Dict[str, str]):
-                Optional. Labels with user-defined metadata to organize your Tensorboards.
-                Label keys and values can be no longer than 64 characters
-                (Unicode codepoints), can only contain lowercase letters, numeric
-                characters, underscores and dashes. International characters are allowed.
-                No more than 64 user labels can be associated with one Tensorboard
-                (System labels are excluded).
-                See https://goo.gl/xmQnxf for more information and examples of labels.
-                System reserved label keys are prefixed with "aiplatform.googleapis.com/"
-                and are immutable.
+                Optional. Labels with user-defined metadata to organize your
+                Vertex AI Tensorboards. The maximum length of a key and of a
+                value is 64 Unicode characters. Keys and values can contain only
+                lowercase letters, numeric characters, underscores, and dashes.
+                International characters are allowed. No more than 64 user
+                labels can be associated with one Tensorboard (system labels are
+                excluded). For more information and examples of using labels, see
+                [Using labels to organize Google Cloud Platform resources](https://goo.gl/xmQnxf).
+                System reserved label keys are prefixed with
+                `aiplatform.googleapis.com/` and are immutable.
             encryption_spec_key_name (Optional[str]):
                 Optional. The Cloud KMS resource identifier of the customer
-                managed encryption key used to protect the dataset. Has the
-                form:
-                ``projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key``.
+                managed encryption key that's used to protect the dataset. The
+                format of the key is
+                `projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key`.
                 The key needs to be in the same region as where the compute
                 resource is created.

-                If set, this Dataset and all sub-resources of this Dataset will be secured by this key.
+                If `encryption_spec_key_name` is set, this `VideoDataset` and
+                all of its sub-resources are secured by this key.

-                Overrides encryption_spec_key_name set in aiplatform.init.
-            create_request_timeout (float):
-                Optional. The timeout for the create request in seconds.
+                This `encryption_spec_key_name` overrides the
+                `encryption_spec_key_name` set by `aiplatform.init`.
             sync (bool):
-                Whether to execute this method synchronously. If False, this method
-                will be executed in concurrent Future and any downstream object will
-                be immediately returned and synced when the Future has completed.
-
+                If `True`, the `create` method creates the video dataset
+                synchronously. If `False`, the `create` method runs in a
+                concurrent `Future`, and any downstream object is returned
+                immediately and synced when the `Future` completes.
+            create_request_timeout (float):
+                Optional. The number of seconds for the timeout of the create
+                request.
         Returns:
             video_dataset (VideoDataset):
-                Instantiated representation of the managed video dataset resource.
+                An instantiated representation of the managed
+                `VideoDataset` resource.
         """
         if not display_name:
             display_name = cls._generate_display_name()
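The `labels` docstring packs several constraints into one paragraph. A minimal client-side sketch of those rules, as worded in the docstring, might look like the following. The function `find_label_problems` and the constants are hypothetical and not part of the SDK. Rejecting keys with the reserved prefix is an assumption drawn from the statement that system keys are immutable. The service remains the authority on what it accepts.

```python
from typing import Dict, List

# Limits taken from the docstring text; names are illustrative only.
RESERVED_PREFIX = "aiplatform.googleapis.com/"
MAX_USER_LABELS = 64
MAX_LENGTH = 64


def _has_only_allowed_chars(text: str) -> bool:
    """Allow underscores, dashes, decimal digits, and non-uppercase letters."""
    return all(
        ch in "_-" or ch.isdecimal() or (ch.isalpha() and not ch.isupper())
        for ch in text
    )


def find_label_problems(labels: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means no rule was violated."""
    problems = []
    if len(labels) > MAX_USER_LABELS:
        problems.append(f"{len(labels)} labels given; at most {MAX_USER_LABELS} allowed")
    for key, value in labels.items():
        if key.startswith(RESERVED_PREFIX):
            # Assumption: user code should not set system-reserved keys.
            problems.append(f"{key!r} uses the reserved system prefix")
            continue
        for kind, text in (("key", key), ("value", value)):
            if len(text) > MAX_LENGTH:  # len() counts Unicode code points
                problems.append(f"{kind} {text[:16]!r}... is longer than {MAX_LENGTH}")
            if not _has_only_allowed_chars(text):
                problems.append(f"{kind} {text!r} contains a disallowed character")
    return problems
```

Running a check like this before calling `create` surfaces label mistakes without a round trip to the service.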
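The merge rules in the `data_item_labels` docstring are easier to compare across the old and new wording when written out. The sketch below models only the behavior the removed text describes. The function names are hypothetical, plain dictionaries stand in for data items, and nothing here calls the service or reflects its actual implementation.

```python
import random
from typing import Dict, Iterable, Optional


def merge_into_existing(existing: Dict[str, str], incoming: Dict[str, str]) -> Dict[str, str]:
    """Append incoming labels to an existing item's labels.

    A key that already exists takes the newly imported value.
    """
    merged = dict(existing)
    merged.update(incoming)
    return merged


def combine_within_import(
    label_sets: Iterable[Dict[str, str]],
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Combine labels of identical items that arrive in the same import.

    When keys collide, one of the candidate values is picked at random.
    """
    rng = rng or random.Random()
    candidates: Dict[str, list] = {}
    for labels in label_sets:
        for key, value in labels.items():
            candidates.setdefault(key, []).append(value)
    return {key: rng.choice(values) for key, values in candidates.items()}
```

Because a collision inside a single import is resolved at random, callers who need a deterministic label value should avoid supplying conflicting values for identical items.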