Commit d5dc7b5

vertex-sdk-bot authored and copybara-github committed
docs: Update the documentation for the time_series_dataset and video_dataset classes
PiperOrigin-RevId: 642703477
1 parent eb651bc commit d5dc7b5

File tree

2 files changed: +166 -99 lines changed

google/cloud/aiplatform/datasets/time_series_dataset.py (+74 -40)

@@ -27,7 +27,38 @@
 
 
 class TimeSeriesDataset(datasets._ColumnNamesDataset):
-    """Managed time series dataset resource for Vertex AI"""
+    """A managed time series dataset resource for Vertex AI.
+
+    Use this class to work with time series datasets. A time series is a
+    dataset that contains data recorded at different time intervals. The
+    dataset includes time and at least one variable that's dependent on
+    time. You use a time series dataset for forecasting predictions. For
+    more information, see
+    [Forecasting overview](https://cloud.google.com/vertex-ai/docs/tabular-data/forecasting/overview).
+
+    You can create a managed time series dataset from CSV files in a Cloud
+    Storage bucket or from a BigQuery table.
+
+    The following code shows you how to create a `TimeSeriesDataset` from a
+    CSV file that contains the time series data:
+
+    ```py
+    my_dataset = aiplatform.TimeSeriesDataset.create(
+        display_name="my-dataset",
+        gcs_source=['gs://path/to/my/dataset.csv'],
+    )
+    ```
+
+    The following code shows you how to create a `TimeSeriesDataset` from a
+    BigQuery table that contains the time series data:
+
+    ```py
+    my_dataset = aiplatform.TimeSeriesDataset.create(
+        display_name="my-dataset",
+        bq_source='bq://path/to/my/bigquerydataset.train',
+    )
+    ```
+    """
 
     _supported_metadata_schema_uris: Optional[Tuple[str]] = (
         schema.dataset.metadata.time_series,
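As the examples above show, `gcs_source` accepts either a single URI string or a sequence of URIs. A minimal sketch of how such a str-or-sequence argument can be normalized and sanity-checked (a hypothetical helper for illustration, not part of the SDK):

```python
from typing import Sequence, Union


def normalize_gcs_source(gcs_source: Union[str, Sequence[str]]) -> list:
    """Normalize a str-or-sequence `gcs_source` argument to a list of URIs.

    Hypothetical helper (not part of the Vertex AI SDK) illustrating the
    `Union[str, Sequence[str]]` contract described in the docstring.
    """
    # A bare string becomes a one-element list; any other sequence is copied.
    if isinstance(gcs_source, str):
        sources = [gcs_source]
    else:
        sources = list(gcs_source)

    # Every entry must be a Cloud Storage URI.
    for uri in sources:
        if not uri.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    return sources
```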
@@ -52,62 +83,65 @@ def create(
 
         Args:
             display_name (str):
-                Optional. The user-defined name of the Dataset.
-                The name can be up to 128 characters long and can be consist
-                of any UTF-8 characters.
+                Optional. The user-defined name of the dataset. The name must
+                contain 128 or fewer UTF-8 characters.
             gcs_source (Union[str, Sequence[str]]):
-                Google Cloud Storage URI(-s) to the
-                input file(s).
-
-                Examples:
-                    str: "gs://bucket/file.csv"
-                    Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]
+                The URI to one or more Google Cloud Storage buckets that
+                contain your datasets. For example,
+                `str: "gs://bucket/file.csv"` or
+                `Sequence[str]: ["gs://bucket/file1.csv",
+                "gs://bucket/file2.csv"]`.
             bq_source (str):
-                BigQuery URI to the input table.
-                example:
-                    "bq://project.dataset.table_name"
+                A BigQuery URI for the input table. For example,
+                `bq://project.dataset.table_name`.
             project (str):
-                Project to upload this dataset to. Overrides project set in
-                aiplatform.init.
+                The name of the Google Cloud project to which this
+                `TimeSeriesDataset` is uploaded. This overrides the project
+                that was set by `aiplatform.init`.
             location (str):
-                Location to upload this dataset to. Overrides location set in
-                aiplatform.init.
+                The Google Cloud region where this dataset is uploaded. This
+                region overrides the region that was set by `aiplatform.init`.
             credentials (auth_credentials.Credentials):
-                Custom credentials to use to upload this dataset. Overrides
-                credentials set in aiplatform.init.
+                The credentials that are used to upload the
+                `TimeSeriesDataset`. These credentials override the
+                credentials set by `aiplatform.init`.
             request_metadata (Sequence[Tuple[str, str]]):
-                Strings which should be sent along with the request as metadata.
+                Strings that contain metadata that's sent with the request.
             labels (Dict[str, str]):
-                Optional. Labels with user-defined metadata to organize your datasets.
-                Label keys and values can be no longer than 64 characters
-                (Unicode codepoints), can only contain lowercase letters, numeric
-                characters, underscores and dashes. International characters are allowed.
-                No more than 64 user labels can be associated with one TimeSeriesDataset
-                (System labels are excluded).
-                See https://goo.gl/xmQnxf for more information and examples of labels.
-                System reserved label keys are prefixed with "aiplatform.googleapis.com/"
-                and are immutable.
+                Optional. Labels with user-defined metadata to organize your
+                datasets. The maximum length of a key and of a value is 64
+                unicode characters. Label keys and values can contain only
+                lowercase letters, numeric characters, underscores, and
+                dashes. International characters are allowed. No more than
+                64 user labels can be associated with one `TimeSeriesDataset`
+                (system labels are excluded). For more information and
+                examples of using labels, see
+                [Using labels to organize Google Cloud Platform resources](https://goo.gl/xmQnxf).
+                System reserved label keys are prefixed with
+                `aiplatform.googleapis.com/` and are immutable.
             encryption_spec_key_name (Optional[str]):
                 Optional. The Cloud KMS resource identifier of the customer
-                managed encryption key used to protect the dataset. Has the
-                form:
-                ``projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key``.
+                managed encryption key that's used to protect the dataset.
+                The format of the key is
+                `projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key`.
                 The key needs to be in the same region as where the compute
                 resource is created.
 
-                If set, this Dataset and all sub-resources of this Dataset will be secured by this key.
+                If `encryption_spec_key_name` is set, this time series
+                dataset and all of its sub-resources are secured by this key.
 
-                Overrides encryption_spec_key_name set in aiplatform.init.
-            sync (bool):
-                Whether to execute this method synchronously. If False, this method
-                will be executed in concurrent Future and any downstream object will
-                be immediately returned and synced when the Future has completed.
+                This `encryption_spec_key_name` overrides the
+                `encryption_spec_key_name` set by `aiplatform.init`.
             create_request_timeout (float):
-                Optional. The timeout for the create request in seconds.
+                Optional. The number of seconds for the timeout of the
+                create request.
+            sync (bool):
+                If `True`, the `create` method creates a time series dataset
+                synchronously. If `False`, the `create` method creates a
+                time series dataset asynchronously.
 
         Returns:
             time_series_dataset (TimeSeriesDataset):
-                Instantiated representation of the managed time series dataset resource.
+                An instantiated representation of the managed
+                `TimeSeriesDataset` resource.
 
         """
         if not display_name:
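The label constraints described above (at most 64 user labels, 64-character keys and values, lowercase keys, and the system-reserved `aiplatform.googleapis.com/` prefix) can be checked client-side before calling `create`. A hedged sketch of such a check, assuming a plain `Dict[str, str]`; this helper is illustrative and not part of the SDK:

```python
from typing import Dict

# System-reserved label key prefix named in the docstring.
RESERVED_PREFIX = "aiplatform.googleapis.com/"


def validate_labels(labels: Dict[str, str]) -> None:
    """Raise ValueError if `labels` violates the documented constraints."""
    # No more than 64 user labels per resource.
    if len(labels) > 64:
        raise ValueError("no more than 64 user labels are allowed")
    for key, value in labels.items():
        # Keys and values are limited to 64 unicode characters.
        if len(key) > 64 or len(value) > 64:
            raise ValueError(
                f"label {key!r}: keys and values are limited to 64 characters"
            )
        # Keys must be lowercase (international characters are allowed).
        if key != key.lower():
            raise ValueError(f"label key {key!r} must be lowercase")
        # Reserved system keys are immutable and must not be set by users.
        if key.startswith(RESERVED_PREFIX):
            raise ValueError(f"label key {key!r} uses a reserved system prefix")
```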

google/cloud/aiplatform/datasets/video_dataset.py (+92 -59)

@@ -27,7 +27,29 @@
 
 
 class VideoDataset(datasets._Dataset):
-    """Managed video dataset resource for Vertex AI."""
+    """A managed video dataset resource for Vertex AI.
+
+    Use this class to work with a managed video dataset. To create a video
+    dataset, you need a datasource in CSV format and a schema in YAML
+    format. The CSV file and the schema are accessed in Cloud Storage
+    buckets.
+
+    Use video data for the following objectives:
+
+    * Classification. For more information, see Classification schema
+      files.
+    * Action recognition. For more information, see Action recognition
+      schema files.
+    * Object tracking. For more information, see Object tracking schema
+      files.
+
+    The following code shows you how to create and import a dataset to
+    train a video classification model. The schema file you use depends on
+    whether you use your video dataset for classification, action
+    recognition, or object tracking.
+
+    ```py
+    my_dataset = aiplatform.VideoDataset.create(
+        gcs_source=['gs://path/to/my/dataset.csv'],
+        import_schema_uri='gs://aip.schema.dataset.ioformat.video.classification.yaml'
+    )
+    ```
+    """
 
     _supported_metadata_schema_uris: Optional[Tuple[str]] = (
         schema.dataset.metadata.video,
@@ -49,84 +71,95 @@ def create(
         sync: bool = True,
         create_request_timeout: Optional[float] = None,
     ) -> "VideoDataset":
-        """Creates a new video dataset and optionally imports data into dataset
-        when source and import_schema_uri are passed.
+        """Creates a new video dataset.
+
+        Optionally imports data into the dataset when a source and
+        `import_schema_uri` are passed in. The following is an example of
+        how this method is used:
+
+        ```py
+        my_dataset = aiplatform.VideoDataset.create(
+            gcs_source=['gs://path/to/my/dataset.csv'],
+            import_schema_uri='gs://aip.schema.dataset.ioformat.video.classification.yaml'
+        )
+        ```
 
         Args:
             display_name (str):
-                Optional. The user-defined name of the Dataset.
-                The name can be up to 128 characters long and can be consist
-                of any UTF-8 characters.
+                Optional. The user-defined name of the dataset. The name must
+                contain 128 or fewer UTF-8 characters.
             gcs_source (Union[str, Sequence[str]]):
-                Google Cloud Storage URI(-s) to the
-                input file(s).
-
-                Examples:
-                    str: "gs://bucket/file.csv"
-                    Sequence[str]: ["gs://bucket/file1.csv", "gs://bucket/file2.csv"]
+                The URI to one or more Google Cloud Storage buckets that
+                contain your datasets. For example,
+                `str: "gs://bucket/file.csv"` or
+                `Sequence[str]: ["gs://bucket/file1.csv",
+                "gs://bucket/file2.csv"]`.
             import_schema_uri (str):
-                Points to a YAML file stored on Google Cloud
-                Storage describing the import format. Validation will be
-                done against the schema. The schema is defined as an
-                `OpenAPI 3.0.2 Schema
-                Object <https://tinyurl.com/y538mdwt>`__.
+                A URI for a YAML file stored in Cloud Storage that
+                describes the import schema used to validate the
+                dataset. The schema is an
+                [OpenAPI 3.0.2 Schema](https://tinyurl.com/y538mdwt) object.
             data_item_labels (Dict):
-                Labels that will be applied to newly imported DataItems. If
-                an identical DataItem as one being imported already exists
-                in the Dataset, then these labels will be appended to these
-                of the already existing one, and if labels with identical
-                key is imported before, the old label value will be
-                overwritten. If two DataItems are identical in the same
-                import data operation, the labels will be combined and if
-                key collision happens in this case, one of the values will
-                be picked randomly. Two DataItems are considered identical
-                if their content bytes are identical (e.g. image bytes or
-                pdf bytes). These labels will be overridden by Annotation
-                labels specified inside index file referenced by
-                ``import_schema_uri``,
-                e.g. jsonl file.
+                Optional. A dictionary of label information. Each dictionary
+                item contains a label and a label key. Each item in the
+                dataset includes one dictionary of label information. If a
+                data item is added or merged into a dataset, and that data
+                item contains an image that's identical to an image that's
+                already in the dataset, then the data items are merged. If
+                two identical labels are detected during the merge, each
+                with a different label key, then one of the label and label
+                key dictionary items is randomly chosen for the merged data
+                item. Data items are compared using their binary data
+                (bytes), not their content. If annotation labels are
+                referenced in a schema specified by the `import_schema_uri`
+                parameter, then the labels in the `data_item_labels`
+                dictionary are overridden by the annotations.
             project (str):
-                Project to upload this dataset to. Overrides project set in
-                aiplatform.init.
+                The name of the Google Cloud project to which this
+                `VideoDataset` is uploaded. This overrides the project that
+                was set by `aiplatform.init`.
             location (str):
-                Location to upload this dataset to. Overrides location set in
-                aiplatform.init.
+                The Google Cloud region where this dataset is uploaded. This
+                region overrides the region that was set by `aiplatform.init`.
             credentials (auth_credentials.Credentials):
-                Custom credentials to use to upload this dataset. Overrides
-                credentials set in aiplatform.init.
+                The credentials that are used to upload the `VideoDataset`.
+                These credentials override the credentials set by
+                `aiplatform.init`.
             request_metadata (Sequence[Tuple[str, str]]):
-                Strings which should be sent along with the request as metadata.
+                Strings that contain metadata that's sent with the request.
             labels (Dict[str, str]):
-                Optional. Labels with user-defined metadata to organize your Tensorboards.
-                Label keys and values can be no longer than 64 characters
-                (Unicode codepoints), can only contain lowercase letters, numeric
-                characters, underscores and dashes. International characters are allowed.
-                No more than 64 user labels can be associated with one Tensorboard
-                (System labels are excluded).
-                See https://goo.gl/xmQnxf for more information and examples of labels.
-                System reserved label keys are prefixed with "aiplatform.googleapis.com/"
-                and are immutable.
+                Optional. Labels with user-defined metadata to organize your
+                datasets. The maximum length of a key and of a value is 64
+                unicode characters. Label keys and values can contain only
+                lowercase letters, numeric characters, underscores, and
+                dashes. International characters are allowed. No more than
+                64 user labels can be associated with one `VideoDataset`
+                (system labels are excluded). For more information and
+                examples of using labels, see
+                [Using labels to organize Google Cloud Platform resources](https://goo.gl/xmQnxf).
+                System reserved label keys are prefixed with
+                `aiplatform.googleapis.com/` and are immutable.
             encryption_spec_key_name (Optional[str]):
                 Optional. The Cloud KMS resource identifier of the customer
-                managed encryption key used to protect the dataset. Has the
-                form:
-                ``projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key``.
+                managed encryption key that's used to protect the dataset.
+                The format of the key is
+                `projects/my-project/locations/my-region/keyRings/my-kr/cryptoKeys/my-key`.
                 The key needs to be in the same region as where the compute
                 resource is created.
 
-                If set, this Dataset and all sub-resources of this Dataset will be secured by this key.
+                If `encryption_spec_key_name` is set, this `VideoDataset`
+                and all of its sub-resources are secured by this key.
 
-                Overrides encryption_spec_key_name set in aiplatform.init.
-            create_request_timeout (float):
-                Optional. The timeout for the create request in seconds.
+                This `encryption_spec_key_name` overrides the
+                `encryption_spec_key_name` set by `aiplatform.init`.
             sync (bool):
-                Whether to execute this method synchronously. If False, this method
-                will be executed in concurrent Future and any downstream object will
-                be immediately returned and synced when the Future has completed.
-
+                If `True`, the `create` method creates a video dataset
+                synchronously. If `False`, the `create` method creates a
+                video dataset asynchronously.
+            create_request_timeout (float):
+                Optional. The number of seconds for the timeout of the
+                create request.
 
         Returns:
             video_dataset (VideoDataset):
-                Instantiated representation of the managed video dataset resource.
+                An instantiated representation of the managed
+                `VideoDataset` resource.
         """
         if not display_name:
             display_name = cls._generate_display_name()
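The `data_item_labels` description above implies a simple merge rule when an imported data item is byte-for-byte identical to an existing one: the two label dictionaries are combined, and on a key collision the imported value replaces the existing one. A minimal sketch of that rule (illustrative only, not SDK code; the in-same-import random tie-break case is not modeled):

```python
from typing import Dict


def merge_data_item_labels(
    existing: Dict[str, str], imported: Dict[str, str]
) -> Dict[str, str]:
    """Combine label dictionaries for two identical data items.

    Hypothetical illustration of the documented merge semantics: keys
    unique to either side are kept, and for keys present in both, the
    imported value overwrites the existing one.
    """
    merged = dict(existing)
    merged.update(imported)  # imported values win on key collisions
    return merged
```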
