
Commit c829cc3

busunkim96, sirtorry, merla18, emmby, and lwander authored
docs: add samples from tables/automl (#54)
* Tables Notebooks [(#2090)](GoogleCloudPlatform/python-docs-samples#2090)
* initial commit
* update census
* update notebooks
* remove the reference to a bug [(#2100)](GoogleCloudPlatform/python-docs-samples#2100), as the bug has been fixed in the public client lib
* delete this file. [(#2102)](GoogleCloudPlatform/python-docs-samples#2102)
* rename file name [(#2103)](GoogleCloudPlatform/python-docs-samples#2103)
* trying to fix images [(#2101)](GoogleCloudPlatform/python-docs-samples#2101)
* remove typo in installation [(#2110)](GoogleCloudPlatform/python-docs-samples#2110)
* Rename census_income_prediction.ipynb to getting_started_notebook.ipynb [(#2115)](GoogleCloudPlatform/python-docs-samples#2115): renaming the notebooks as Getting Started (will be in sync with the doc). It would be great if the folder could be renamed too.
* added back missing file package import [(#2150)](GoogleCloudPlatform/python-docs-samples#2150)
* added back missing file import [(#2145)](GoogleCloudPlatform/python-docs-samples#2145)
* remove incorrect reference to Iris dataset [(#2203)](GoogleCloudPlatform/python-docs-samples#2203)
* conversion to Jupyter/Colab [(#2340)](GoogleCloudPlatform/python-docs-samples#2340), plus bug fixes
* updated for the Jupyter support [(#2337)](GoogleCloudPlatform/python-docs-samples#2337)
* updated readme for Jupyter support [(#2336)](GoogleCloudPlatform/python-docs-samples#2336), to be approved with the updated notebook supporting Jupyter
* conversion to Jupyter/Colab [(#2339)](GoogleCloudPlatform/python-docs-samples#2339), plus bug fixes
* conversion of notebook for Jupyter/Colab [(#2338)](GoogleCloudPlatform/python-docs-samples#2338): conversion of the notebook to support both Jupyter and Colab, plus bug fixes
* [BLOCKED] AutoML Tables: Docs samples updated to use new (pending) client [(#2276)](GoogleCloudPlatform/python-docs-samples#2276)
* AutoML Tables: Docs samples updated to use new (pending) client
* Linter warnings
* add product recommendation for automl tables notebook [(#2257)](GoogleCloudPlatform/python-docs-samples#2257)
* added colab filtering notebook
* update to tables client
* update readme
* tell user to restart kernel for automl
* AutoML Tables: Notebook samples updated to use new tables client [(#2424)](GoogleCloudPlatform/python-docs-samples#2424)
* fix users bug and emphasize kernel restart [(#2407)](GoogleCloudPlatform/python-docs-samples#2407)
* fix problems with automl docs [(#2501)](GoogleCloudPlatform/python-docs-samples#2501): Today, when we use the function `batch_predict` following the docs, we receive an error saying `the paramaters should be a pandas.Dataframe`. This happens because the first parameter of `batch_predict` is a pandas.Dataframe. To solve this problem we need to use Python's named parameters (see the keyword-argument sketch after this commit message).
* Fix typo in GCS URI parameter [(#2459)](GoogleCloudPlatform/python-docs-samples#2459)
* fix: fix tables notebook links and bugs [(#2601)](GoogleCloudPlatform/python-docs-samples#2601)
* feat(tables): update samples to show explainability [(#2523)](GoogleCloudPlatform/python-docs-samples#2523)
* show xai
* local feature importance
* use updated client
* use fixed library
* use new model
* Auto-update dependencies. [(#2005)](GoogleCloudPlatform/python-docs-samples#2005)
* Auto-update dependencies.
* Revert update of appengine/flexible/datastore.
* revert update of appengine/flexible/scipy
* revert update of bigquery/bqml
* revert update of bigquery/cloud-client
* revert update of bigquery/datalab-migration
* revert update of bigtable/quickstart
* revert update of compute/api
* revert update of container_registry/container_analysis
* revert update of dataflow/run_template
* revert update of datastore/cloud-ndb
* revert update of dialogflow/cloud-client
* revert update of dlp
* revert update of functions/imagemagick
* revert update of functions/ocr/app
* revert update of healthcare/api-client/fhir
* revert update of iam/api-client
* revert update of iot/api-client/gcs_file_to_device
* revert update of iot/api-client/mqtt_example
* revert update of language/automl
* revert update of run/image-processing
* revert update of vision/automl
* revert update of testing/requirements.txt
* revert update of vision/cloud-client/detect
* revert update of vision/cloud-client/product_search
* revert update of jobs/v2/api_client
* revert update of jobs/v3/api_client
* revert update of opencensus
* revert update of translate/cloud-client
* revert update of speech/cloud-client

Co-authored-by: Kurtis Van Gent <[email protected]>
Co-authored-by: Doug Mahugh <[email protected]>

* Update dependency google-cloud-automl to v0.10.0 [(#3033)](GoogleCloudPlatform/python-docs-samples#3033)

Co-authored-by: Bu Sun Kim <[email protected]>
Co-authored-by: Leah E. Cole <[email protected]>

* Simplify noxfile setup. [(#2806)](GoogleCloudPlatform/python-docs-samples#2806)
* chore(deps): update dependency requests to v2.23.0
* Simplify noxfile and add version control.
* Configure appengine/standard to only test Python 2.7.
* Update Kokoro configs to match noxfile.
* Add requirements-test to each folder.
* Remove Py2 versions from everything except appengine/standard.
* Remove conftest.py.
* Remove appengine/standard/conftest.py
* Remove 'no-sucess-flaky-report' from pytest.ini.
* Add GAE SDK back to appengine/standard tests.
* Fix typo.
* Roll pytest to python 2 version.
* Add a bunch of testing requirements.
* Remove typo.
* Add appengine lib directory back in.
* Add some additional requirements.
* Fix issue with flake8 args.
* Even more requirements.
* Re-add appengine conftest.py.
* Add a few more requirements.
* Even more Appengine requirements.
* Add webtest for appengine/standard/mailgun.
* Add some additional requirements.
* Add workaround for issue with mailjet-rest.
* Add responses for appengine/standard/mailjet.

Co-authored-by: Renovate Bot <[email protected]>

* chore: some lint fixes [(#3750)](GoogleCloudPlatform/python-docs-samples#3750)
* automl: tables code sample clean-up [(#3571)](GoogleCloudPlatform/python-docs-samples#3571)
* delete unused tables_dataset samples
* delete args code associated with unused automl_tables samples
* delete tests associated with unused automl_tables samples
* restore get_dataset method/args without region tagging
* Restore update_dataset methods without region tagging

Co-authored-by: Takashi Matsuo <[email protected]>
Co-authored-by: Leah E. Cole <[email protected]>

* add example of creating AutoML Tables client with non-default endpoint ('new' sdk) [(#3929)](GoogleCloudPlatform/python-docs-samples#3929) (illustrated in the sketch after this commit message)
* add example of creating client with non-default endpoint
* more test file cleanup
* move connectivity print stmt out of test fn

Co-authored-by: Leah E. Cole <[email protected]>
Co-authored-by: Torry Yang <[email protected]>

* Replace GCLOUD_PROJECT with GOOGLE_CLOUD_PROJECT.
  [(#4022)](GoogleCloudPlatform/python-docs-samples#4022)
* chore(deps): update dependency google-cloud-automl to v1 [(#4127)](GoogleCloudPlatform/python-docs-samples#4127)

  This PR contains the following updates:

  | Package | Update | Change |
  |---|---|---|
  | [google-cloud-automl](https://togithub.com/googleapis/python-automl) | major | `==0.10.0` -> `==1.0.1` |

  ---

  ### Release Notes

  <details>
  <summary>googleapis/python-automl</summary>

  ### [`v1.0.1`](https://togithub.com/googleapis/python-automl/blob/master/CHANGELOG.md#&#8203;101-httpswwwgithubcomgoogleapispython-automlcomparev100v101-2020-06-18)

  [Compare Source](https://togithub.com/googleapis/python-automl/compare/v0.10.0...v1.0.1)

  </details>

  ---

  ### Renovate configuration

  :date: **Schedule**: At any time (no schedule defined).

  :vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

  :recycle: **Rebasing**: Never, or you tick the rebase/retry checkbox.

  :no_bell: **Ignore**: Close this PR and you won't be reminded about this update again.

  ---

  - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

  ---

  This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#GoogleCloudPlatform/python-docs-samples).

* [tables/automl] fix: update the csv file and the dataset name [(#4188)](GoogleCloudPlatform/python-docs-samples#4188): fixes #4177, fixes #4178
* samples: Automl table batch test [(#4267)](GoogleCloudPlatform/python-docs-samples#4267)
* added rtest req.txt
* samples: added automl batch predict test
* added missing package
* Update tables/automl/batch_predict_test.py

Co-authored-by: Bu Sun Kim <[email protected]>
Co-authored-by: Bu Sun Kim <[email protected]>

* samples: fixed wrong format on GCS input Uri [(#4270)](GoogleCloudPlatform/python-docs-samples#4270)

  ## Description

  The current predict sample indicates that it can take multiple GCS URI inputs, but it should be singular.

  ## Checklist

  - [X] Please **merge** this PR for me once it is approved.

* chore(deps): update dependency pytest to v5.4.3 [(#4279)](GoogleCloudPlatform/python-docs-samples#4279)
* chore(deps): update dependency pytest to v5.4.3
* specify pytest for python 2 in appengine

Co-authored-by: Leah Cole <[email protected]>

* Update automl_tables_predict.py with batch_predict_bq sample [(#4142)](GoogleCloudPlatform/python-docs-samples#4142)

  Added a new method `batch_predict_bq` demonstrating batch prediction using BigQuery (see the sketch after this commit message). Added notes in comments about the asynchronicity of the `batch_predict` method. The region `automl_tables_batch_predict_bq` will be used on cloud.google.com (currently both sections for GCS and BigQuery use the same sample code, which is incorrect). Fixes #4141

  Note: It's a good idea to open an issue first for discussion.

  - [x] Please **merge** this PR for me once it is approved.
* Update dependency pytest to v6 [(#4390)](GoogleCloudPlatform/python-docs-samples#4390)
* chore: exclude notebooks
* chore: update templates
* chore: add codeowners and fix tests
* chore: ignore warnings from sphinx
* chore: fix tables client
* test: fix unit tests

Co-authored-by: Torry Yang <[email protected]>
Co-authored-by: florencep <[email protected]>
Co-authored-by: Mike Burton <[email protected]>
Co-authored-by: Lars Wander <[email protected]>
Co-authored-by: Michael Hu <[email protected]>
Co-authored-by: Michael Hu <[email protected]>
Co-authored-by: Alefh Sousa <[email protected]>
Co-authored-by: DPEBot <[email protected]>
Co-authored-by: Kurtis Van Gent <[email protected]>
Co-authored-by: Doug Mahugh <[email protected]>
Co-authored-by: WhiteSource Renovate <[email protected]>
Co-authored-by: Leah E. Cole <[email protected]>
Co-authored-by: Takashi Matsuo <[email protected]>
Co-authored-by: Anthony <[email protected]>
Co-authored-by: Amy <[email protected]>
Co-authored-by: Mike <[email protected]>
Co-authored-by: Leah Cole <[email protected]>
Co-authored-by: Sergei Dorogin <[email protected]>
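Three of the changes above are easier to follow with a short, hedged sketch: calling `batch_predict` with keyword arguments (the #2501 fix), the BigQuery input/output variant that the `batch_predict_bq` sample demonstrates (#4142), and creating a `TablesClient` with a non-default endpoint (#3929). This is not the committed sample code; the project, region, model name, bucket, BigQuery URIs, and the EU endpoint value are illustrative placeholders, and it assumes `TablesClient` forwards `client_options` to the underlying API clients.

```python
from google.cloud import automl_v1beta1 as automl

# Placeholder project/region; a real caller supplies their own values.
client = automl.TablesClient(project="my-project", region="us-central1")

# Passing inputs by keyword (not positionally) avoids the
# "parameters should be a pandas.Dataframe" confusion described in #2501.
gcs_op = client.batch_predict(
    model_display_name="my_model",
    gcs_input_uris="gs://my-bucket/input.csv",
    gcs_output_uri_prefix="gs://my-bucket/output/",
)

# The BigQuery flavour that batch_predict_bq demonstrates: same call,
# BigQuery URIs instead of Cloud Storage paths.
bq_op = client.batch_predict(
    model_display_name="my_model",
    bigquery_input_uri="bq://my-project.my_dataset.my_table",
    bigquery_output_uri="bq://my-project",
)

# batch_predict is asynchronous; wait on the returned operations.
gcs_op.result()
bq_op.result()

# Non-default endpoint, assuming client_options is forwarded as in #3929.
eu_client = automl.TablesClient(
    project="my-project",
    region="eu",
    client_options={"api_endpoint": "eu-automl.googleapis.com:443"},
)
```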
1 parent 0516d4c commit c829cc3

15 files changed: +1666 −1 lines changed
@@ -0,0 +1,8 @@
# Code owners file.
# This file controls who is tagged for review for any given pull request.
#
# For syntax help see:
# https://help.github.com/en/github/creating-cloning-and-archiving-repositories/about-code-owners#codeowners-syntax


/samples/**/*.py @telpirion @sirtorry @googleapis/python-samples-owners

packages/google-cloud-automl/google/cloud/automl_v1beta1/tables/tables_client.py

+5 −1
@@ -2762,6 +2762,7 @@ def batch_predict(
         region=None,
         credentials=None,
         inputs=None,
+        params={},
         **kwargs
     ):
         """Makes a batch prediction on a model. This does _not_ require the
@@ -2828,6 +2829,9 @@ def batch_predict(
                 The `model` instance you want to predict with . This must be
                 supplied if `model_display_name` or `model_name` are not
                 supplied.
+            params (Optional[dict]):
+                Additional domain-specific parameters for the predictions,
+                any string must be up to 25000 characters long.

         Returns:
             google.api_core.operation.Operation:
@@ -2886,7 +2890,7 @@ def batch_predict(
         )

         op = self.prediction_client.batch_predict(
-            model_name, input_request, output_request, **kwargs
+            model_name, input_request, output_request, params, **kwargs
         )
         self.__log_operation_info("Batch predict", op)
         return op
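As a usage note (not part of the diff), here is a minimal sketch of how a caller might exercise the new `params` argument once this change is in place. The project, region, model name, and bucket paths are placeholders, and the `feature_importance` key is shown only as an illustration of a Tables batch-prediction parameter; confirm the supported keys against the current documentation.

```python
from google.cloud import automl_v1beta1 as automl

# Placeholder project and region.
client = automl.TablesClient(project="my-project", region="us-central1")

# `params` is forwarded to the prediction service alongside the input and
# output configuration; keys and values are plain strings.
operation = client.batch_predict(
    model_display_name="my_model",
    gcs_input_uris="gs://my-bucket/input.csv",
    gcs_output_uri_prefix="gs://my-bucket/output/",
    params={"feature_importance": "true"},  # illustrative parameter
)
operation.result()  # block until the batch prediction completes
```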
@@ -0,0 +1,306 @@
#!/usr/bin/env python

# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""This application demonstrates how to perform basic operations on datasets
with the Google AutoML Tables API.

For more information, see the documentation at
https://cloud.google.com/automl-tables/docs.
"""

import argparse
import os


def create_dataset(project_id, compute_region, dataset_display_name):
    """Create a dataset."""
    # [START automl_tables_create_dataset]
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    # Create a dataset with the given display name.
    dataset = client.create_dataset(dataset_display_name)

    # Display the dataset information.
    print("Dataset name: {}".format(dataset.name))
    print("Dataset id: {}".format(dataset.name.split("/")[-1]))
    print("Dataset display name: {}".format(dataset.display_name))
    print("Dataset metadata:")
    print("\t{}".format(dataset.tables_dataset_metadata))
    print("Dataset example count: {}".format(dataset.example_count))
    print("Dataset create time:")
    print("\tseconds: {}".format(dataset.create_time.seconds))
    print("\tnanos: {}".format(dataset.create_time.nanos))

    # [END automl_tables_create_dataset]

    return dataset


def list_datasets(project_id, compute_region, filter_=None):
    """List all datasets."""
    result = []
    # [START automl_tables_list_datasets]
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # filter_ = 'filter expression here'

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    # List all the datasets available in the region by applying filter.
    response = client.list_datasets(filter_=filter_)

    print("List of datasets:")
    for dataset in response:
        # Display the dataset information.
        print("Dataset name: {}".format(dataset.name))
        print("Dataset id: {}".format(dataset.name.split("/")[-1]))
        print("Dataset display name: {}".format(dataset.display_name))
        metadata = dataset.tables_dataset_metadata
        print(
            "Dataset primary table spec id: {}".format(
                metadata.primary_table_spec_id
            )
        )
        print(
            "Dataset target column spec id: {}".format(
                metadata.target_column_spec_id
            )
        )
        print(
            "Dataset weight column spec id: {}".format(
                metadata.weight_column_spec_id
            )
        )
        print(
            "Dataset ml use column spec id: {}".format(
                metadata.ml_use_column_spec_id
            )
        )
        print("Dataset example count: {}".format(dataset.example_count))
        print("Dataset create time:")
        print("\tseconds: {}".format(dataset.create_time.seconds))
        print("\tnanos: {}".format(dataset.create_time.nanos))
        print("\n")

        # [END automl_tables_list_datasets]
        result.append(dataset)

    return result


def get_dataset(project_id, compute_region, dataset_display_name):
    """Get the dataset."""
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    # Get complete detail of the dataset.
    dataset = client.get_dataset(dataset_display_name=dataset_display_name)

    # Display the dataset information.
    print("Dataset name: {}".format(dataset.name))
    print("Dataset id: {}".format(dataset.name.split("/")[-1]))
    print("Dataset display name: {}".format(dataset.display_name))
    print("Dataset metadata:")
    print("\t{}".format(dataset.tables_dataset_metadata))
    print("Dataset example count: {}".format(dataset.example_count))
    print("Dataset create time:")
    print("\tseconds: {}".format(dataset.create_time.seconds))
    print("\tnanos: {}".format(dataset.create_time.nanos))

    return dataset


def import_data(project_id, compute_region, dataset_display_name, path):
    """Import structured data."""
    # [START automl_tables_import_data]
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # dataset_display_name = 'DATASET_DISPLAY_NAME'
    # path = 'gs://path/to/file.csv' or 'bq://project_id.dataset.table_id'

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    response = None
    if path.startswith("bq"):
        response = client.import_data(
            dataset_display_name=dataset_display_name, bigquery_input_uri=path
        )
    else:
        # Get the multiple Google Cloud Storage URIs.
        input_uris = path.split(",")
        response = client.import_data(
            dataset_display_name=dataset_display_name,
            gcs_input_uris=input_uris,
        )

    print("Processing import...")
    # synchronous check of operation status.
    print("Data imported. {}".format(response.result()))

    # [END automl_tables_import_data]


def update_dataset(
    project_id,
    compute_region,
    dataset_display_name,
    target_column_spec_name=None,
    weight_column_spec_name=None,
    test_train_column_spec_name=None,
):
    """Update dataset."""
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'
    # target_column_spec_name = 'TARGET_COLUMN_SPEC_NAME_HERE' or None
    # weight_column_spec_name = 'WEIGHT_COLUMN_SPEC_NAME_HERE' or None
    # test_train_column_spec_name = 'TEST_TRAIN_COLUMN_SPEC_NAME_HERE' or None

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    if target_column_spec_name is not None:
        response = client.set_target_column(
            dataset_display_name=dataset_display_name,
            column_spec_display_name=target_column_spec_name,
        )
        print("Target column updated. {}".format(response))
    if weight_column_spec_name is not None:
        response = client.set_weight_column(
            dataset_display_name=dataset_display_name,
            column_spec_display_name=weight_column_spec_name,
        )
        print("Weight column updated. {}".format(response))
    if test_train_column_spec_name is not None:
        response = client.set_test_train_column(
            dataset_display_name=dataset_display_name,
            column_spec_display_name=test_train_column_spec_name,
        )
        print("Test/train column updated. {}".format(response))


def delete_dataset(project_id, compute_region, dataset_display_name):
    """Delete a dataset"""
    # [START automl_tables_delete_dataset]
    # TODO(developer): Uncomment and set the following variables
    # project_id = 'PROJECT_ID_HERE'
    # compute_region = 'COMPUTE_REGION_HERE'
    # dataset_display_name = 'DATASET_DISPLAY_NAME_HERE'

    from google.cloud import automl_v1beta1 as automl

    client = automl.TablesClient(project=project_id, region=compute_region)

    # Delete a dataset.
    response = client.delete_dataset(dataset_display_name=dataset_display_name)

    # synchronous check of operation status.
    print("Dataset deleted. {}".format(response.result()))
    # [END automl_tables_delete_dataset]


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    subparsers = parser.add_subparsers(dest="command")

    create_dataset_parser = subparsers.add_parser(
        "create_dataset", help=create_dataset.__doc__
    )
    create_dataset_parser.add_argument("--dataset_name")

    list_datasets_parser = subparsers.add_parser(
        "list_datasets", help=list_datasets.__doc__
    )
    list_datasets_parser.add_argument("--filter_")

    get_dataset_parser = subparsers.add_parser(
        "get_dataset", help=get_dataset.__doc__
    )
    get_dataset_parser.add_argument("--dataset_display_name")

    import_data_parser = subparsers.add_parser(
        "import_data", help=import_data.__doc__
    )
    import_data_parser.add_argument("--dataset_display_name")
    import_data_parser.add_argument("--path")

    update_dataset_parser = subparsers.add_parser(
        "update_dataset", help=update_dataset.__doc__
    )
    update_dataset_parser.add_argument("--dataset_display_name")
    update_dataset_parser.add_argument("--target_column_spec_name")
    update_dataset_parser.add_argument("--weight_column_spec_name")
    update_dataset_parser.add_argument("--ml_use_column_spec_name")

    delete_dataset_parser = subparsers.add_parser(
        "delete_dataset", help=delete_dataset.__doc__
    )
    delete_dataset_parser.add_argument("--dataset_display_name")

    project_id = os.environ["PROJECT_ID"]
    compute_region = os.environ["REGION_NAME"]

    args = parser.parse_args()
    if args.command == "create_dataset":
        create_dataset(project_id, compute_region, args.dataset_name)
    if args.command == "list_datasets":
        list_datasets(project_id, compute_region, args.filter_)
    if args.command == "get_dataset":
        get_dataset(project_id, compute_region, args.dataset_display_name)
    if args.command == "import_data":
        import_data(
            project_id, compute_region, args.dataset_display_name, args.path
        )
    if args.command == "update_dataset":
        update_dataset(
            project_id,
            compute_region,
            args.dataset_display_name,
            args.target_column_spec_name,
            args.weight_column_spec_name,
            args.ml_use_column_spec_name,
        )
    if args.command == "delete_dataset":
        delete_dataset(project_id, compute_region, args.dataset_display_name)
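For orientation, a rough end-to-end sketch (not part of the commit) that strings the helpers above together using the Tables client directly: create a dataset, import a CSV from Cloud Storage, set the target column, and delete the dataset. The environment variables mirror the `__main__` block; the display name, column name, and GCS path are placeholders.

```python
import os

from google.cloud import automl_v1beta1 as automl

# Same environment variables the sample's __main__ block reads.
project_id = os.environ["PROJECT_ID"]
compute_region = os.environ["REGION_NAME"]

client = automl.TablesClient(project=project_id, region=compute_region)

# Create an empty Tables dataset (display name is a placeholder).
dataset = client.create_dataset("census_demo")

# Import training data from Cloud Storage; result() blocks until the
# long-running import operation finishes.
client.import_data(
    dataset_display_name="census_demo",
    gcs_input_uris=["gs://my-bucket/census.csv"],  # placeholder path
).result()

# Mark the column the model should learn to predict.
client.set_target_column(
    dataset_display_name="census_demo",
    column_spec_display_name="income",  # placeholder column name
)

# Clean up when finished.
client.delete_dataset(dataset_display_name="census_demo").result()
```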
