Commit 5ab0fe2

Authored by arithmetic1728, Bill Prin (waprin), Jon Wayne Parrott, and ekampf
chore: move samples from python-docs-sample (#66)
* Add XMPP Sample * Add Dataproc Sample * Add more region tags * Minor dataproc fixes * Fix Dataproc e2e for Python 3 * Update reqs * updating requirements [(#358)](#358) Change-Id: I6177a17fad021e26ed76679d9db34848c17b62a8 * Update Reqs * Wrong arg description * Auto-update dependencies. [(#456)](#456) * Auto-update dependencies. [(#459)](#459) * Fix import order lint errors Change-Id: Ieaf7237fc6f925daec46a07d2e81a452b841198a * bump Change-Id: I02e7767d13ba267ee9fc72c5b68a57013bb8b8d3 * Auto-update dependencies. [(#486)](#486) * Auto-update dependencies. [(#540)](#540) * Auto-update dependencies. [(#542)](#542) * Move to google-cloud [(#544)](#544) * Auto-update dependencies. [(#584)](#584) * Auto-update dependencies. [(#629)](#629) * Update samples to support latest Google Cloud Python [(#656)](#656) * Update README.md [(#691)](#691) * Auto-update dependencies. [(#715)](#715) * Auto-update dependencies. [(#735)](#735) * Auto-update dependencies. * Fix language OCR sample * Remove unused import * Auto-update dependencies. [(#790)](#790) * Remove usage of GoogleCredentials [(#810)](#810) * Fix a typo [(#813)](#813) * Remove cloud config fixture [(#887)](#887) * Remove cloud config fixture * Fix client secrets * Fix bigtable instance * Fix reference to our testing tools * Auto-update dependencies. [(#914)](#914) * Auto-update dependencies. * xfail the error reporting test * Fix lint * Auto-update dependencies. [(#922)](#922) * Auto-update dependencies. * Fix pubsub iam samples * Auto-update dependencies. [(#1005)](#1005) * Auto-update dependencies. * Fix bigtable lint * Fix IOT iam interaction * Auto-update dependencies. [(#1011)](#1011) * Properly forwarding the "region" parameter provided as an input argument. [(#1029)](#1029) * Auto-update dependencies. [(#1055)](#1055) * Auto-update dependencies. 
* Explicitly use latest bigtable client Change-Id: Id71e9e768f020730e4ca9514a0d7ebaa794e7d9e * Revert language update for now Change-Id: I8867f154e9a5aae00d0047c9caf880e5e8f50c53 * Remove pdb. smh Change-Id: I5ff905fadc026eebbcd45512d4e76e003e3b2b43 * Fix region handling and allow to use an existing cluster. [(#1053)](#1053) * Auto-update dependencies. [(#1094)](#1094) * Auto-update dependencies. * Relax assertions in the ocr_nl sample Change-Id: I6d37e5846a8d6dd52429cb30d501f448c52cbba1 * Drop unused logging apiary samples Change-Id: I545718283773cb729a5e0def8a76ebfa40829d51 * Auto-update dependencies. [(#1133)](#1133) * Auto-update dependencies. * Fix missing http library Change-Id: I99faa600f2f3f1f50f57694fc9835d7f35bda250 * Auto-update dependencies. [(#1186)](#1186) * Auto-update dependencies. [(#1199)](#1199) * Auto-update dependencies. * Fix iot lint Change-Id: I6289e093bdb35e38f9e9bfc3fbc3df3660f9a67e * Fixed Failed Kokoro Test (Dataproc) [(#1203)](#1203) * Fixed Failed Kokoro Test (Dataproc) * Fixed Lint Error * Update dataproc_e2e_test.py * Update dataproc_e2e_test.py * Fixing More Lint Errors * Fixed b/65407087 * Revert "Merge branch 'master' of https://github.com/michaelawyu/python-docs-samples" This reverts commit 1614c7d, reversing changes made to cd1dbfd. * Revert "Fixed b/65407087" This reverts commit cd1dbfd. * Fixed Lint Error * Fixed Lint Error * Auto-update dependencies. [(#1208)](#1208) * Dataproc GCS sample plus doc touchups [(#1151)](#1151) * Auto-update dependencies. [(#1217)](#1217) * Auto-update dependencies. [(#1239)](#1239) * Added "Open in Cloud Shell" buttons to README files [(#1254)](#1254) * Auto-update dependencies. [(#1282)](#1282) * Auto-update dependencies. * Fix storage acl sample Change-Id: I413bea899fdde4c4859e4070a9da25845b81f7cf * Auto-update dependencies. [(#1309)](#1309) * Auto-update dependencies. [(#1320)](#1320) * Auto-update dependencies. [(#1355)](#1355) * Auto-update dependencies. 
[(#1359)](#1359) * Auto-update dependencies. * update Dataproc region tags to standard format [(#1826)](#1826) * Update submit_job_to_cluster.py [(#1708)](#1708) switch region to new 'global' region and remove unnecessary function. * Auto-update dependencies. [(#1846)](#1846) ACK, merging. * Need separate install for google-cloud-storage [(#1863)](#1863) * Revert "Update dataproc/submit_job_to_cluster.py" [(#1864)](#1864) * Revert "Remove test configs for non-testing directories [(#1855)](#1855)" This reverts commit 73a7332. * Revert "Auto-update dependencies. [(#1846)](#1846)" This reverts commit 3adc94f4d0c14453153968c3851fae100e2c5e44. * Revert "Tweak slack sample [(#1847)](#1847)" This reverts commit a48c010. * Revert "Non-client library example of constructing a Signed URL [(#1837)](#1837)" This reverts commit fc3284d. * Revert "GCF samples: handle {empty JSON, GET} requests + remove commas [(#1832)](#1832)" This reverts commit 6928491. * Revert "Correct the maintenance event types [(#1830)](#1830)" This reverts commit c22840f. * Revert "Fix GCF region tags [(#1827)](#1827)" This reverts commit 0fbfef2. * Revert "Updated to Flask 1.0 [(#1819)](#1819)" This reverts commit d52ccf9. * Revert "Fix deprecation warning [(#1801)](#1801)" This reverts commit 981737e. * Revert "Update submit_job_to_cluster.py [(#1708)](#1708)" This reverts commit df1f2b22547b7ca86bbdb791ad930003a815a677. * Create python-api-walkthrough.md [(#1966)](#1966) * Create python-api-walkthrough.md This Google Cloud Shell walkthrough is linked to Cloud Dataproc documentation to be published at: https://cloud.google.com/dataproc/docs/tutorials/python-library-example * Update python-api-walkthrough.md * Update list_clusters.py [(#1887)](#1887) * Auto-update dependencies. [(#1980)](#1980) * Auto-update dependencies. * Update requirements.txt * Update requirements.txt * Update Dataproc samples. 
[(#2158)](#2158) * Update requirements.txt * Update python-api-walkthrough.md * Update submit_job_to_cluster.py * Update list_clusters.py * Update python-api-walkthrough.md [(#2172)](#2172) * Adds updates including compute [(#2436)](#2436) * Adds updates including compute * Python 2 compat pytest * Fixing weird \r\n issue from GH merge * Put asset tests back in * Re-add pod operator test * Hack parameter for k8s pod operator * feat: adding samples for dataproc - create cluster [(#2536)](#2536) * adding sample for cluster create * small fix * Add create cluster samples * Fixed copyright, added 'dataproc' to region tag and changed imports from 'dataproc' to 'dataproc_v1' * Fix copyright in create_cluster.py * Auto-update dependencies. [(#2005)](#2005) * Auto-update dependencies. * Revert update of appengine/flexible/datastore. * revert update of appengine/flexible/scipy * revert update of bigquery/bqml * revert update of bigquery/cloud-client * revert update of bigquery/datalab-migration * revert update of bigtable/quickstart * revert update of compute/api * revert update of container_registry/container_analysis * revert update of dataflow/run_template * revert update of datastore/cloud-ndb * revert update of dialogflow/cloud-client * revert update of dlp * revert update of functions/imagemagick * revert update of functions/ocr/app * revert update of healthcare/api-client/fhir * revert update of iam/api-client * revert update of iot/api-client/gcs_file_to_device * revert update of iot/api-client/mqtt_example * revert update of language/automl * revert update of run/image-processing * revert update of vision/automl * revert update testing/requirements.txt * revert update of vision/cloud-client/detect * revert update of vision/cloud-client/product_search * revert update of jobs/v2/api_client * revert update of jobs/v3/api_client * revert update of opencensus * revert update of translate/cloud-client * revert update to speech/cloud-client Co-authored-by: Kurtis Van Gent 
<[email protected]> Co-authored-by: Doug Mahugh <[email protected]> * feat: dataproc quickstart sample added and create_cluster updated [(#2629)](#2629) * Adding quickstart sample * Added new quickstart sample and updated create_cluster sample * Fix to create_cluster.py * deleted dataproc quickstart files not under dataproc/quickstart/ * Added quickstart test * Linting and formatting fixes * Revert "Linting and formatting fixes" This reverts commit c5afcbc. * Added bucket cleanup to quickstart test * Changes to samples and tests * Linting fixes * Removed todos in favor of clearer docstring * Fixed lint error Co-authored-by: Leah E. Cole <[email protected]> * Update Python Cloud Shell walkthrough script [(#2733)](#2733) Cloud Shell walkthrough scripts no longer support enabling APIs. APIs must be enabled by linking to the console. Updated product name: "Cloud Dataproc" -> "Dataproc". * fix: added cli functionality to dataproc quickstart example [(#2734)](#2734) * Added CLI functionality to quickstart * Fixed Dataproc quickstart test to properly clean up GCS bucket [(#3001)](#3001) * splitting up #2651 part 1/3 - dataproc + endpoints [(#3025)](#3025) * splitting up #2651 * fix typos * chore(deps): update dependency google-auth to v1.11.2 [(#2724)](#2724) Co-authored-by: Leah E. Cole <[email protected]> * chore(deps): update dependency google-cloud-storage to v1.26.0 [(#3046)](#3046) * chore(deps): update dependency google-cloud-storage to v1.26.0 * chore(deps): specify dependencies by python version * chore: up other deps to try to remove errors Co-authored-by: Leah E. 
Cole <[email protected]> Co-authored-by: Leah Cole <[email protected]> * chore(deps): update dependency google-cloud-dataproc to v0.7.0 [(#3083)](#3083) * feat: added dataproc workflows samples [(#3056)](#3056) * Added workflows sample * chore(deps): update dependency grpcio to v1.27.2 [(#3173)](#3173) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [grpcio](https://grpc.io) | minor | `==1.25.0` -> `==1.27.2` | | [grpcio](https://grpc.io) | minor | `==1.23.0` -> `==1.27.2` | | [grpcio](https://grpc.io) | minor | `==1.26.0` -> `==1.27.2` | | [grpcio](https://grpc.io) | patch | `==1.27.1` -> `==1.27.2` | --- ### Renovate configuration :date: **Schedule**: At any time (no schedule defined). :vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied. :recycle: **Rebasing**: Never, or you tick the rebase/retry checkbox. :no_bell: **Ignore**: Close this PR and you won't be reminded about these updates again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#GoogleCloudPlatform/python-docs-samples). * Simplify noxfile setup. [(#2806)](#2806) * chore(deps): update dependency requests to v2.23.0 * Simplify noxfile and add version control. * Configure appengine/standard to only test Python 2.7. * Update Kokokro configs to match noxfile. * Add requirements-test to each folder. * Remove Py2 versions from everything execept appengine/standard. * Remove conftest.py. * Remove appengine/standard/conftest.py * Remove 'no-sucess-flaky-report' from pytest.ini. * Add GAE SDK back to appengine/standard tests. * Fix typo. * Roll pytest to python 2 version. * Add a bunch of testing requirements. * Remove typo. * Add appengine lib directory back in. * Add some additional requirements. 
* Fix issue with flake8 args. * Even more requirements. * Readd appengine conftest.py. * Add a few more requirements. * Even more Appengine requirements. * Add webtest for appengine/standard/mailgun. * Add some additional requirements. * Add workaround for issue with mailjet-rest. * Add responses for appengine/standard/mailjet. Co-authored-by: Renovate Bot <[email protected]> * fix: add mains to samples [(#3284)](#3284) Added mains to two samples: create_cluster and instantiate_inline_workflow_templates. Fixed their associated tests to accommodate this. Removed subprocess from quickstart/quickstart_test.py to fix [2873](#2873) fixes #2873 * Update dependency grpcio to v1.28.1 [(#3276)](#3276) Co-authored-by: Leah E. Cole <[email protected]> * Update dependency google-auth to v1.14.0 [(#3148)](#3148) Co-authored-by: Leah E. Cole <[email protected]> * chore(deps): update dependency google-auth to v1.14.1 [(#3464)](#3464) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [google-auth](https://togithub.com/googleapis/google-auth-library-python) | patch | `==1.14.0` -> `==1.14.1` | | [google-auth](https://togithub.com/googleapis/google-auth-library-python) | minor | `==1.11.2` -> `==1.14.1` | --- ### Release Notes <details> <summary>googleapis/google-auth-library-python</summary> ### [`v1.14.1`](https://togithub.com/googleapis/google-auth-library-python/blob/master/CHANGELOG.md#&#8203;1141-httpswwwgithubcomgoogleapisgoogle-auth-library-pythoncomparev1140v1141-2020-04-21) [Compare Source](https://togithub.com/googleapis/google-auth-library-python/compare/v1.14.0...v1.14.1) </details> --- ### Renovate configuration :date: **Schedule**: At any time (no schedule defined). :vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied. :recycle: **Rebasing**: Never, or you tick the rebase/retry checkbox. :no_bell: **Ignore**: Close this PR and you won't be reminded about these updates again. 
--- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#GoogleCloudPlatform/python-docs-samples). * chore(deps): update dependency google-cloud-storage to v1.28.0 [(#3260)](#3260) Co-authored-by: Takashi Matsuo <[email protected]> * chore(deps): update dependency google-auth to v1.14.2 [(#3724)](#3724) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [google-auth](https://togithub.com/googleapis/google-auth-library-python) | patch | `==1.14.1` -> `==1.14.2` | --- ### Release Notes <details> <summary>googleapis/google-auth-library-python</summary> ### [`v1.14.2`](https://togithub.com/googleapis/google-auth-library-python/blob/master/CHANGELOG.md#&#8203;1142-httpswwwgithubcomgoogleapisgoogle-auth-library-pythoncomparev1141v1142-2020-05-07) [Compare Source](https://togithub.com/googleapis/google-auth-library-python/compare/v1.14.1...v1.14.2) </details> --- ### Renovate configuration :date: **Schedule**: At any time (no schedule defined). :vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied. :recycle: **Rebasing**: Never, or you tick the rebase/retry checkbox. :no_bell: **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#GoogleCloudPlatform/python-docs-samples). 
* chore: some lint fixes [(#3743)](#3743) * chore(deps): update dependency google-auth to v1.14.3 [(#3728)](#3728) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [google-auth](https://togithub.com/googleapis/google-auth-library-python) | patch | `==1.14.2` -> `==1.14.3` | --- ### Release Notes <details> <summary>googleapis/google-auth-library-python</summary> ### [`v1.14.3`](https://togithub.com/googleapis/google-auth-library-python/blob/master/CHANGELOG.md#&#8203;1143-httpswwwgithubcomgoogleapisgoogle-auth-library-pythoncomparev1142v1143-2020-05-11) [Compare Source](https://togithub.com/googleapis/google-auth-library-python/compare/v1.14.2...v1.14.3) </details> --- ### Renovate configuration :date: **Schedule**: At any time (no schedule defined). :vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied. :recycle: **Rebasing**: Never, or you tick the rebase/retry checkbox. :no_bell: **Ignore**: Close this PR and you won't be reminded about this update again. --- - [x] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#GoogleCloudPlatform/python-docs-samples). 
* chore(deps): update dependency grpcio to v1.29.0 [(#3786)](#3786) * chore(deps): update dependency google-cloud-storage to v1.28.1 [(#3785)](#3785) * chore(deps): update dependency google-cloud-storage to v1.28.1 * [asset] testing: use uuid instead of time Co-authored-by: Takashi Matsuo <[email protected]> * update google-auth to 1.15.0 part 3 [(#3816)](#3816) * Update dependency google-cloud-dataproc to v0.8.0 [(#3837)](#3837) * chore(deps): update dependency google-auth to v1.16.0 [(#3903)](#3903) * update google-auth part 3 [(#3963)](#3963) * chore(deps): update dependency google-cloud-dataproc to v0.8.1 [(#4015)](#4015) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [google-cloud-dataproc](https://togithub.com/googleapis/python-dataproc) | patch | `==0.8.0` -> `==0.8.1` | --- ### Release Notes <details> <summary>googleapis/python-dataproc</summary> ### [`v0.8.1`](https://togithub.com/googleapis/python-dataproc/blob/master/CHANGELOG.md#&#8203;081-httpswwwgithubcomgoogleapispython-dataproccomparev080v081-2020-06-05) [Compare Source](https://togithub.com/googleapis/python-dataproc/compare/v0.8.0...v0.8.1) </details> --- ### Renovate configuration :date: **Schedule**: At any time (no schedule defined). :vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied. :recycle: **Rebasing**: Never, or you tick the rebase/retry checkbox. :no_bell: **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#GoogleCloudPlatform/python-docs-samples). * Replace GCLOUD_PROJECT with GOOGLE_CLOUD_PROJECT. 
[(#4022)](#4022) * Update dependency google-auth to v1.17.0 [(#4058)](#4058) * chore(deps): update dependency google-auth to v1.17.1 [(#4073)](#4073) * Update dependency google-auth to v1.17.2 [(#4083)](#4083) * Update dependency google-auth to v1.18.0 [(#4125)](#4125) * Update dependency google-cloud-dataproc to v1 [(#4109)](#4109) Co-authored-by: Takashi Matsuo <[email protected]> * chore(deps): update dependency google-cloud-storage to v1.29.0 [(#4040)](#4040) * chore(deps): update dependency grpcio to v1.30.0 [(#4143)](#4143) Co-authored-by: Takashi Matsuo <[email protected]> * Update dependency google-auth-httplib2 to v0.0.4 [(#4255)](#4255) Co-authored-by: Takashi Matsuo <[email protected]> * chore(deps): update dependency pytest to v5.4.3 [(#4279)](#4279) * chore(deps): update dependency pytest to v5.4.3 * specify pytest for python 2 in appengine Co-authored-by: Leah Cole <[email protected]> * chore(deps): update dependency google-auth to v1.19.0 [(#4293)](#4293) * chore(deps): update dependency google-cloud-dataproc to v1.0.1 [(#4309)](#4309) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [google-cloud-dataproc](https://togithub.com/googleapis/python-dataproc) | patch | `==1.0.0` -> `==1.0.1` | --- ### Release Notes <details> <summary>googleapis/python-dataproc</summary> ### [`v1.0.1`](https://togithub.com/googleapis/python-dataproc/blob/master/CHANGELOG.md#&#8203;101-httpswwwgithubcomgoogleapispython-dataproccomparev100v101-2020-07-16) [Compare Source](https://togithub.com/googleapis/python-dataproc/compare/v1.0.0...v1.0.1) </details> --- ### Renovate configuration :date: **Schedule**: At any time (no schedule defined). :vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied. :recycle: **Rebasing**: Never, or you tick the rebase/retry checkbox. :no_bell: **Ignore**: Close this PR and you won't be reminded about this update again. 
--- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#GoogleCloudPlatform/python-docs-samples). * chore(deps): update dependency google-auth to v1.19.1 [(#4304)](#4304) * chore(deps): update dependency google-auth to v1.19.2 [(#4321)](#4321) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [google-auth](https://togithub.com/googleapis/google-auth-library-python) | patch | `==1.19.1` -> `==1.19.2` | --- ### Release Notes <details> <summary>googleapis/google-auth-library-python</summary> ### [`v1.19.2`](https://togithub.com/googleapis/google-auth-library-python/blob/master/CHANGELOG.md#&#8203;1192-httpswwwgithubcomgoogleapisgoogle-auth-library-pythoncomparev1191v1192-2020-07-17) [Compare Source](https://togithub.com/googleapis/google-auth-library-python/compare/v1.19.1...v1.19.2) </details> --- ### Renovate configuration :date: **Schedule**: At any time (no schedule defined). :vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied. :recycle: **Rebasing**: Never, or you tick the rebase/retry checkbox. :no_bell: **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#GoogleCloudPlatform/python-docs-samples). 
* Update dependency google-auth to v1.20.0 [(#4387)](#4387) * Update dependency pytest to v6 [(#4390)](#4390) * Update dependency grpcio to v1.31.0 [(#4438)](#4438) * chore(deps): update dependency google-auth to v1.20.1 [(#4452)](#4452) * chore: update templates Co-authored-by: Bill Prin <[email protected]> Co-authored-by: Bill Prin <[email protected]> Co-authored-by: Jon Wayne Parrott <[email protected]> Co-authored-by: Eran Kampf <[email protected]> Co-authored-by: DPE bot <[email protected]> Co-authored-by: aman-ebay <[email protected]> Co-authored-by: Martial Hue <[email protected]> Co-authored-by: Gioia Ballin <[email protected]> Co-authored-by: michaelawyu <[email protected]> Co-authored-by: michaelawyu <[email protected]> Co-authored-by: Alix Hamilton <[email protected]> Co-authored-by: James Winegar <[email protected]> Co-authored-by: Charles Engelke <[email protected]> Co-authored-by: Gus Class <[email protected]> Co-authored-by: Brad Miro <[email protected]> Co-authored-by: Kurtis Van Gent <[email protected]> Co-authored-by: Doug Mahugh <[email protected]> Co-authored-by: Leah E. Cole <[email protected]> Co-authored-by: WhiteSource Renovate <[email protected]> Co-authored-by: Leah Cole <[email protected]> Co-authored-by: Takashi Matsuo <[email protected]>

17 files changed: +1627 −0 lines

dataproc/snippets/README.md

+84
# Cloud Dataproc API Examples

[![Open in Cloud Shell][shell_img]][shell_link]

[shell_img]: http://gstatic.com/cloudssh/images/open-btn.png
[shell_link]: https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/python-docs-samples&page=editor&open_in_editor=dataproc/README.md

Sample command-line programs for interacting with the Cloud Dataproc API.

See [the tutorial on using the Dataproc API with the Python client library](https://cloud.google.com/dataproc/docs/tutorials/python-library-example) for a walkthrough you can run to try out the Cloud Dataproc API sample code.

Note that while this sample demonstrates interacting with Dataproc via the API, the same functionality can also be accomplished using the Cloud Console or the gcloud CLI.

`list_clusters.py` is a simple command-line program that demonstrates connecting to the Cloud Dataproc API and listing the clusters in a region.

`submit_job_to_cluster.py` demonstrates how to create a cluster, submit the `pyspark_sort.py` job, download the output from Google Cloud Storage, and print the result.

`single_job_workflow.py` uses the Cloud Dataproc InstantiateInlineWorkflowTemplate API to create an ephemeral cluster, run a job, and then delete the cluster with a single API request.

`pyspark_sort.py_gcs` is the same as `pyspark_sort.py` but demonstrates reading from a GCS bucket.

## Prerequisites to run locally

* [pip](https://pypi.python.org/pypi/pip)

Go to the [Google Cloud Console](https://console.cloud.google.com). Under API Manager, search for the Google Cloud Dataproc API and enable it.

## Set Up Your Local Dev Environment

To install, run the following command. If you want to use [virtualenv](https://virtualenv.readthedocs.org/en/latest/) (recommended), run it within a virtualenv:

    pip install -r requirements.txt

## Authentication

See the [Google Cloud authentication guide](https://cloud.google.com/docs/authentication/). The recommended approach for running these samples is to use a service account with a JSON key.

## Environment Variables

Set the following environment variables:

    GOOGLE_CLOUD_PROJECT=your-project-id
    REGION=us-central1 # or your region
    CLUSTER_NAME=waprin-spark7
    ZONE=us-central1-b

## Running the samples

To run `list_clusters.py`:

    python list_clusters.py $GOOGLE_CLOUD_PROJECT --region=$REGION

`submit_job_to_cluster.py` can create the Dataproc cluster or use an existing cluster. To create a cluster before running the code, you can use the [Cloud Console](https://console.cloud.google.com) or run:

    gcloud dataproc clusters create your-cluster-name

To run `submit_job_to_cluster.py`, first create a GCS bucket (used by Cloud Dataproc to stage files) from the Cloud Console or with gsutil:

    gsutil mb gs://<your-staging-bucket-name>

Next, set the following environment variables:

    BUCKET=your-staging-bucket
    CLUSTER=your-cluster-name

Then, if you want to use an existing cluster, run:

    python submit_job_to_cluster.py --project_id=$GOOGLE_CLOUD_PROJECT --zone=us-central1-b --cluster_name=$CLUSTER --gcs_bucket=$BUCKET

Alternatively, to create a new cluster that will be deleted at the end of the job, run:

    python submit_job_to_cluster.py --project_id=$GOOGLE_CLOUD_PROJECT --zone=us-central1-b --cluster_name=$CLUSTER --gcs_bucket=$BUCKET --create_new_cluster

The script will set up a cluster, upload the PySpark file, submit the job, print the result, and then, if it created the cluster, delete the cluster.

Optionally, you can add the `--pyspark_file` argument to change from the default `pyspark_sort.py` included in this script to a new script.

dataproc/snippets/create_cluster.py

+77
#!/usr/bin/env python

# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This sample walks a user through creating a Cloud Dataproc cluster using
# the Python client library.
#
# This script can be run on its own:
#   python create_cluster.py ${PROJECT_ID} ${REGION} ${CLUSTER_NAME}


import sys

# [START dataproc_create_cluster]
from google.cloud import dataproc_v1 as dataproc


def create_cluster(project_id, region, cluster_name):
    """This sample walks a user through creating a Cloud Dataproc cluster
    using the Python client library.

    Args:
        project_id (string): Project to use for creating resources.
        region (string): Region where the resources should live.
        cluster_name (string): Name to use for creating a cluster.
    """

    # Create a client with the endpoint set to the desired cluster region.
    cluster_client = dataproc.ClusterControllerClient(client_options={
        'api_endpoint': f'{region}-dataproc.googleapis.com:443',
    })

    # Create the cluster config.
    cluster = {
        'project_id': project_id,
        'cluster_name': cluster_name,
        'config': {
            'master_config': {
                'num_instances': 1,
                'machine_type_uri': 'n1-standard-1'
            },
            'worker_config': {
                'num_instances': 2,
                'machine_type_uri': 'n1-standard-1'
            }
        }
    }

    # Create the cluster.
    operation = cluster_client.create_cluster(project_id, region, cluster)
    result = operation.result()

    # Output a success message.
    print(f'Cluster created successfully: {result.cluster_name}')
    # [END dataproc_create_cluster]


if __name__ == "__main__":
    if len(sys.argv) < 4:
        sys.exit('python create_cluster.py project_id region cluster_name')

    project_id = sys.argv[1]
    region = sys.argv[2]
    cluster_name = sys.argv[3]
    create_cluster(project_id, region, cluster_name)
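The two pieces of the sample that don't require an API call are the regional endpoint convention and the cluster config dict. A minimal sketch of both as plain data (hypothetical helper names; the field names match the sample above):

```python
def regional_endpoint(region):
    # Dataproc uses per-region endpoints: <region>-dataproc.googleapis.com:443,
    # which is why the sample passes client_options to the client constructor.
    return f'{region}-dataproc.googleapis.com:443'


def build_cluster_config(project_id, cluster_name):
    # Same shape as the `cluster` dict in create_cluster above:
    # one n1-standard-1 master and two n1-standard-1 workers.
    return {
        'project_id': project_id,
        'cluster_name': cluster_name,
        'config': {
            'master_config': {'num_instances': 1,
                              'machine_type_uri': 'n1-standard-1'},
            'worker_config': {'num_instances': 2,
                              'machine_type_uri': 'n1-standard-1'},
        },
    }


print(regional_endpoint('us-central1'))
```

Building the config as a dict like this is what lets the client library accept it directly in place of a `Cluster` protobuf message.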
+47
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import uuid

from google.cloud import dataproc_v1 as dataproc
import pytest

import create_cluster


PROJECT_ID = os.environ['GOOGLE_CLOUD_PROJECT']
REGION = 'us-central1'
CLUSTER_NAME = 'py-cc-test-{}'.format(str(uuid.uuid4()))


@pytest.fixture(autouse=True)
def teardown():
    yield

    cluster_client = dataproc.ClusterControllerClient(client_options={
        'api_endpoint': f'{REGION}-dataproc.googleapis.com:443'
    })
    # Client library function
    operation = cluster_client.delete_cluster(PROJECT_ID, REGION, CLUSTER_NAME)
    # Wait for cluster to delete
    operation.result()


def test_cluster_create(capsys):
    # Wrapper function for client library function
    create_cluster.create_cluster(PROJECT_ID, REGION, CLUSTER_NAME)

    out, _ = capsys.readouterr()
    assert CLUSTER_NAME in out
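The test above names its cluster with a fixed prefix plus a UUID4 so concurrent test runs never collide on cluster names. That pattern in isolation (hypothetical helper name):

```python
import uuid


def unique_cluster_name(prefix='py-cc-test'):
    # Same pattern as the CLUSTER_NAME constant in the test above:
    # a readable prefix plus a random UUID4 suffix. uuid4() renders as
    # lowercase hex and dashes, which satisfies Dataproc's lowercase
    # cluster-name requirement.
    return '{}-{}'.format(prefix, uuid.uuid4())


name = unique_cluster_name()
print(name)
```

Because the name is generated at module import time, the autouse teardown fixture can refer to the same `CLUSTER_NAME` the test created.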
New file (+32 lines):

```python
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Integration tests for Dataproc samples.

Creates a Dataproc cluster, uploads a pyspark file to Google Cloud Storage,
submits a job to Dataproc that runs the pyspark file, then downloads
the output logs from Cloud Storage and verifies the expected output."""

import os

import submit_job_to_cluster

PROJECT = os.environ['GOOGLE_CLOUD_PROJECT']
BUCKET = os.environ['CLOUD_STORAGE_BUCKET']
CLUSTER_NAME = 'testcluster3'
ZONE = 'us-central1-b'


def test_e2e():
    output = submit_job_to_cluster.main(
        PROJECT, ZONE, CLUSTER_NAME, BUCKET)
    assert b"['Hello,', 'dog', 'elephant', 'panther', 'world!']" in output
```
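The pyspark file this e2e test runs is not part of this diff, but the assertion's exact ordering is worth noting: the job emits a sorted word list, and Python's default string sort is by code point, so the capitalized `'Hello,'` precedes every lowercase word. A pure-Python sketch of that ordering (the input word order here is hypothetical):

```python
# Hypothetical input order; only the sort behavior matters here.
words = ['Hello,', 'world!', 'dog', 'elephant', 'panther']

# Code-point ordering: 'H' (0x48) sorts before any lowercase letter (>= 0x61),
# which is why the expected output starts with 'Hello,'.
print(sorted(words))
# → ['Hello,', 'dog', 'elephant', 'panther', 'world!']
```

This is also why the test compares against the `repr` of the sorted list as bytes: it matches the driver-log line the job prints verbatim.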
New file (+107 lines):

```python
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This sample walks a user through instantiating an inline
# workflow for Cloud Dataproc using the Python client library.
#
# This script can be run on its own:
#   python instantiate_inline_workflow_template.py ${PROJECT_ID} ${REGION}


import sys

# [START dataproc_instantiate_inline_workflow_template]
from google.cloud import dataproc_v1 as dataproc


def instantiate_inline_workflow_template(project_id, region):
    """This sample walks a user through submitting a workflow
    to Cloud Dataproc using the Python client library.

    Args:
        project_id (string): Project to use for running the workflow.
        region (string): Region where the workflow resources should live.
    """

    # Create a client with the endpoint set to the desired region.
    workflow_template_client = dataproc.WorkflowTemplateServiceClient(
        client_options={
            'api_endpoint': f'{region}-dataproc.googleapis.com:443'
        }
    )

    parent = workflow_template_client.region_path(project_id, region)

    template = {
        'jobs': [
            {
                'hadoop_job': {
                    'main_jar_file_uri': 'file:///usr/lib/hadoop-mapreduce/'
                                         'hadoop-mapreduce-examples.jar',
                    'args': [
                        'teragen',
                        '1000',
                        'hdfs:///gen/'
                    ]
                },
                'step_id': 'teragen'
            },
            {
                'hadoop_job': {
                    'main_jar_file_uri': 'file:///usr/lib/hadoop-mapreduce/'
                                         'hadoop-mapreduce-examples.jar',
                    'args': [
                        'terasort',
                        'hdfs:///gen/',
                        'hdfs:///sort/'
                    ]
                },
                'step_id': 'terasort',
                'prerequisite_step_ids': [
                    'teragen'
                ]
            }
        ],
        'placement': {
            'managed_cluster': {
                'cluster_name': 'my-managed-cluster',
                'config': {
                    'gce_cluster_config': {
                        # Leave 'zone_uri' empty for 'Auto Zone Placement'
                        # 'zone_uri': ''
                        'zone_uri': 'us-central1-a'
                    }
                }
            }
        }
    }

    # Submit the request to instantiate the workflow from an inline template.
    operation = workflow_template_client.instantiate_inline_workflow_template(
        parent, template
    )
    operation.result()

    # Output a success message.
    print('Workflow ran successfully.')
# [END dataproc_instantiate_inline_workflow_template]


if __name__ == "__main__":
    if len(sys.argv) < 3:
        sys.exit('python instantiate_inline_workflow_template.py '
                 'project_id region')

    project_id = sys.argv[1]
    region = sys.argv[2]
    instantiate_inline_workflow_template(project_id, region)
```
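The template above chains two jobs: `terasort` declares `teragen` in its `prerequisite_step_ids`, so Dataproc runs it only after `teragen` completes. A standalone sketch of how such a job graph can be sanity-checked before submission (`validate_job_graph` is a hypothetical helper, not part of `google-cloud-dataproc`):

```python
def validate_job_graph(jobs):
    """Check a workflow template's 'jobs' list: every prerequisite must name
    a defined step, and the dependency graph must be acyclic. Returns one
    valid execution order (Kahn-style topological peel)."""
    steps = {job['step_id'] for job in jobs}
    deps = {job['step_id']: job.get('prerequisite_step_ids', [])
            for job in jobs}

    # Every prerequisite must refer to an existing step_id.
    for step, prereqs in deps.items():
        for p in prereqs:
            if p not in steps:
                raise ValueError(f'{step!r} depends on unknown step {p!r}')

    # Repeatedly peel off steps whose prerequisites are all satisfied.
    done, order = set(), []
    while len(done) < len(steps):
        ready = [s for s in steps - done if all(p in done for p in deps[s])]
        if not ready:
            raise ValueError('cycle in prerequisite_step_ids')
        for s in sorted(ready):
            done.add(s)
            order.append(s)
    return order


jobs = [
    {'step_id': 'teragen'},
    {'step_id': 'terasort', 'prerequisite_step_ids': ['teragen']},
]
print(validate_job_graph(jobs))
# → ['teragen', 'terasort']
```

The same shape generalizes: steps with no `prerequisite_step_ids` may run in parallel, and a cycle (or a typo in a step id) is rejected before the template ever reaches the service.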
New file (+31 lines):

```python
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import instantiate_inline_workflow_template


PROJECT_ID = os.environ['GOOGLE_CLOUD_PROJECT']
REGION = 'us-central1'


def test_workflows(capsys):
    # Wrapper function for client library function
    instantiate_inline_workflow_template.instantiate_inline_workflow_template(
        PROJECT_ID, REGION
    )

    out, _ = capsys.readouterr()
    assert "successfully" in out
```