Skip to content

initial base module implementation for Dataplex AutoDQ #1543

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 17 commits into from
Closed

Conversation

thinhha
Copy link
Contributor

@thinhha thinhha commented Jul 28, 2023

@google-cla
Copy link

google-cla bot commented Jul 28, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@thinhha thinhha requested review from ludoo and saurabh-net July 28, 2023 20:54
@thinhha thinhha marked this pull request as ready for review July 28, 2023 20:54
*/

variable "data" {
description = "The data source for DataScan. The input variable 'data' should be an object containing a single key-value pair that can be one of: * `entity`: The Dataplex entity that represents the data source (e.g. BigQuery table) for DataScan, of the form: `projects/{project_number}/locations/{locationId}/lakes/{lakeId}/zones/{zoneId}/entities/{entityId}`. * `resource`: The service-qualified full resource name of the cloud resource for a DataScan job to scan against. The field could be: BigQuery table of type 'TABLE' for DataProfileScan/DataQualityScan format, e.g: `//bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID`."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variable descriptions are usually one-liners :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha ok :) will shorten

variable "execution_trigger" {
description = "The input variable 'execution_trigger' specifies when a scan should be triggered. If not specified, the default is `on_demand`, which means the scan will not run until the user calls `dataScans.run` API. Alternatively, you can set `execution_trigger` to `scheduled` scheduled to run periodically based on a cron schedule expression."
type = object({
on_demand = optional(object({}))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could also be done differently: the variable is execution_schedule with string type, and defaults to null which implies on demand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is a good point. I didn't realize the default behaviour for the official API was on_demand until after I've done everything else. I will refactor this according to your suggestion

Copy link
Collaborator

@ludoo ludoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple minor comments, feel free to resolve them once done (and feel free to ignore the variable name/type change). When you are ready just merge.

Be careful in the future when pushing commits with the wrong email, had to rewrite commit history and force approve the CLA check. :)

@@ -0,0 +1,273 @@
# Dataplex AutoDQ
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please link the module from the top-level repo README, and the modules folder README?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And could you rename the module as cloud-dataplex-autodq, or rename the existing cloud-dataplex as dataplex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The official name is Dataplex not Cloud Dataplex. I will rename the other module accordingly.

@thinhha
Copy link
Contributor Author

thinhha commented Jul 29, 2023

Thanks Ludo for your feedback. I think I've addressed all of your feedback.

Thank you for also fixing my CLA checks. I noticed I'm using the wrong email address in the first commit :/ sorry but I think I will need to do another git push --force.

@github-actions github-actions bot added the on:tools New or changed tool label Jul 29, 2023
@thinhha
Copy link
Contributor Author

thinhha commented Jul 29, 2023

This is getting messy. I'm going to close this PR and reopen with a different branch.

@thinhha thinhha closed this Jul 29, 2023
@thinhha thinhha deleted the autodq branch July 29, 2023 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants