Use 3-level namespace when catalog is set. #94


Merged · 10 commits · Jun 9, 2022

Conversation

@ueshin (Collaborator) commented May 12, 2022

Description

Uses 3-level namespace when catalog is set.
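As a rough sketch of what "3-level namespace" means here (the `Relation` class and render logic below are illustrative, not the actual dbt-databricks implementation): when a catalog (dbt's `database` field) is set, a relation renders as `catalog.schema.identifier`; otherwise it falls back to the 2-level `schema.identifier`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Relation:
    # On Databricks, dbt's "database" field maps to the Unity Catalog name.
    identifier: str
    schema: str
    database: Optional[str] = None

    def render(self) -> str:
        # Use the 3-level namespace only when a catalog is set.
        parts = [self.database, self.schema, self.identifier]
        return ".".join(p for p in parts if p is not None)


print(Relation("orders", "analytics").render())          # analytics.orders
print(Relation("orders", "analytics", "main").render())  # main.analytics.orders
```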

@ueshin ueshin changed the title first try at catalog support [WIP] first try at catalog support May 12, 2022
@ewengillies commented

Thrilled to see work in this direction! I was just looking for this feature three days ago and am happy to see it's already in a PR. Is there anything I can do to help? Thanks :)

@ueshin ueshin changed the title [WIP] first try at catalog support Use 3-level namespace when setting catalog May 21, 2022
@ueshin ueshin marked this pull request as ready for review May 21, 2022 01:15
@ueshin ueshin changed the title Use 3-level namespace when setting catalog Use 3-level namespace when catalog is set. May 21, 2022
@ueshin ueshin marked this pull request as draft May 23, 2022 17:45
@ueshin (Collaborator, Author) commented May 23, 2022

Let me separate some changes to another PR.

@ueshin (Collaborator, Author) commented May 23, 2022

Now this is based on #98.

@ueshin ueshin marked this pull request as ready for review May 26, 2022 00:03
@ueshin ueshin mentioned this pull request May 26, 2022
@@ -67,3 +80,113 @@ def execute(
finally:
if staging_table is not None:
self.drop_relation(staging_table)

def list_relations_without_caching(
Collaborator:

It would be nice to add a new method in dbt-spark that returns the catalog field, to avoid duplicated code in the future (but we'd need to know dbt-spark's release schedule).

Collaborator (Author):

Yes, we can ask dbt-spark to make the change. For now, let's keep these here and work on it in a separate PR.

{% set tmp_relation = base_relation.incorporate(path = {
    "identifier": tmp_identifier,
    "schema": None,
    "database": None
}) %}
Collaborator:
If we don't specify "database": None here, what's the default value for it?

Collaborator (Author):
The tmp_relation will inherit the database of base_relation, which results in database.identifier as the temp relation name when base_relation has the database field set.

Collaborator:
Makes sense. Thanks for the explanation!
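The inheritance behavior discussed in this thread can be sketched as follows (a hypothetical `incorporate`, not dbt's actual implementation): path keys omitted from the override dict are carried over from the base relation, so `"database"` must be set to `None` explicitly to keep the temp relation name at a single level.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Relation:
    identifier: str
    schema: Optional[str] = None
    database: Optional[str] = None

    def incorporate(self, path: dict) -> "Relation":
        # Only the keys present in `path` are overridden; the rest
        # are inherited from the base relation.
        return replace(self, **path)

    def render(self) -> str:
        return ".".join(
            p for p in (self.database, self.schema, self.identifier) if p is not None
        )


base = Relation("orders", schema="analytics", database="main")

# Omitting "database" inherits it from base: renders as main.orders__tmp.
kept = base.incorporate({"identifier": "orders__tmp", "schema": None})
print(kept.render())

# Explicitly clearing it yields the bare identifier: orders__tmp.
cleared = base.incorporate({"identifier": "orders__tmp", "schema": None, "database": None})
print(cleared.render())
```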

@@ -58,7 +73,7 @@

{{ adapter.valid_snapshot_target(target_relation) }}

{% set staging_table = spark_build_snapshot_staging_table(strategy, sql, target_relation) %}
{% set staging_table = databricks_build_snapshot_staging_table(strategy, sql, target_relation) %}
Collaborator:
Not urgent but should we use the dispatch pattern here to make it backward compatible?

Collaborator (Author):
That's a good point.
Actually, I think that if users override the spark_build_snapshot_staging_table macro, it will very likely break catalog support.
How about using spark_build_snapshot_staging_table only when target_relation.database is None?
We should probably also mention this in the changelog and release notes.
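The backward-compatibility idea proposed above could look roughly like this (a hedged Python sketch rather than Jinja; the two macro names match the PR, but the dispatching helper and its signatures are hypothetical): fall back to the legacy spark_ macro only when no catalog is involved.

```python
from typing import Optional


def spark_build_snapshot_staging_table(strategy, sql, target_schema: str) -> str:
    # Legacy 2-level behavior (stand-in for the dbt-spark macro,
    # which users may have overridden).
    return f"CREATE TABLE {target_schema}.snapshot_staging AS {sql}"


def databricks_build_snapshot_staging_table(
    strategy, sql, target_database: str, target_schema: str
) -> str:
    # Catalog-aware 3-level behavior (stand-in for the dbt-databricks macro).
    return f"CREATE TABLE {target_database}.{target_schema}.snapshot_staging AS {sql}"


def build_snapshot_staging_table(
    strategy, sql, database: Optional[str], schema: str
) -> str:
    # Use the legacy macro only when no catalog is set, so existing
    # user overrides keep working for 2-level projects.
    if database is None:
        return spark_build_snapshot_staging_table(strategy, sql, schema)
    return databricks_build_snapshot_staging_table(strategy, sql, database, schema)


print(build_snapshot_staging_table(None, "SELECT 1", None, "analytics"))
print(build_snapshot_staging_table(None, "SELECT 1", "main", "analytics"))
```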

# Insert a row into the seed model with an invalid id.
self.run_sql_file("insert_invalid_id.sql")
self.run_and_check_failure(
model_name,
err_msg="CHECK constraint id_greater_than_zero",
)
self.check_staging_table_cleaned()
self.run_sql(f"delete from {schema}.seed where id = 0")
self.run_sql("delete from {database_schema}.seed where id = 0")
Collaborator:
Interesting, so if we use {database_schema} in the query then dbt will automatically change it to use the target database and target schema?

Collaborator (Author):
Actually, I made some changes in tests/integration/base.py to handle database_schema.

Collaborator:
👍 it's in transform_sql
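A minimal sketch of what a transform_sql-style test helper might do (the real helper in tests/integration/base.py may differ): expand placeholders like `{schema}` and `{database_schema}` into the target namespace before running the SQL.

```python
from typing import Optional


def transform_sql(sql: str, schema: str, database: Optional[str] = None) -> str:
    # Illustrative placeholder expansion: {database_schema} becomes
    # catalog.schema when a catalog is set, plain schema otherwise.
    database_schema = f"{database}.{schema}" if database is not None else schema
    return sql.format(schema=schema, database_schema=database_schema)


sql = "delete from {database_schema}.seed where id = 0"
print(transform_sql(sql, "analytics"))          # delete from analytics.seed where id = 0
print(transform_sql(sql, "analytics", "main"))  # delete from main.analytics.seed where id = 0
```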

@ueshin ueshin requested a review from allisonwang-db June 8, 2022 18:05

@ueshin (Collaborator, Author) commented Jun 9, 2022

Thanks! merging.

@ueshin ueshin merged commit 6f462d4 into databricks:main Jun 9, 2022
@ueshin ueshin deleted the catalog branch June 9, 2022 19:35
ueshin added a commit to ueshin/dbt-databricks that referenced this pull request Jun 10, 2022
ueshin added a commit that referenced this pull request Jun 15, 2022