Skip to content

Fix: Normalize when_matched and merge_filter expressions to the source dialect #4847

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 2, 2025

Conversation

erindru
Copy link
Collaborator

@erindru erindru commented Jun 30, 2025

Fixes #4843

Currently, neither the when_matched or merge_filter user-supplied expressions are normalized. This can lead to issues when queries are executed, because we quote by default.

So an expression in a Model definition like:

  when_matched (
      WHEN MATCHED THEN UPDATE SET target.key_a = source.key_a,
    ),

Would get executed as:

... WHEN MATCHED THEN UPDATE SET "__MERGE_TARGET__"."key_a" = "__MERGE_SOURCE__"."key_a" ...

which is incorrect for databases like Snowflake as unquoted identifiers should be normalized to uppercase like so:

... WHEN MATCHED THEN UPDATE SET "__MERGE_TARGET__"."KEY_A" = "__MERGE_SOURCE__"."KEY_A" ...

This PR normalizes the user input for the when_matched and merge_filter properties of IncrementalByUniqueKeyKind as well as normalizing the MERGE_SOURCE_ALIAS and MERGE_TARGET_ALIAS constants so they are consistent with the user query snippet

WHEN MATCHED AND __MERGE_SOURCE__._operation = 1 THEN DELETE
WHEN MATCHED AND __MERGE_SOURCE__._operation <> 1 THEN UPDATE SET
__MERGE_TARGET__.purchase_order_id = 1
WHEN MATCHED AND __merge_source__._operation = 1 THEN DELETE
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our tests tend to use dialects where a normalized identifier is lowercased, so fixing this issue required a bunch of edits to test SQL


return t.cast(exp.Whens, v.transform(d.replace_merge_table_aliases))
return normalize_identifiers(v, dialect=dialect)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do it here, then wouldn't this require a migration? Since this changes how the expressions are stored in the state. Don't we need a migration script and to make this a breaking change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note, that in other places (eg. partitioned_by, time column, etc) we quote identifiers in addition to normalizing them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this to normalize, quote and attach .meta["dialect"] to the returned expressions so that when they get serialized to JSON for state, they are serialized according to the correct dialect.

I re-used the _get_field private function from sqlmesh.utils.pydantic which did all of this already but added a public function to expose it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added an empty migration

@erindru erindru force-pushed the erin/normalize-when-matched branch from 9c5f889 to 5c40e28 Compare June 30, 2025 23:37

model = SqlModel.parse_raw(model.json())
assert model.kind.when_matched.sql() == expected_when_matched
assert model.kind.when_matched.sql(dialect="hive") == expected_when_matched
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dialects started becoming very relevant for .sql() calls in many tests now that we quote identifiers

@@ -7894,7 +7953,7 @@ def test_merge_filter():
source.ds > (SELECT MAX(ds) FROM db.test) AND
source.ds > @start_ds AND
source._operation <> 1 AND
target.start_date > dateadd(day, -7, current_date)
target.start_date > date_add(current_date, interval 7 day)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dateadd(day, -7, current_date) isnt actually a duckdb function. Due to this, day was getting treated as a column "day" which is incorrect.

So I changed this to the correct syntax

@erindru erindru force-pushed the erin/normalize-when-matched branch from 5c40e28 to ed7b3cf Compare July 1, 2025 00:09
@erindru erindru merged commit 83137ff into main Jul 2, 2025
27 checks passed
@erindru erindru deleted the erin/normalize-when-matched branch July 2, 2025 01:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The "when_matched" expression of the INCREMENTAL_BY_UNIQUE key kind is not normalized
2 participants