Fix: Normalize when_matched and merge_filter expressions to the source dialect #4847

erindru · 2025-06-30T07:38:31Z

Currently, neither the when_matched nor the merge_filter user-supplied expressions are normalized. This can lead to issues when queries are executed, because we quote identifiers by default.

So an expression in a Model definition like:

  when_matched (
    WHEN MATCHED THEN UPDATE SET target.key_a = source.key_a,
  ),

Would get executed as:

... WHEN MATCHED THEN UPDATE SET "__MERGE_TARGET__"."key_a" = "__MERGE_SOURCE__"."key_a" ...

which is incorrect for databases like Snowflake, where unquoted identifiers should be normalized to uppercase, like so:

... WHEN MATCHED THEN UPDATE SET "__MERGE_TARGET__"."KEY_A" = "__MERGE_SOURCE__"."KEY_A" ...

This PR normalizes the user input for the when_matched and merge_filter properties of IncrementalByUniqueKeyKind. It also normalizes the MERGE_SOURCE_ALIAS and MERGE_TARGET_ALIAS constants so that they are consistent with the user query snippet.
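As a rough illustration of the behaviour described above, the sketch below folds unquoted identifiers to the case a dialect would resolve them to and then quotes them. It is a simplified, string-based toy, not the code in this PR (the diff calls sqlglot's normalize_identifiers on a parsed expression). The helper names and the UPPERCASE_DIALECTS table are assumptions made up for this example.

```python
# Simplified illustration only: real normalization is dialect-aware and
# operates on a parsed expression tree, not on raw strings.

# Assumption for this sketch: dialects that fold unquoted identifiers to uppercase.
UPPERCASE_DIALECTS = {"snowflake"}


def normalize_identifier(name: str, dialect: str) -> str:
    """Fold an unquoted identifier to the case the dialect would resolve it to."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        # Already quoted: case is significant, so only strip the quotes.
        return name[1:-1]
    return name.upper() if dialect in UPPERCASE_DIALECTS else name.lower()


def normalize_and_quote(column: str, dialect: str) -> str:
    """Normalize each part of a dotted column reference, then quote it.

    Splitting on "." is a shortcut; it would break on quoted names containing dots.
    """
    parts = [normalize_identifier(part, dialect) for part in column.split(".")]
    return ".".join(f'"{part}"' for part in parts)


print(normalize_and_quote("__MERGE_TARGET__.key_a", "snowflake"))
print(normalize_and_quote("__MERGE_TARGET__.key_a", "duckdb"))
```

Under these assumptions, the same user snippet ends up as "KEY_A" for Snowflake and "key_a" for a lowercasing dialect, and the merge alias is folded along with it, which is why the alias constants need to be normalized consistently with the user query snippet.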

erindru · 2025-06-30T07:39:45Z

tests/core/test_model.py

-      WHEN MATCHED AND __MERGE_SOURCE__._operation = 1 THEN DELETE
-      WHEN MATCHED AND __MERGE_SOURCE__._operation <> 1 THEN UPDATE SET
-        __MERGE_TARGET__.purchase_order_id = 1
+      WHEN MATCHED AND __merge_source__._operation = 1 THEN DELETE


Our tests tend to use dialects where a normalized identifier is lowercased, so fixing this issue required a number of edits to the test SQL.

izeigerman · 2025-06-30T16:05:17Z

sqlmesh/core/model/kind.py


-        return t.cast(exp.Whens, v.transform(d.replace_merge_table_aliases))
+        return normalize_identifiers(v, dialect=dialect)


If we do it here, then wouldn't this require a migration? Since this changes how the expressions are stored in the state. Don't we need a migration script and to make this a breaking change?

Also note that in other places (e.g. partitioned_by, time column, etc.) we quote identifiers in addition to normalizing them.

I've updated this to normalize, quote and attach .meta["dialect"] to the returned expressions so that when they get serialized to JSON for state, they are serialized according to the correct dialect.

I reused the private _get_field function from sqlmesh.utils.pydantic, which already did all of this, and added a public function to expose it.

I also added an empty migration.

sqlmesh/core/model/kind.py

erindru · 2025-06-30T23:41:34Z

tests/core/test_model.py


    model = SqlModel.parse_raw(model.json())
-    assert model.kind.when_matched.sql() == expected_when_matched
+    assert model.kind.when_matched.sql(dialect="hive") == expected_when_matched


Dialects became very relevant for .sql() calls in many tests now that we quote identifiers.

erindru · 2025-06-30T23:43:09Z

tests/core/test_model.py

@@ -7894,7 +7953,7 @@ def test_merge_filter():
                source.ds > (SELECT MAX(ds) FROM db.test) AND
                source.ds > @start_ds AND
                source._operation <> 1 AND
-                target.start_date > dateadd(day, -7, current_date)
+                target.start_date > date_add(current_date, interval 7 day)


dateadd(day, -7, current_date) isn't actually a DuckDB function. Because of this, day was being treated as a column "day", which is incorrect.

So I changed this to the correct syntax.

erindru commented Jun 30, 2025

izeigerman reviewed Jun 30, 2025

erindru force-pushed the erin/normalize-when-matched branch from 9c5f889 to 5c40e28 Compare June 30, 2025 23:37

erindru commented Jun 30, 2025

Fix: Normalize when_matched and merge_filter expressions to the sourc…

ed7b3cf

…e dialect

erindru force-pushed the erin/normalize-when-matched branch from 5c40e28 to ed7b3cf Compare July 1, 2025 00:09

izeigerman approved these changes Jul 2, 2025

erindru merged commit 83137ff into main Jul 2, 2025
27 checks passed

erindru deleted the erin/normalize-when-matched branch July 2, 2025 01:07
