Skip to content

Destination Snowflake: Revisiting merge instead of insert+delete #43367

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 20, 2024

Conversation

gisripa
Copy link
Contributor

@gisripa gisripa commented Aug 7, 2024

What

Fixes: airbytehq/airbyte-internal-issues#9931
See the rabbit hole of linked tickets above
Resurrecting: #31683

Switching back to MERGE INTO ... WHEN... for CDC deletions. MERGE still expects unique records in source dataset(a.k.a table from CTE in our case) to have a deterministic behavior.

How

Review guide

Experimented a lot with Kotlin's idioms to deduplicate query string snippets.

Also consolidated the row_number window function to be executed on the preceding CTE (saw few millis impact when testing in console but not a great representation by individual runs)

User Impact

Users have the ability to opt-in to use MERGE.
This is intentionally left as non-default for now because of previous history of rollback.
merge-option-UI

Other details

Tested syncs with both insert+delete and merge using the toggle option. No improvement in Sync time per se but no indication that warehouse costs would have increased if time spent in query is a representation of the warehouse credits used.

Query Profiles

I did not find any evidence that MERGE could increase warehouse activity if Time spent by query is a representation of warehouse credits.

Final Table seeded with 100M rows.
Raw Table generated with 500K data with some random percentage updated and cdc-deleted

Profiles for Insert+Delete+Delete

Insert

Insert-500K

Delete from final table duplicated PKs

Delete-864

CDC Deletes

CdcDelete-150K

Profile for Merge

Merge-245K-104K-45K

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

Copy link

vercel bot commented Aug 7, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Aug 20, 2024 11:19pm

Copy link
Contributor Author

gisripa commented Aug 7, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @gisripa and the rest of your teammates on Graphite Graphite

@gisripa gisripa changed the title snowflake-merge-stmt Destination Snowflake: Revisiting merge instead of insert+delete Aug 7, 2024
Base automatically changed from gireesh/08-06-snowflake-sqlgen-cleanup to master August 7, 2024 19:10
@gisripa gisripa force-pushed the gireesh/08-07-snowflake-merge-stmt branch 5 times, most recently from 2b2e786 to 7755aaa Compare August 9, 2024 05:12
@gisripa gisripa marked this pull request as ready for review August 9, 2024 05:22
@gisripa gisripa requested a review from a team as a code owner August 9, 2024 05:22
@gisripa gisripa force-pushed the gireesh/08-07-snowflake-merge-stmt branch from 7755aaa to 329ee6b Compare August 9, 2024 15:50
Copy link
Contributor

@edgao edgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ngl I was only skimming the diff by the end :P largely trusting our tests to keep us safe here

+I assume you're planning to run some tests in the perf test workspace to try and validate the performance impact? I'm actually very curious what happens there - faker reemitting the whole dataset on every sync is basically the worst case, and I have no idea how merge vs insert+delete will perform there

changes seem reasonable at a glance. had some nitpicks but nothing structural. LGTM pending testing.

JavaBaseConstants.COLUMN_NAME_AB_EXTRACTED_AT.uppercase().quoted(),
JavaBaseConstants.COLUMN_NAME_AB_GENERATION_ID.uppercase().quoted()
)
val abMetaColumn = JavaBaseConstants.COLUMN_NAME_AB_META.uppercase().quoted()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not include this in columnsForUpdateList?

|${selectTypedRecordsFromRawTable(stream, minRawTimestamp, safeCast).replaceIndent(" ")}
|) $sourceTable
|ON
|${pkEqualityMatch.replaceIndent(" ")} $whenMatchedCdcDeleteCondition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why not just put whenMatchedCdcDeleteCondition on its own line + remove the \n from it? Right now it kind of looks like whenMatchedCdcDeleteCondition is part of the ON clause

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah there was an empty line when this condition was missing. wasn't sure if sql compilation failed for this or something else so was trying different things

@gisripa gisripa force-pushed the gireesh/08-07-snowflake-merge-stmt branch 6 times, most recently from 48dba70 to 0fa45a9 Compare August 15, 2024 19:56
@gisripa gisripa force-pushed the gireesh/08-07-snowflake-merge-stmt branch 2 times, most recently from bd085c3 to 1aadeb0 Compare August 20, 2024 03:23
@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Aug 20, 2024
Copy link
Contributor

@edgao edgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm pending switching to an abstract base class for the test

@gisripa gisripa force-pushed the gireesh/08-07-snowflake-merge-stmt branch from 1aadeb0 to ff96664 Compare August 20, 2024 23:10
@gisripa gisripa force-pushed the gireesh/08-07-snowflake-merge-stmt branch from ff96664 to c96b2cb Compare August 20, 2024 23:12
@gisripa gisripa enabled auto-merge (squash) August 20, 2024 23:30
@gisripa gisripa merged commit 141daac into master Aug 20, 2024
34 checks passed
@gisripa gisripa deleted the gireesh/08-07-snowflake-merge-stmt branch August 20, 2024 23:34
@joeybenamy
Copy link

@gisripa This PR is a huge deal for our business. Wanted to say thank you for opening this PR.

@edgao
Copy link
Contributor

edgao commented Aug 21, 2024

just a heads up, we had to rollback to an older version of destination-snowflake for unrelated reasons. keep an eye on #44526, which will also rerelease this feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/destination/snowflake
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants