
Snowflake: Handling large records #45139


Open: wants to merge 94 commits into master

Vee7574 (Contributor) commented Sep 4, 2024

What

Implements handling of large records in the Snowflake destination connector, fixing the issue described in this ticket:
https://github.com/airbytehq/oncall/issues/6245

When a record contains large values and exceeds Snowflake's maximum allowed record size of 16 MB, this change gracefully removes the large values by setting those columns to null until the record is below the size limit. The record's change metadata is updated to indicate that those specific columns were set to null due to data size limits.
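To illustrate the change-tracking part, here is a minimal Kotlin sketch. The types `Change`, `Reason`, `FieldChange`, `AnnotatedRecord` and the function `nullOutColumns` are hypothetical stand-ins, not the connector's or the Airbyte protocol's actual API; they only show the idea of nulling columns and recording one change entry per column that was actually nulled.

```kotlin
// Hypothetical types for illustration only; the connector's real change
// metadata may use different names and structure.
enum class Change { NULLED }
enum class Reason { DESTINATION_RECORD_SIZE_LIMITATION }

data class FieldChange(val field: String, val change: Change, val reason: Reason)

/** A record's data plus the list of changes the destination applied to it. */
data class AnnotatedRecord(
    val data: Map<String, Any?>,
    val changes: List<FieldChange>,
)

/**
 * Returns a copy of [record] with [columns] set to null, appending one
 * NULLED change per column that previously held a non-null value.
 */
fun nullOutColumns(record: AnnotatedRecord, columns: Collection<String>): AnnotatedRecord {
    val data = record.data.toMutableMap()
    // distinct() avoids duplicate change entries; filter runs eagerly before any mutation.
    val newChanges = columns.distinct().filter { data[it] != null }.map { column ->
        data[column] = null
        FieldChange(column, Change.NULLED, Reason.DESTINATION_RECORD_SIZE_LIMITATION)
    }
    return AnnotatedRecord(data, record.changes + newChanges)
}
```

Working on a copy keeps the original record untouched, so callers can compare before and after if needed.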

How

Large values are removed from the record using these steps:

  • Snowflake has a maximum size (16 MB) for the content that fits in a VARIANT column, which we use to load all records into the raw tables.
  • If a record exceeds this size, we need to remove (null out) properties until it fits. There’s an existing algorithm in destination-redshift for this that we can port over.
  • We should never null out the PK, CDC delete columns, or cursors.
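The steps above can be sketched as a greedy loop. This is a minimal Kotlin illustration under stated assumptions, not the ported destination-redshift algorithm: `estimateBytes` approximates record size from keys and stringified values (the real code would measure the serialized JSON), columns are nulled largest-first (the Redshift algorithm may choose a different order), and `protectedColumns` stands in for the PK, CDC delete, and cursor columns. `SNOWFLAKE_VARIANT_MAX_BYTES` and all function names are hypothetical; the 16 MB value mirrors the commented-out constant in this PR's diff.

```kotlin
// Hypothetical constant; mirrors the 16 * 1024 * 1024 value commented out in the diff.
const val SNOWFLAKE_VARIANT_MAX_BYTES: Int = 16 * 1024 * 1024

/** Rough UTF-8 size of a flat record: key bytes plus stringified value bytes (an approximation). */
fun estimateBytes(record: Map<String, Any?>): Long =
    record.entries.sumOf { (key, value) ->
        key.toByteArray(Charsets.UTF_8).size.toLong() +
            (value?.toString()?.toByteArray(Charsets.UTF_8)?.size ?: 0).toLong()
    }

/**
 * Nulls out the largest non-protected columns until [record] fits in [maxBytes].
 * Returns the nulled column names, or null if the record still does not fit
 * after every eligible column has been nulled.
 */
fun nullLargestColumnsUntilFits(
    record: MutableMap<String, Any?>,
    protectedColumns: Set<String>,
    maxBytes: Long = SNOWFLAKE_VARIANT_MAX_BYTES.toLong(),
): List<String>? {
    val nulled = mutableListOf<String>()
    // Candidates: non-null, non-protected columns, largest value first.
    val candidates = record.entries
        .filter { it.key !in protectedColumns && it.value != null }
        .sortedByDescending { it.value.toString().toByteArray(Charsets.UTF_8).size }
        .map { it.key }
    for (column in candidates) {
        if (estimateBytes(record) <= maxBytes) break
        record[column] = null
        nulled += column
    }
    return if (estimateBytes(record) <= maxBytes) nulled else null
}
```

Returning `null` when the record still doesn't fit leaves that decision to the caller; how the connector handles that case isn't described in this PR summary.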

Review guide

Please review the code to confirm that large values are nulled out according to the requirements above.

User Impact

Large values will be removed from records, so users may see null values for some fields in the destination records.

Can this PR be safely reverted and rolled back?

  • [X] YES 💚
  • [ ] NO ❌

Vee7574 added 30 commits August 2, 2024 12:16
Vee7574 requested review from a team as code owners September 4, 2024 19:52

vercel bot commented Sep 4, 2024

The latest updates on your projects.

1 skipped deployment: airbyte-docs, ⬜️ Ignored, updated Sep 6, 2024 8:24pm (UTC).

octavia-squidington-iii added the area/connectors (Connector related issues), CDK (Connector Development Kit), and connectors/destination/snowflake labels Sep 4, 2024
octavia-squidington-iii removed the CDK (Connector Development Kit) label Sep 5, 2024
private val parsedCatalog: ParsedCatalog?,
private val defaultNamespace: String
) : StreamAwareDataTransformer {
private data class ScalarNodeModification(
Contributor


This seems extremely complex for what is essentially a column-nullifier.

): StreamAwareDataTransformer {
// Redundant override to keep in consistent with InsertDestination. TODO: Unify these 2
// classes with
// composition.
Contributor


I'm not sure what this comment here means.

val startTime = System.currentTimeMillis()

log.debug{"Traversing the record to NULL fields for snowflake size limitations"}
//println("Traversing the record to NULL fields for snowflake size limitations")
Contributor


Remove all commented out TODOs and println statements.

// const val SNOWFLAKE_VARCHAR_MAX_BYTE_SIZE: Int = 16 * 1024 * 1024
// const val SNOWFLAKE_SUPER_MAX_BYTE_SIZE: Int = 16 * 1024 * 1024

val DEFAULT_PREDICATE_VARCHAR_GREATER_THAN_64K: Predicate<String> =
Contributor


Rename it to describe the condition the predicate actually tests.
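For example, a name that states what is tested might look like the hypothetical sketch below. It assumes the predicate compares UTF-8 byte length against 64 KiB; the actual threshold and units aren't visible in this excerpt, and `VARCHAR_64K_BYTES` and `IS_LARGER_THAN_64K_BYTES` are illustrative names only.

```kotlin
import java.util.function.Predicate

// Hypothetical threshold: 64 KiB measured in UTF-8 bytes.
const val VARCHAR_64K_BYTES: Int = 64 * 1024

/** True when [text] is larger than 64 KiB once encoded as UTF-8. */
val IS_LARGER_THAN_64K_BYTES: Predicate<String> =
    Predicate { text -> text.toByteArray(Charsets.UTF_8).size > VARCHAR_64K_BYTES }
```

Naming the unit (bytes, not characters) matters here, because multi-byte characters make the two measures diverge.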

octavia-squidington-iii added the CDK (Connector Development Kit) label Sep 6, 2024
Labels
area/connectors Connector related issues CDK Connector Development Kit connectors/destination/snowflake
4 participants