Snowflake: Handling large records #45139
base: master
Conversation
private val parsedCatalog: ParsedCatalog?,
private val defaultNamespace: String
) : StreamAwareDataTransformer {
    private data class ScalarNodeModification(
this seems extremely complex for what is essentially a column-nullifier
): StreamAwareDataTransformer {
    // Redundant override to keep in consistent with InsertDestination. TODO: Unify these 2
    // classes with
    // composition.
I'm not sure what this comment here means.
val startTime = System.currentTimeMillis()

log.debug{"Traversing the record to NULL fields for snowflake size limitations"}
//println("Traversing the record to NULL fields for snowflake size limitations")
Remove all commented out TODOs and println statements.
// const val SNOWFLAKE_VARCHAR_MAX_BYTE_SIZE: Int = 16 * 1024 * 1024
// const val SNOWFLAKE_SUPER_MAX_BYTE_SIZE: Int = 16 * 1024 * 1024

val DEFAULT_PREDICATE_VARCHAR_GREATER_THAN_64K: Predicate<String> =
Rename it appropriately to the predicate it is testing for.
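For example, a name that states the condition being tested might look like the following. This is only an illustrative rename with an assumed 64KB threshold, not the PR's actual code:

import java.nio.charset.StandardCharsets
import java.util.function.Predicate

// Hypothetical rename: the name now says exactly what the predicate checks.
val VARCHAR_EXCEEDS_64K_BYTES: Predicate<String> =
    Predicate<String> { value -> value.toByteArray(StandardCharsets.UTF_8).size > 64 * 1024 }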
What
Implements handling of large records in the Snowflake destination connector to fix the issue reported in this ticket:
https://github.com/airbytehq/oncall/issues/6245
When a record contains large values and exceeds Snowflake's maximum allowed record size of 16MB, this change gracefully removes the large values by setting those columns to null until the record is below the size limit. The record's changes metadata is updated to indicate that the specific columns were set to null due to the data size limit.
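Conceptually, the nulling pass boils down to something like the sketch below. This is only an illustration of the approach described above, not the connector's actual code: the Jackson ObjectNode usage, the function name nullOversizedColumns, and the hard-coded 16MB limit are all assumptions.

import com.fasterxml.jackson.databind.node.ObjectNode
import java.nio.charset.StandardCharsets

// Hypothetical limit; the real constant lives in the connector.
const val MAX_RECORD_BYTES: Int = 16 * 1024 * 1024

// Nulls the largest string-valued top-level fields of the record until its
// serialized size fits under maxBytes, and returns the names of the nulled
// columns so the caller can report them in the record's change metadata.
fun nullOversizedColumns(record: ObjectNode, maxBytes: Int = MAX_RECORD_BYTES): List<String> {
    val nulled = mutableListOf<String>()
    // Candidate fields, ordered largest first by the byte size of their textual value.
    val candidates = record.fields().asSequence()
        .filter { it.value.isTextual }
        .sortedByDescending { it.value.asText().toByteArray(StandardCharsets.UTF_8).size }
        .map { it.key }
        .toList()
    for (field in candidates) {
        val currentSize = record.toString().toByteArray(StandardCharsets.UTF_8).size
        if (currentSize <= maxBytes) break
        record.putNull(field)   // drop the oversized value
        nulled += field         // remember it for the change metadata
    }
    return nulled
}

Nulling largest-first removes as few columns as possible; how the connector handles records that remain oversized after all candidate columns are nulled is left out of this sketch.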
How
Removing large values from the record using these steps:
Review guide
Please review the code to confirm that large values are nulled out as per the requirements described above.
User Impact
Large values will be removed from records, so users may see null values for some fields in the destination records.
Can this PR be safely reverted and rolled back?