🎉 Snowflake destination: reduce memory footprint #10394
Conversation
tuliren commented on Feb 17, 2022 (edited)
- Estimate the record message size by string length instead of byte length. This significantly reduces memory usage.
- To further reduce redundant string serialization, the size estimation only samples the string size every N records, instead of computing it for every record (see the sketch after this list).
- See the rationale in [EPIC] scale warehouse destination connectors to handle arbitrary number of streams #10260.
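A minimal sketch of the sampled estimation described above, assuming hypothetical names (`SampledRecordSizeEstimator`, `SAMPLE_PERIOD`) rather than the connector's actual classes; the factor of 4 mirrors the `length() * 4L` over-approximation discussed later in the thread:

```java
import com.fasterxml.jackson.databind.JsonNode;
import java.util.HashMap;
import java.util.Map;

public class SampledRecordSizeEstimator {

  // Re-serialize and measure only one record out of every SAMPLE_PERIOD per stream.
  private static final int SAMPLE_PERIOD = 20;
  // Multiplying the string length by 4 over-approximates the UTF-8 byte size,
  // trading some accuracy for skipping the byte[] allocation entirely.
  private static final long MAX_BYTES_PER_CHAR = 4L;

  private final Map<String, Long> estimateByStream = new HashMap<>();
  private final Map<String, Long> countByStream = new HashMap<>();

  public long estimateByteSize(final String streamName, final JsonNode data) {
    final long count = countByStream.merge(streamName, 1L, Long::sum);
    final Long cached = estimateByStream.get(streamName);
    if (cached == null || count % SAMPLE_PERIOD == 0) {
      // Only sampled records pay the serialization cost; no byte[] is created.
      final long estimate = data.toString().length() * MAX_BYTES_PER_CHAR;
      estimateByStream.put(streamName, estimate);
      return estimate;
    }
    // All other records reuse the most recent sampled estimate.
    return cached;
  }
}
```

Reusing the cached estimate between samples rests on the assumption that records within a stream are roughly similar in size, which is what makes measuring only every Nth record acceptable.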
subodh1810 commented

I am not sure I understand how this will reduce heap consumption. The only difference I see is that with the previous logic we calculated the size of each record using `s.getBytes(StandardCharsets.UTF_8).length`, but with the new logic we do it via `Jsons.serialize(data).length() * 4L`, and only for every 20th record. How does this help with lower heap consumption?
@subodh1810, calculating the byte length of the serialized JSON strings creates lots of byte array objects, so switching away from generating the byte arrays is the fix. Before this change, the connector would run out of memory with a max heap size of 500 MB. After this change, it works even with just a 300 MB heap. See here for the raw data.
/publish connector=connectors/destination-snowflake
/publish connector=connectors/destination-snowflake