🎉 Snowflake destination: reduce memory footprint #10394
Conversation
tuliren commented on Feb 17, 2022 (edited)
- Estimate the record message size by string length instead of byte length. This significantly reduces memory usage.
- To further reduce redundant string serialization, the size estimation only samples the string size every N records, instead of computing it for every record (see the sketch after this list).
- See the rationale in [EPIC] scale warehouse destination connectors to handle arbitrary number of streams #10260.
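A minimal sketch of the sampled estimation described above, assuming hypothetical names (`SampledRecordSizeEstimator`, `SAMPLE_PERIOD`) rather than the connector's actual classes; the factor of 4 mirrors the `length() * 4L` over-approximation discussed later in the thread:

```java
import com.fasterxml.jackson.databind.JsonNode;
import java.util.HashMap;
import java.util.Map;

public class SampledRecordSizeEstimator {

  // Re-serialize and measure only one record out of every SAMPLE_PERIOD per stream.
  private static final int SAMPLE_PERIOD = 20;
  // Multiplying the string length by 4 over-approximates the UTF-8 byte size,
  // trading some accuracy for skipping the byte[] allocation entirely.
  private static final long MAX_BYTES_PER_CHAR = 4L;

  private final Map<String, Long> estimateByStream = new HashMap<>();
  private final Map<String, Long> countByStream = new HashMap<>();

  public long estimateByteSize(final String streamName, final JsonNode data) {
    final long count = countByStream.merge(streamName, 1L, Long::sum);
    final Long cached = estimateByStream.get(streamName);
    if (cached == null || count % SAMPLE_PERIOD == 0) {
      // Only sampled records pay the serialization cost; no byte[] is created.
      final long estimate = data.toString().length() * MAX_BYTES_PER_CHAR;
      estimateByStream.put(streamName, estimate);
      return estimate;
    }
    // All other records reuse the most recent sampled estimate.
    return cached;
  }
}
```

Reusing the cached estimate between samples rests on the assumption that records within a stream are roughly similar in size, which is what makes measuring only every Nth record acceptable.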
subodh1810 commented

I am not sure I understand how this will reduce heap consumption. The only difference I see is that with the previous logic we calculated the size of each record using `s.getBytes(StandardCharsets.UTF_8).length`, but with the new logic we do it via `Jsons.serialize(data).length() * 4L`, and only for every 20th record. How does this help with lower heap consumption?
@subodh1810, calculating the byte length of the serialized JSON strings creates lots of byte array objects, so switching away from generating the byte arrays is the fix. Before this change, the connector would run out of memory with a max heap size of 500 MB. After this change, it works even with just a 300 MB heap. See here for the raw data.
/publish connector=connectors/destination-snowflake
/publish connector=connectors/destination-snowflake