
S3 Destination: Decreased thread allocation & memory ratio for AsyncConsumer #43714

Merged: 2 commits into master from issue-43713/s3-destination-oom, Aug 13, 2024

Conversation

@johnny-schmidt (Contributor) commented Aug 11, 2024

This issue: some configurations OOM consistently in the cloud. I can't find evidence this is affecting customers, but bumping memory to 2 GB (the same as other async destinations) seems to let them succeed.

There are three changes here:

  • increase the memory requirement to 2 GB
  • reduce the memory ratio to 0.5
  • make it possible to tweak the memory ratio and concurrency entirely with a connector release (no CDK release required)

I built a dev connector off this, and it allows even the most demanding loads to make steady progress.
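The third bullet above can be sketched roughly as follows. This is a hypothetical illustration, not the actual CDK code: the class and parameter names here (`AsyncConsumerConfig`, `maxBufferBytes`) are invented for the example, though `memoryRatio` and `nThreads` match the parameters shown in the review diff below.

```kotlin
// Hypothetical sketch: exposing the tuning knobs as constructor parameters
// with defaults means a connector can override them in its own release,
// with no CDK change required.
open class AsyncConsumerConfig(
    val environment: Map<String, String> = System.getenv(),
    val memoryRatio: Double = 0.5, // fraction of the heap given to async buffering
    val nThreads: Int = 5          // upload worker threads
) {
    // Bytes the async framework may buffer before applying backpressure.
    fun maxBufferBytes(): Long =
        (Runtime.getRuntime().maxMemory() * memoryRatio).toLong()
}

fun main() {
    // A connector that OOMs can ship tighter defaults without touching the CDK:
    val tuned = AsyncConsumerConfig(memoryRatio = 0.5, nThreads = 1)
    println(tuned.maxBufferBytes() > 0) // prints "true"
}
```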

@johnny-schmidt johnny-schmidt requested a review from a team as a code owner August 11, 2024 17:39

@octavia-squidington-iii octavia-squidington-iii added area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/destination/s3 labels Aug 11, 2024
@johnny-schmidt johnny-schmidt requested a review from a team as a code owner August 11, 2024 18:03
@octavia-squidington-iii octavia-squidington-iii added the CDK Connector Development Kit label Aug 11, 2024
@johnny-schmidt johnny-schmidt force-pushed the issue-43713/s3-destination-oom branch from 57bce43 to 9e638f7 Compare August 11, 2024 18:04
@johnny-schmidt johnny-schmidt requested a review from gisripa August 11, 2024 19:12
@johnny-schmidt johnny-schmidt force-pushed the issue-43713/s3-destination-oom branch from 9e638f7 to 1310452 Compare August 11, 2024 19:15
@johnny-schmidt (Contributor, Author) commented:

Successful sync with this config: 5e7102b0b1e7e19b14650c8b0c6b976f95687079!

@evantahler (Contributor) left a comment:


Ideally, connectors should work in lower-memory environments, just more slowly (e.g. smaller batches or less parallelism). This is good both for our cloud costs and for self-hosted users who don't have large K8s clusters.

Since most syncs seem to work with the 1 GB setting, I'd suggest that we delay rolling this out for all cloud syncs unless there really is no other option. E.g., maybe the parallelism for the S3 destination needs to be lower, or the % of memory we allocate to an async worker should be increased?

[edit] Somehow I was commenting on a phantom version of this PR that didn't also include the thread / memory % changes. Are those alone enough, or do we also need the 2 GB bump for syncs to succeed?

Review thread on this diff:

- protected val environment: Map<String, String> = System.getenv()
+ protected val environment: Map<String, String> = System.getenv(),
+ private val memoryRatio: Double = 0.5,
+ private val nThreads: Int = 5
Contributor: Should this be some function of the number of CPUs that are allocated to the container?

@johnny-schmidt (Contributor, Author) replied:

Probably, but it's hardcoded to five currently. Also, according to the resource runbook, syncs only have 1 core by default. I assumed the point was to parallelize workloads we expected to be network-bound (though since we're staging everything to a local file before syncing, that's probably less true).

@johnny-schmidt (Contributor, Author) commented:

@evantahler I reverted the memory increase request. Now this change:

  • reduces the memory ratio from 0.7 to 0.5
  • reduces the number of worker threads from 5 to 1

I will confirm this works for all configurations against the benchmark without significant performance reduction.
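Back-of-the-envelope, the ratio change alone frees a meaningful amount of headroom. The arithmetic below is illustrative (it assumes a 1 GiB heap; actual heap sizes depend on deployment), not a measurement from the PR:

```kotlin
fun main() {
    val heap = 1L shl 30  // assume a 1 GiB heap
    val mib = 1L shl 20
    // Memory the async framework may claim for buffering at each ratio, in MiB:
    println((heap * 0.7).toLong() / mib) // prints 716 (the old 0.7 ratio)
    println((heap * 0.5).toLong() / mib) // prints 512 (the new 0.5 ratio)
    // The ~200 MiB difference stays available to everything else on the heap.
}
```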

@gisripa (Contributor) left a comment:


No blocking comments. I'm not sure whether defaulting to Runtime.availableProcessors (or something similar) is sane, or whether even that would be bad for Parquet.

@johnny-schmidt (Contributor, Author) replied:

> No blocking comments. I'm not sure whether defaulting to Runtime.availableProcessors (or something similar) is sane, or whether even that would be bad for Parquet.

@gisripa judging by the orchestrator logs, that's going to be 1 anyway
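For reference, the suggested alternative of deriving the worker count from the container's CPUs would indeed collapse to 1 on the default 1-core sync pods. A sketch of the idea (not what was merged):

```kotlin
fun main() {
    // Worker count derived from the CPUs the JVM can see, clamped to at least 1.
    // On a pod limited to a single core this evaluates to 1, matching the
    // value this PR hardcodes.
    val defaultThreads = Runtime.getRuntime().availableProcessors().coerceAtLeast(1)
    println(defaultThreads >= 1) // prints "true"
}
```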

@johnny-schmidt johnny-schmidt force-pushed the issue-43713/s3-destination-oom branch from ca03764 to 4d99c05 Compare August 12, 2024 22:01
@johnny-schmidt johnny-schmidt changed the title S3 Destination: Increased Memory for AsyncConsumer → S3 Destination: Decreased thread allocation & memory ratio for AsyncConsumer Aug 12, 2024
@johnny-schmidt johnny-schmidt force-pushed the issue-43713/s3-destination-oom branch from 4d99c05 to d51d428 Compare August 12, 2024 22:04
@johnny-schmidt johnny-schmidt merged commit 4c4a105 into master Aug 13, 2024
37 of 40 checks passed
@johnny-schmidt johnny-schmidt deleted the issue-43713/s3-destination-oom branch August 13, 2024 15:24
LouisAuneau pushed a commit to LouisAuneau/airbyte that referenced this pull request Aug 13, 2024
Labels:
  • area/connectors (Connector related issues)
  • area/documentation (Improvements or additions to documentation)
  • CDK (Connector Development Kit)
  • connectors/destination/s3
4 participants