S3 Destination: Decreased thread allocation & memory ratio for AsyncConsumer #43714
Conversation
Successful sync with this config: 5e7102b0b1e7e19b14650c8b0c6b976f95687079!
Ideally, connectors should still work in lower-memory environments, just more slowly (e.g. smaller batches or less parallelism). This is good both for our cloud costs and for self-hosted users who don't have large K8s clusters.
Since most syncs seem to work with the 1GB setting, I'd suggest we delay rolling this out for all cloud syncs unless there really is no other option. E.g. maybe the parallelism for the S3 destination needs to be lower, or the % of memory we allocate to an async worker should be increased?
[edit] somehow I was commenting on some phantom version of this PR that didn't also include the thread / memory % changes. Are just those enough, or do we also need the 2GB bump for the sync to succeed?
-    protected val environment: Map<String, String> = System.getenv()
+    protected val environment: Map<String, String> = System.getenv(),
+    private val memoryRatio: Double = 0.5,
+    private val nThreads: Int = 5
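For context, here is a minimal sketch of how defaults like these are typically consumed by an async consumer: the ratio caps the record buffer relative to the JVM heap, and the thread count sizes a fixed upload pool. The class and method names below are illustrative assumptions, not the connector's actual code.

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.ExecutorService

// Hypothetical sketch (not the connector's real classes) of how the two new
// constructor defaults could be applied.
class AsyncConsumerSettings(
    private val memoryRatio: Double = 0.5, // fraction of max heap reserved for buffered records
    private val nThreads: Int = 5          // size of the flush/upload thread pool
) {
    // With a 1GB heap and ratio 0.5, roughly 512MB of records can be buffered before flushing.
    fun bufferCapacityBytes(): Long =
        (Runtime.getRuntime().maxMemory() * memoryRatio).toLong()

    // A smaller pool trades upload parallelism for memory headroom.
    fun uploadPool(): ExecutorService = Executors.newFixedThreadPool(nThreads)
}

fun main() {
    val settings = AsyncConsumerSettings()
    println("Record buffer cap: ${settings.bufferCapacityBytes() / (1024 * 1024)} MB")
    settings.uploadPool().shutdown()
}
```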
Should this be some function of the number of CPUs that are allocated to the container?
Probably, but it's hardcoded to five currently. Also, according to the resource runbook, syncs only have 1 core by default? I assumed the point was to parallelize workloads that we expected to be network-bound (though since we're staging everything to a local file before syncing, that's probably less true?). A sketch of a CPU-derived default is below.
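A possible shape for the CPU-derived default discussed here; the floor and cap values are illustrative assumptions, not anything decided in this PR. With the 1-core default noted in the resource runbook this would simply resolve to the floor.

```kotlin
// Sketch only: derive the upload thread count from the container's CPU allocation,
// bounded by an assumed floor and the currently hardcoded value of 5 as the cap.
fun defaultUploadThreads(
    availableCpus: Int = Runtime.getRuntime().availableProcessors(),
    floor: Int = 2, // keep some overlap for network-bound uploads even on a single core
    cap: Int = 5    // the current hardcoded value, kept as an upper bound
): Int = availableCpus.coerceIn(floor, cap)

fun main() {
    println("Upload threads: ${defaultUploadThreads()}")
}
```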
@evantahler I reverted the memory increase request. Now this change only includes the decreased thread allocation and memory ratio for the AsyncConsumer.
I will confirm this works for all configurations against the benchmark without a significant perf reduction.
No blocking comments. Not sure whether defaulting to Runtime.availableProcessors (or something similar) is sane, or whether that would even be good for Parquet.
@gisripa judging by the orchestrator logs, that's going to be 1 anyway
This issue: Some configurations OOM consistently in the cloud. I can't find evidence this is affecting customers, but bumping memory to 2GB (the same as other async destinations) seems to enable the sync to succeed.
There are three changes here:
I built a dev connector off this, and it allows even the most demanding loads to make steady progress.
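For a rough sense of what the memory knobs mean in practice, a back-of-the-envelope sketch follows; the container limits below are illustrative and assume the heap roughly tracks the container limit, which is an assumption rather than a measurement from this PR.

```kotlin
// Illustrative only: how the 0.5 memory ratio scales the record buffer
// against a 1GB vs. 2GB container limit (assuming heap ~ container limit).
fun bufferBudgetMb(containerLimitMb: Long, memoryRatio: Double = 0.5): Long =
    (containerLimitMb * memoryRatio).toLong()

fun main() {
    listOf(1024L, 2048L).forEach { limitMb ->
        println("${limitMb}MB container -> ~${bufferBudgetMb(limitMb)}MB record buffer")
    }
}
```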