Support sink_parquet/sink_ndjson/sink_csv with GPU engine #20259

Open
wence- opened this issue Dec 11, 2024 · 0 comments
Assignees: wence-
Labels: A-gpu (Area: gpu engine), enhancement (New feature or an improvement of an existing feature)

wence- (Collaborator) commented Dec 11, 2024

Description

When collecting a lazy query with the GPU engine, we can currently only materialise the end result in CPU memory. If we want to end up with the final result "at rest" as parquet (or other on-disk format), we incur the cost of moving to CPU memory and then writing from there. libcudf has GPU-accelerated IO writers as well as readers, so it would be nice to be able to sink straight from GPU memory.

I think this is doable in the same way we currently hook into collect, by adding an engine argument to the sink_* methods.
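For illustration, a minimal sketch of the round trip described above, assuming `lf` is a `polars.LazyFrame`. The helper name `write_parquet_via_cpu` is made up for this example, and the `sink_parquet(..., engine="gpu")` call in the trailing comment is only the shape this issue proposes, not something that exists at the time of writing.

```python
from typing import Any


def write_parquet_via_cpu(lf: Any, path: str) -> None:
    """Current workaround: execute on the GPU, then write from host memory.

    `lf` is expected to be a polars.LazyFrame (hypothetical helper, not
    part of polars).
    """
    # collect(engine="gpu") runs the query on the GPU, but the returned
    # DataFrame lives in CPU memory.
    df = lf.collect(engine="gpu")
    # The parquet write therefore happens from CPU memory.
    df.write_parquet(path)


# Shape of the proposal in this issue (assumed, not an existing call):
#     lf.sink_parquet(path, engine="gpu")
```

The helper makes the extra device-to-host copy explicit: the data leaves the GPU in `collect` and is only then serialised, which is the cost a GPU-side sink would avoid.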

wence- added the enhancement and A-gpu labels Dec 11, 2024
wence- self-assigned this Dec 11, 2024