feat: Add lazy sinks #21733
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #21733 +/- ##
==========================================
- Coverage 81.03% 81.02% -0.01%
==========================================
Files 1610 1610
Lines 233031 233003 -28
Branches 2685 2689 +4
==========================================
- Hits 188837 188802 -35
- Misses 43563 43570 +7
Partials 631 631
|
@coastalwhite why return the empty DataFrame? Could that lead to misunderstandings with a query like `pl.scan_csv("somefile").sink_csv("somefile2", lazy=True).collect().with_columns(pl.all() * 2).write_csv("somefile3")`? Would it make more sense for |
We could maybe do that. It would require some magic on the Python side. |
@coastalwhite the file path was just a suggestion of something that could be meaningful to return. Mostly, I would be worried about the |
After thinking about it a bit more, for now, I don't think it is a good idea. If you explicitly say lazy, I am assuming you are going to do something with |
Fixed #6506 |
Almost; the CSE part is incoming. |
This PR adds a `lazy` boolean flag to all sinks. If this is set to true, the sink returns a `LazyFrame`, and `.collect()` needs to be called before it gets executed. The collect returns an empty `DataFrame`. This also now allows combining `sink_*` with `collect_all`.

Example
Fixes #6506.