There is a lot of work to support plans like `SELECT * .. ORDER BY x,y,z` when the data doesn't all fit in RAM.
I believe the core use case is "compaction", where multiple small files are re-sorted, merged, and written as larger files.
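For readers new to the topic, here is a minimal sketch of the general spilling-sort idea (sort bounded batches, spill each as a sorted run, then k-way merge the runs). It is not DataFusion's implementation: the function names, the text spill format, and the `max_rows_in_memory` parameter are all hypothetical, and rows are plain integers rather than multi-column batches.

```python
import heapq
import os
import tempfile


def spill_sorted_run(rows, spill_dir):
    """Sort one in-memory batch and write it to a temporary spill file."""
    fd, path = tempfile.mkstemp(dir=spill_dir, suffix=".run")
    with os.fdopen(fd, "w") as f:
        for row in sorted(rows):
            f.write(f"{row}\n")
    return path


def read_run(path):
    """Stream integers back from a spill file, one row at a time."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(rows, max_rows_in_memory):
    """Yield rows in sorted order, buffering roughly max_rows_in_memory rows.

    Hypothetical illustration only; a real engine tracks bytes, not row counts.
    """
    with tempfile.TemporaryDirectory() as spill_dir:
        runs, batch = [], []
        for row in rows:
            batch.append(row)
            if len(batch) >= max_rows_in_memory:
                # Memory budget reached: sort the batch and spill it to disk.
                runs.append(spill_sorted_run(batch, spill_dir))
                batch = []
        if batch:
            runs.append(spill_sorted_run(batch, spill_dir))
        # k-way merge of the sorted runs; holds one row per run in memory.
        yield from heapq.merge(*(read_run(path) for path in runs))
```

Compaction looks like the final merge step on its own: the inputs are already-sorted small files and the merged stream is written out as one larger file.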
- feat: Add config `max_temp_directory_size` to limit max disk usage for spilling queries #14975
- Benchmark / program to test Spilling Sorts #15664
- limit max disk usage for spilling queries #15358
- A complete solution for stable and safe sort with spill #14692
- External sort failing with modest memory limit when writing parquet files #15028
- External sorting not working for (maybe only for string columns??) #12136
- Optimized spill file format #14078
- feat: Fix multi-lines printing issue for datafusion-cli and add the streaming printing feature back #14954
- Support `--memory-limit` for all benchmarking tools #14641
- Internal error in ExternalSorter when running with memory limit #15675
- Improve datafusion-cli memory usage and considering reserve memory for the result batches #14751
- Further improve datafusion-cli memory usage if we setting huge number for maxrow size. #14810
- Set DataFusion runtime configurations through SQL interface #15552
- Improve spill performance: Disable re-validation of spilled files #15320
- Improve Spill Performance: `mmap` the spill files #15321
- Reduce number of tokio blocking threads in SortExec spill #15323
- Support spilling in `TopK` queries #15538
- Improve time for SortPreservingMerge stream / uninitiated_partitions VecDeque<usize> #15573
FYI @2010YOUY01 and @Kontinuation: I was trying to find some other organizing ticket for improvements in this area and could not, so I filed this one.
Related: