Comparing read times between dense and sparse parquet files #4258
mikeprince4 started this conversation in General
I wrote a test to compare the time it takes to read a dense vs. a sparse parquet file. I expected the times to be very similar, so I was surprised when reading the sparse columns took much longer, despite filtering out the nulls. This is not what I expected, as I had been told that Daft would be able to perform this type of operation efficiently. Is this behavior in fact expected, or am I doing something wrong?

The code and results table are below.

Thanks in advance.

Here are the results of the experiment with iterations=100.
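For illustration, a minimal sketch of this kind of dense-vs-sparse timing comparison, assuming Daft's from_pydict, write_parquet, read_parquet, where, and not_null APIs (the paths, column name, row count, null fraction, and iteration count below are made up for the example, not the original benchmark):

```python
import time

import numpy as np
import daft

N = 1_000_000    # rows per file (illustrative)
ITERATIONS = 10  # timing repetitions (the original post used iterations=100)

# Build a dense column (no nulls) and a sparse column that is ~99% nulls.
rng = np.random.default_rng(0)
dense_vals = rng.random(N).tolist()
keep = rng.random(N) < 0.01
sparse_vals = [v if k else None for v, k in zip(dense_vals, keep)]

daft.from_pydict({"value": dense_vals}).write_parquet("dense_table")
daft.from_pydict({"value": sparse_vals}).write_parquet("sparse_table")


def avg_read_time(glob_path: str, drop_nulls: bool) -> float:
    """Average wall-clock seconds to read the table, optionally filtering out nulls."""
    start = time.perf_counter()
    for _ in range(ITERATIONS):
        df = daft.read_parquet(glob_path)
        if drop_nulls:
            df = df.where(daft.col("value").not_null())
        df.collect()  # force the read to actually execute
    return (time.perf_counter() - start) / ITERATIONS


print(f"dense  read: {avg_read_time('dense_table/*.parquet', drop_nulls=False):.4f}s")
print(f"sparse read: {avg_read_time('sparse_table/*.parquet', drop_nulls=True):.4f}s")
```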
Replies: 1 comment · 7 replies

I would expect reading data from a sparse table while filtering out NULL values to be much faster due to predicate pushdown. Hopefully, the Daft team can provide some suggestions.
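If the not_null filter is pushed down into the parquet scan, the reader should be able to prune row groups that are entirely null using parquet statistics instead of decoding them and filtering afterwards. A rough sketch of how one might check whether the filter is actually pushed down, assuming DataFrame.explain accepts a show_all flag (paths and column name follow the example above):

```python
import daft

# Read the sparse table and drop null rows. Ideally the optimizer pushes the
# not_null predicate into the scan so mostly-null row groups can be pruned
# via statistics rather than read and then filtered.
df = daft.read_parquet("sparse_table/*.parquet").where(daft.col("value").not_null())

# Print the plans to see whether the filter ended up inside the scan node.
df.explain(show_all=True)

df.collect()
```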