Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Enabling a row filter sometimes makes Parquet decoding slower rather than faster. One reason is that pages have to be decompressed (e.g. ZSTD) twice: once when evaluating the row filter, and again when the same columns are decoded for the projected output.
Describe the solution you'd like
#6921 adds a cache that avoids this second decode, and I would like to get that PR merged.
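To illustrate why a page cache helps, here is a minimal, self-contained sketch that uses only the Rust standard library. It is not the arrow-rs API and not the design in #6921. `FakeCodec`, `PageCache`, `get_or_decompress`, and `count_decompressions` are hypothetical names. The "codec" just copies bytes instead of running ZSTD, and pages are keyed by (column index, page index). The sketch counts how many decompressions happen when one column is used by both the row filter and the projection.

```rust
use std::collections::HashMap;

/// Hypothetical page identifier: (column index, page index within the column chunk).
type PageKey = (usize, usize);

/// Stand-in for an expensive codec such as ZSTD. It only copies the bytes,
/// but it counts how often "decompression" is invoked.
#[derive(Default)]
struct FakeCodec {
    calls: usize,
}

impl FakeCodec {
    fn decompress(&mut self, compressed: &[u8]) -> Vec<u8> {
        self.calls += 1;
        compressed.to_vec()
    }
}

/// Hypothetical cache of decompressed pages, shared between the
/// row-filter pass and the projection pass.
#[derive(Default)]
struct PageCache {
    pages: HashMap<PageKey, Vec<u8>>,
}

impl PageCache {
    /// Returns the decompressed page, calling the codec only on a cache miss.
    fn get_or_decompress(&mut self, key: PageKey, compressed: &[u8], codec: &mut FakeCodec) -> &[u8] {
        self.pages
            .entry(key)
            .or_insert_with(|| codec.decompress(compressed))
            .as_slice()
    }
}

/// Reads the filter columns (pass 1) and then the projected columns (pass 2),
/// returning how many times the codec had to decompress a page.
fn count_decompressions(
    columns: &[Vec<Vec<u8>>],
    filter_cols: &[usize],
    projection: &[usize],
    use_cache: bool,
) -> usize {
    let mut codec = FakeCodec::default();
    let mut cache = PageCache::default();
    for &col in filter_cols.iter().chain(projection.iter()) {
        for (page_idx, compressed) in columns[col].iter().enumerate() {
            if use_cache {
                let _page = cache.get_or_decompress((col, page_idx), compressed, &mut codec);
            } else {
                let _page = codec.decompress(compressed);
            }
        }
    }
    codec.calls
}

fn main() {
    // Two columns with three (dummy) compressed pages each.
    let columns = vec![vec![vec![1u8; 8]; 3], vec![vec![2u8; 8]; 3]];
    // Column 0 is used by the row filter and also projected; column 1 is only projected.
    let without = count_decompressions(&columns, &[0], &[0, 1], false);
    let with = count_decompressions(&columns, &[0], &[0, 1], true);
    println!("without cache: {without}, with cache: {with}"); // prints "without cache: 9, with cache: 6"
}
```

With the cache, the pages of column 0 are decompressed once during filter evaluation and reused by the projection pass. A real implementation would also have to bound memory, for example by caching only columns that appear in both the filter and the projection and evicting pages once they are consumed; this sketch ignores that.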
Steps:
- Skip page should also support skip dict page #7409
- Add benchmark for parquet reader with row_filter and project settings #7401
- arrow_reader_row_filter benchmark doesn't capture page cache improvements #7460
- Support sync read for the page cache decode improvement. #7415
- Polish + merge
Describe alternatives you've considered
Additional context
I am filing this as a separate ticket because #6921 contains many other ideas, which make it hard to follow.