fix(trino,pyspark): improve null handling in array filter #10448
Conversation
Looks like there is also an issue in the trino implementation for the same reason as there was in pyspark.
Hey @stephen-bowser -- thanks for putting this together! I can see that you copied the values check from the previous test -- that would work if we weren't dealing with Pandas NULL/NaN nonsense, but you're getting a test failure because Pandas coerces the nulls to NaN. You might be better served by comparing against the pyarrow output instead.
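For illustration, a minimal sketch of the difference (assumed column and values, not the PR's actual test): fetching through pandas coerces missing values to NaN, while pyarrow keeps them as proper nulls, so an equality check against expected Python values stays exact.

```python
import ibis

t = ibis.memtable({"x": [1.0, None, 2.0]})

# pandas output coerces the missing value to NaN ...
print(t.x.execute().tolist())        # [1.0, nan, 2.0]

# ... while pyarrow keeps it as a proper null, so comparing against a
# plain list of expected Python values works without NaN special-casing.
print(t.x.to_pyarrow().to_pylist())  # [1.0, None, 2.0]
```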
Went ahead and fixed the trino backend here.
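One quick way to see what the Trino backend emits for this operation is to compile against the Trino dialect without a live cluster. This is a sketch using an unbound table, not code from the PR:

```python
import ibis

t = ibis.table({"a": "array<int64>"}, name="t")
expr = t.select(filtered=t.a.filter(lambda x: x > 1))

# Compile against the Trino dialect to eyeball how NULL elements in the
# input array are handled by the generated lambda.
print(ibis.to_sql(expr, dialect="trino"))
```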
Fix on the way, using pyarrow to test.
Thanks for putting this in, @stephen-bowser!
Description of changes
This fixes an issue with the PySpark array filter function. The original implementation did not handle nulls in the input array correctly.
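A minimal sketch of the kind of case involved (assumed values and expected output, not the PR's test), assuming the usual filter semantics where only elements for which the predicate is true are kept:

```python
import ibis

t = ibis.memtable(
    {"a": [[1, None, 2], None]},
    schema={"a": "array<int64>"},
)

# NULL elements make `x > 1` evaluate to NULL, so they are not kept,
# and a NULL input array stays NULL instead of breaking the result.
expr = t.a.filter(lambda x: x > 1)
print(expr.to_pyarrow().to_pylist())  # [[2], None] -- illustrative
```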
I'm not too familiar with SQLGlot, but by copying the implementation from the DuckDB backend I was able to get all the test cases passing. Happy to take feedback if there's something I've missed, though.
See this issue for further details.
Issues closed
Resolves #10201