An Error Occurred While Reading a Parquet File Using C++ - GetRecordBatchReader - Corrupt snappy compressed data. #13186
Comments
This seems like a bug. Can you create a JIRA ticket? Can you attach a sample file that fails to read?
Thanks. By the way, I just upgraded Arrow to 8.0.0 just now, and this error still occurs.
@yurikoomiga Can you post a sample file that fails somewhere? (or code to reproduce the generation of the file)
I'm sorry for replying after so long.
I ran it on Ubuntu 9.4.0 with Python 3.8 and pyarrow 7.0.0.
I have created a JIRA ticket here: https://issues.apache.org/jira/browse/ARROW-16642
FYI, I am getting the same error using the C++ SDK. I am using Apache Arrow 8.0.0 and Snappy 1.1.8. I will try to provide more details as I dive in.
Hi All,
When I use Arrow C++ to read a Parquet file like the following:
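Roughly like this (a minimal sketch of my reading code; the file name and the row-group/column indices are placeholders for my real 20-column file):

#include <arrow/api.h>
#include <arrow/io/file.h>
#include <parquet/arrow/reader.h>

arrow::Status ReadFile() {
  // Open the Parquet file written by pyarrow.
  ARROW_ASSIGN_OR_RAISE(auto infile,
                        arrow::io::ReadableFile::Open("data.parquet"));

  std::unique_ptr<parquet::arrow::FileReader> _reader;
  ARROW_RETURN_NOT_OK(parquet::arrow::OpenFile(
      infile, arrow::default_memory_pool(), &_reader));
  _reader->set_use_threads(true);  // commenting this out avoids the error

  // Request row group 0 and several columns at once through a RecordBatchReader.
  std::unique_ptr<arrow::RecordBatchReader> batch_reader;
  ARROW_RETURN_NOT_OK(_reader->GetRecordBatchReader(
      {0}, {0, 1, 2, 3}, &batch_reader));

  std::shared_ptr<arrow::RecordBatch> batch;
  while (true) {
    // This is the call where the IOError surfaces.
    ARROW_RETURN_NOT_OK(batch_reader->ReadNext(&batch));
    if (batch == nullptr) break;
  }
  return arrow::Status::OK();
}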
the returned status is not ok and an error occurs like this:
IOError: Corrupt snappy compressed data.
When I comment out the statement
_reader->set_use_threads(true);
the program runs normally and I can read the Parquet file without problems. The error only occurs when I read multiple columns with
_reader->set_use_threads(true);
enabled; reading a single column does not produce the error (see the comparison below).
The test Parquet file was created with pyarrow. It has a single row group containing 3,000,000 records, and 20 columns including int and string types.
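For comparison, requesting a single column the same way reads fine even with threads enabled (indices again placeholders):

// Works: same file, set_use_threads(true), but only one column requested.
ARROW_RETURN_NOT_OK(_reader->GetRecordBatchReader({0}, {0}, &batch_reader));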
Reading the file: C++, Arrow 7.0.0, Snappy 1.1.8
Writing the file: Python 3.8, pyarrow 7.0.0
Looking forward to your reply
Thank you!