bug (Impala): "date" type not recognized when trying to read Hive table #4449

Closed
indylec opened this issue Aug 31, 2022 · 5 comments · Fixed by #4452
Labels: bug (Incorrect behavior inside of ibis), impala (The Apache Impala backend)


indylec commented Aug 31, 2022

A Hive table I'm trying to read has columns with the "DATE" type (valid type as per the Impala docs).

When I run a simple expression (table = connection.table('my_db.my_table')), I get the following traceback:

Input In [9], in <cell line: 1>()
----> 1 hive_conn.table('tableau_di_tables.ds_consumer_travel_orders')

File ~/miniconda3/envs/ds-hotel-ranking/lib/python3.10/site-packages/ibis/backends/base/sql/__init__.py:44, in BaseSQLBackend.table(self, name, database)
     29 """Construct a table expression.
     30 
     31 Parameters
   (...)
     41     Table expression
     42 """
     43 qualified_name = self._fully_qualified_name(name, database)
---> 44 schema = self.get_schema(qualified_name)
     45 node = self.table_class(qualified_name, schema, self)
     46 return self.table_expr_class(node)

File ~/miniconda3/envs/ds-hotel-ranking/lib/python3.10/site-packages/ibis/backends/impala/__init__.py:482, in Backend.get_schema(self, table_name, database)
    479 pairs = [row[:2] for row in self.con.fetchall(query)]
    481 names, types = zip(*pairs)
--> 482 ibis_types = [udf.parse_type(type.lower()) for type in types]
    483 return sch.Schema(names, ibis_types)

File ~/miniconda3/envs/ds-hotel-ranking/lib/python3.10/site-packages/ibis/backends/impala/__init__.py:482, in <listcomp>(.0)
    479 pairs = [row[:2] for row in self.con.fetchall(query)]
    481 names, types = zip(*pairs)
--> 482 ibis_types = [udf.parse_type(type.lower()) for type in types]
    483 return sch.Schema(names, ibis_types)

File ~/miniconda3/envs/ds-hotel-ranking/lib/python3.10/site-packages/ibis/backends/impala/udf.py:310, in parse_type(t)
    308         return ValueError(t)
    309 else:
--> 310     raise Exception(t)

Exception: date

Looking at the source code, it seems that the "date" type is not in the _impala_to_ibis_type dict in udf.py.

Is it possible to include the date type here? It seems like a one-to-one equivalency with the ibis type.

I can try and submit a pull request for this although I don't have much experience with contributing!
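
The failure mode described above can be sketched in isolation. This is a simplified illustration, not the ibis source: the dict contents, the error message, and the helper try_parse are hypothetical stand-ins, and the lookup function here is far simpler than the real one. It only shows why a type name that is missing from a lookup table surfaces as an exception, and why adding one entry resolves it.

```python
# Hypothetical stand-in for the backend's type table (assumed contents,
# not copied from ibis).
impala_to_ibis_type = {
    "boolean": "boolean",
    "int": "int32",
    "string": "string",
    "timestamp": "timestamp",
}


def parse_type(t: str) -> str:
    """Map an Impala type name to an ibis type name, failing on unknown names."""
    t = t.lower()
    try:
        return impala_to_ibis_type[t]
    except KeyError:
        raise ValueError(f"unsupported Impala type: {t}") from None


def try_parse(t: str) -> str:
    """Illustrative helper: return the parsed type or a readable error string."""
    try:
        return parse_type(t)
    except ValueError as exc:
        return f"error: {exc}"


before = try_parse("DATE")            # "date" is not in the table yet
impala_to_ibis_type["date"] = "date"  # the proposed one-to-one mapping
after = try_parse("DATE")

print(before)
print(after)
```

With the entry added, the same call that previously failed returns the matching type name.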

gforsyth (Member) commented

Hey @indylec! Thanks for the report!

I think you've diagnosed the issue correctly (although I haven't used the impala backend much at all). Try out your fix locally and if that resolves the issue then we'd certainly welcome a pull request with the fix (and we're happy to help walk you through any rough patches in the contribution process).

indylec (Author) commented Aug 31, 2022

Hi @gforsyth,

Ok, I'll try it out locally and attempt a pull request - stay tuned.

/take

indylec (Author) commented Sep 1, 2022

Hi @gforsyth,

The fix works locally. I also had to add a key-value pair to the _HS2_TTypeId_to_dtype dict in impala/__init__.py ('DATE':'datetime64[ns]').
The table I need to read also has a couple of undefined columns (they show up as "VOID" type in DBeaver), so I have also added the appropriate key-value pairs for these ('void':'null' in udf.py and 'void':None in __init__.py).
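
As a rough sketch of the entries listed above: the two dicts below are local stand-ins that hold only the pairs mentioned in this comment (the real dicts hold many more), and the helper functions are hypothetical, written only to show which dict is consulted when. Keys are reproduced as written in the comment, including their case.

```python
# Stand-in for the dict in udf.py: lower-cased Impala type name -> ibis type name.
impala_to_ibis_type = {"date": "date", "void": "null"}

# Stand-in for the dict in impala/__init__.py: result-set type id -> dtype string.
hs2_typeid_to_dtype = {"DATE": "datetime64[ns]", "void": None}


def ibis_types_for(described_types):
    """Hypothetical helper: resolve type names as a schema lookup would."""
    return [impala_to_ibis_type[t.lower()] for t in described_types]


def result_dtype_for(hs2_type):
    """Hypothetical helper: resolve the dtype used when materializing results."""
    return hs2_typeid_to_dtype[hs2_type]


schema_types = ibis_types_for(["DATE", "VOID"])
date_dtype = result_dtype_for("DATE")
void_dtype = result_dtype_for("void")

print(schema_types, date_dtype, void_dtype)
```

The first dict matters when the table schema is read, and the second when query results are converted, which is why both needed an entry.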

Now to the (first) rough patch:

The core test suite passes (3116 passed, 6016 deselected, 4 xfailed in 18.77s). However, the Impala subset fails because the tests cannot connect to the expected test databases (1 failed, 280 passed, 8559 deselected, 26 xfailed, 805 errors in 122.58s).

Any guidance on how to set this up would be welcome. Alternatively, I can push the commit to my fork and someone else can run the Impala tests.

Thanks!

cpcloud (Member) commented Sep 1, 2022

@indylec If you want to push up the PR, I can help you get through the test suite.

cpcloud added this to the 3.2.0 milestone Sep 1, 2022
cpcloud added the bug and impala labels Sep 1, 2022

indylec (Author) commented Sep 1, 2022

@cpcloud thanks - I had missed the part in the docs about using docker-compose but would appreciate your help to run it with Impala.
