Skip to content

Hive Metastore PostgreSQL compatibility issue on extractors #1608

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
chonyy opened this issue Dec 5, 2021 · 3 comments
Closed

Hive Metastore PostgreSQL compatibility issue on extractors #1608

chonyy opened this issue Dec 5, 2021 · 3 comments

Comments

@chonyy
Copy link
Contributor

chonyy commented Dec 5, 2021

HiveTableLastUpdatedExtractor and PrestoViewMetadataExtractor now are only workable with MySQL connection string.
It will show syntax error when using Postgres.
The same kind of compatibility is already implemented in HiveTableMetadataExtractor. It should also be applied to other hive-related extractors.

Expected Behavior

There should be no problem using PostgreSQL. It should work same as MySQL.

Current Behavior

Traceback (most recent call last):
  File "/home/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/home/venv/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.SyntaxError: syntax error at or near "`"
LINE 2:     SELECT t.TBL_ID, d.NAME as `schema`, t.TBL_NAME name, t....
                                       ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/trino.py", line 228, in <module>
    trino_view_loading_job.launch()
  File "/home/venv/lib/python3.9/site-packages/databuilder/job/job.py", line 76, in launch
    raise e
  File "/home/venv/lib/python3.9/site-packages/databuilder/job/job.py", line 64, in launch
    self._init()
  File "/home/venv/lib/python3.9/site-packages/databuilder/job/job.py", line 51, in _init
    self.task.init(self.conf)
  File "/home/venv/lib/python3.9/site-packages/databuilder/task/task.py", line 45, in init
    self.extractor.init(Scoped.get_scoped_conf(conf, self.extractor.get_scope()))
  File "/home/venv/lib/python3.9/site-packages/databuilder/extractor/presto_view_metadata_extractor.py", line 57, in init
    self._alchemy_extractor = sql_alchemy_extractor.from_surrounding_config(conf, self.sql_stmt)
  File "/home/venv/lib/python3.9/site-packages/databuilder/extractor/sql_alchemy_extractor.py", line 102, in from_surrounding_config
    ae.init(c)
  File "/home/venv/lib/python3.9/site-packages/databuilder/extractor/sql_alchemy_extractor.py", line 40, in init
    self._execute_query()
  File "/home/venv/lib/python3.9/site-packages/databuilder/extractor/sql_alchemy_extractor.py", line 65, in _execute_query
    self.results = self.connection.execute(self.extract_sql)
  File "/home/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1003, in execute
    return self._execute_text(object_, multiparams, params)
  File "/home/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1172, in _execute_text
    ret = self._execute_context(
  File "/home/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/home/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
    util.raise_(
  File "/home/venv/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/home/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/home/venv/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.SyntaxError) syntax error at or near "`"
LINE 2:     SELECT t.TBL_ID, d.NAME as `schema`, t.TBL_NAME name, t....
                                       ^

[SQL:
    SELECT t.TBL_ID, d.NAME as `schema`, t.TBL_NAME name, t.TBL_TYPE, t.VIEW_ORIGINAL_TEXT as view_original_text
    FROM TBLS t
    JOIN DBS d ON t.DB_ID = d.DB_ID
    WHERE t.VIEW_EXPANDED_TEXT = '/* Presto View */'

    ORDER BY t.TBL_ID desc;
    ]
(Background on this error at: http://sqlalche.me/e/13/f405)

Possible Solution

I have already written equivalent PostgreSQL statement for those syntax errors. Maybe I could make it both compatible following the approach in MetadataExtractor and submit a PR later.

HiveTableLastUpdatedExtractor

The commented out part is the original MySQL

# PARTITION_TABLE_SQL_STATEMENT = """
# SELECT
# DBS.NAME as `schema`,
# TBL_NAME as table_name,
# MAX(PARTITIONS.CREATE_TIME) as last_updated_time
# FROM TBLS
# JOIN DBS ON TBLS.DB_ID = DBS.DB_ID
# JOIN PARTITIONS ON TBLS.TBL_ID = PARTITIONS.TBL_ID
# {where_clause_suffix}
# GROUP BY `schema`, table_name
# ORDER BY `schema`, table_name;
# """

# Postgresql
PARTITION_TABLE_SQL_STATEMENT = """
SELECT
d."NAME" as "schema",
t."TBL_NAME" as table_name,
MAX(p."CREATE_TIME") as last_updated_time
FROM "TBLS" t
JOIN "DBS" d ON t."DB_ID" = d."DB_ID"
JOIN "PARTITIONS" p ON t."TBL_ID" = p."TBL_ID"
{where_clause_suffix}
GROUP BY "schema", table_name
ORDER BY "schema", table_name;
"""

# NON_PARTITIONED_TABLE_SQL_STATEMENT = """
# SELECT
# DBS.NAME as `schema`,
# TBL_NAME as table_name,
# SDS.LOCATION as location
# FROM TBLS
# JOIN DBS ON TBLS.DB_ID = DBS.DB_ID
# JOIN SDS ON TBLS.SD_ID = SDS.SD_ID
# {where_clause_suffix}
# ORDER BY `schema`, table_name;
# """

# Postgresql
NON_PARTITIONED_TABLE_SQL_STATEMENT = """
SELECT
d."NAME" as "schema",
t."TBL_NAME" as table_name,
s."LOCATION" as location
FROM "TBLS" t
JOIN "DBS" d ON t."DB_ID" = d."DB_ID"
JOIN "SDS" s ON t."SD_ID" = s."SD_ID"
{where_clause_suffix}
ORDER BY "schema", table_name;
"""

PrestoView

# SQL_STATEMENT = """
# SELECT t.TBL_ID, d.NAME as `schema`, t.TBL_NAME name, t.TBL_TYPE, t.VIEW_ORIGINAL_TEXT as view_original_text
# FROM TBLS t
# JOIN DBS d ON t.DB_ID = d.DB_ID
# WHERE t.VIEW_EXPANDED_TEXT = '/* Presto View */'
# {where_clause_suffix}
# ORDER BY t.TBL_ID desc;
# """

# PostgreSQL
SQL_STATEMENT = """
SELECT t."TBL_ID", 
d."NAME" as "schema", 
t."TBL_NAME" as name, 
t."TBL_TYPE", 
t."VIEW_ORIGINAL_TEXT" as view_original_text
FROM "TBLS" t
JOIN "DBS" d ON t."DB_ID" = d."DB_ID"
WHERE t."VIEW_EXPANDED_TEXT" = '/* Presto View */'
{where_clause_suffix}
ORDER BY t."TBL_ID" desc;
"""

Steps to Reproduce

Provide a PostgreSQL connection string to the extractor, it will fail and show the syntax error.

job_config = ConfigFactory.from_dict({
        f'extractor.presto_view_metadata.extractor.sqlalchemy.{PrestoViewMetadataExtractor.WHERE_CLAUSE_SUFFIX_KEY}': where_clause_suffix,
        f'extractor.presto_view_metadata.extractor.sqlalchemy.{SQLAlchemyExtractor.CONN_STRING}': metastore_postgresql_connection_string(),
        f'loader.filesystem_csv_neo4j.{FsNeo4jCSVLoader.NODE_DIR_PATH}': node_files_folder,
        f'loader.filesystem_csv_neo4j.{FsNeo4jCSVLoader.RELATION_DIR_PATH}': relationship_files_folder,
        f'publisher.neo4j.{neo4j_csv_publisher.NODE_FILES_DIR}': node_files_folder,
        f'publisher.neo4j.{neo4j_csv_publisher.RELATION_FILES_DIR}': relationship_files_folder,
        f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_END_POINT_KEY}': neo4j_endpoint,
        f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_USER}': neo4j_user,
        f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_PASSWORD}': neo4j_password,
        f'publisher.neo4j.{neo4j_csv_publisher.JOB_PUBLISH_TAG}': timestamp,  # give the timestamp to each node in neo4j
    })
    job = DefaultJob(conf=job_config,
                     task=DefaultTask(extractor=PrestoViewMetadataExtractor(), loader=FsNeo4jCSVLoader()),
                     publisher=Neo4jCsvPublisher())
@stale
Copy link

stale bot commented Dec 20, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Dec 20, 2021
@chonyy
Copy link
Contributor Author

chonyy commented Dec 21, 2021

Bump, PR for the fix is on the way!

@chonyy chonyy changed the title Hive Metastore PostgreSQL compatibility in HiveTableLastUpdatedExtractor and PrestoViewMetadataExtractor Hive Metastore PostgreSQL compatibility issue on extractors Dec 24, 2021
@stale
Copy link

stale bot commented Jan 14, 2022

This issue has been automatically closed for inactivity. If you still wish to make these changes, please open a new pull request or reopen this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant