You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
HiveTableLastUpdatedExtractor and PrestoViewMetadataExtractor now are only workable with MySQL connection string.
It will show syntax error when using Postgres.
The same kind of compatibility is already implemented in HiveTableMetadataExtractor. It should also be applied to other hive-related extractors.
Expected Behavior
There should be no problem using PostgreSQL. It should work same as MySQL.
Current Behavior
Traceback (most recent call last):
File "/home/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/home/venv/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
psycopg2.errors.SyntaxError: syntax error at or near "`"
LINE 2: SELECT t.TBL_ID, d.NAME as `schema`, t.TBL_NAME name, t....
^
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/trino.py", line 228, in <module>
trino_view_loading_job.launch()
File "/home/venv/lib/python3.9/site-packages/databuilder/job/job.py", line 76, in launch
raise e
File "/home/venv/lib/python3.9/site-packages/databuilder/job/job.py", line 64, in launch
self._init()
File "/home/venv/lib/python3.9/site-packages/databuilder/job/job.py", line 51, in _init
self.task.init(self.conf)
File "/home/venv/lib/python3.9/site-packages/databuilder/task/task.py", line 45, in init
self.extractor.init(Scoped.get_scoped_conf(conf, self.extractor.get_scope()))
File "/home/venv/lib/python3.9/site-packages/databuilder/extractor/presto_view_metadata_extractor.py", line 57, in init
self._alchemy_extractor = sql_alchemy_extractor.from_surrounding_config(conf, self.sql_stmt)
File "/home/venv/lib/python3.9/site-packages/databuilder/extractor/sql_alchemy_extractor.py", line 102, in from_surrounding_config
ae.init(c)
File "/home/venv/lib/python3.9/site-packages/databuilder/extractor/sql_alchemy_extractor.py", line 40, in init
self._execute_query()
File "/home/venv/lib/python3.9/site-packages/databuilder/extractor/sql_alchemy_extractor.py", line 65, in _execute_query
self.results = self.connection.execute(self.extract_sql)
File "/home/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1003, in execute
return self._execute_text(object_, multiparams, params)
File "/home/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1172, in _execute_text
ret = self._execute_context(
File "/home/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
self._handle_dbapi_exception(
File "/home/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
util.raise_(
File "/home/venv/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
raise exception
File "/home/venv/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
self.dialect.do_execute(
File "/home/venv/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 608, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.SyntaxError) syntax error at or near "`"
LINE 2: SELECT t.TBL_ID, d.NAME as `schema`, t.TBL_NAME name, t....
^
[SQL:
SELECT t.TBL_ID, d.NAME as `schema`, t.TBL_NAME name, t.TBL_TYPE, t.VIEW_ORIGINAL_TEXT as view_original_text
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID
WHERE t.VIEW_EXPANDED_TEXT = '/* Presto View */'
ORDER BY t.TBL_ID desc;
]
(Background on this error at: http://sqlalche.me/e/13/f405)
Possible Solution
I have already written equivalent PostgreSQL statement for those syntax errors. Maybe I could make it both compatible following the approach in MetadataExtractor and submit a PR later.
HiveTableLastUpdatedExtractor
The commented out part is the original MySQL
# PARTITION_TABLE_SQL_STATEMENT = """# SELECT# DBS.NAME as `schema`,# TBL_NAME as table_name,# MAX(PARTITIONS.CREATE_TIME) as last_updated_time# FROM TBLS# JOIN DBS ON TBLS.DB_ID = DBS.DB_ID# JOIN PARTITIONS ON TBLS.TBL_ID = PARTITIONS.TBL_ID# {where_clause_suffix}# GROUP BY `schema`, table_name# ORDER BY `schema`, table_name;# """# PostgresqlPARTITION_TABLE_SQL_STATEMENT="""SELECTd."NAME" as "schema",t."TBL_NAME" as table_name,MAX(p."CREATE_TIME") as last_updated_timeFROM "TBLS" tJOIN "DBS" d ON t."DB_ID" = d."DB_ID"JOIN "PARTITIONS" p ON t."TBL_ID" = p."TBL_ID"{where_clause_suffix}GROUP BY "schema", table_nameORDER BY "schema", table_name;"""# NON_PARTITIONED_TABLE_SQL_STATEMENT = """# SELECT# DBS.NAME as `schema`,# TBL_NAME as table_name,# SDS.LOCATION as location# FROM TBLS# JOIN DBS ON TBLS.DB_ID = DBS.DB_ID# JOIN SDS ON TBLS.SD_ID = SDS.SD_ID# {where_clause_suffix}# ORDER BY `schema`, table_name;# """# PostgresqlNON_PARTITIONED_TABLE_SQL_STATEMENT="""SELECTd."NAME" as "schema",t."TBL_NAME" as table_name,s."LOCATION" as locationFROM "TBLS" tJOIN "DBS" d ON t."DB_ID" = d."DB_ID"JOIN "SDS" s ON t."SD_ID" = s."SD_ID"{where_clause_suffix}ORDER BY "schema", table_name;"""
PrestoView
# SQL_STATEMENT = """# SELECT t.TBL_ID, d.NAME as `schema`, t.TBL_NAME name, t.TBL_TYPE, t.VIEW_ORIGINAL_TEXT as view_original_text# FROM TBLS t# JOIN DBS d ON t.DB_ID = d.DB_ID# WHERE t.VIEW_EXPANDED_TEXT = '/* Presto View */'# {where_clause_suffix}# ORDER BY t.TBL_ID desc;# """# PostgreSQLSQL_STATEMENT="""SELECT t."TBL_ID", d."NAME" as "schema", t."TBL_NAME" as name, t."TBL_TYPE", t."VIEW_ORIGINAL_TEXT" as view_original_textFROM "TBLS" tJOIN "DBS" d ON t."DB_ID" = d."DB_ID"WHERE t."VIEW_EXPANDED_TEXT" = '/* Presto View */'{where_clause_suffix}ORDER BY t."TBL_ID" desc;"""
Steps to Reproduce
Provide a PostgreSQL connection string to the extractor, it will fail and show the syntax error.
job_config=ConfigFactory.from_dict({
f'extractor.presto_view_metadata.extractor.sqlalchemy.{PrestoViewMetadataExtractor.WHERE_CLAUSE_SUFFIX_KEY}': where_clause_suffix,
f'extractor.presto_view_metadata.extractor.sqlalchemy.{SQLAlchemyExtractor.CONN_STRING}': metastore_postgresql_connection_string(),
f'loader.filesystem_csv_neo4j.{FsNeo4jCSVLoader.NODE_DIR_PATH}': node_files_folder,
f'loader.filesystem_csv_neo4j.{FsNeo4jCSVLoader.RELATION_DIR_PATH}': relationship_files_folder,
f'publisher.neo4j.{neo4j_csv_publisher.NODE_FILES_DIR}': node_files_folder,
f'publisher.neo4j.{neo4j_csv_publisher.RELATION_FILES_DIR}': relationship_files_folder,
f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_END_POINT_KEY}': neo4j_endpoint,
f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_USER}': neo4j_user,
f'publisher.neo4j.{neo4j_csv_publisher.NEO4J_PASSWORD}': neo4j_password,
f'publisher.neo4j.{neo4j_csv_publisher.JOB_PUBLISH_TAG}': timestamp, # give the timestamp to each node in neo4j
})
job=DefaultJob(conf=job_config,
task=DefaultTask(extractor=PrestoViewMetadataExtractor(), loader=FsNeo4jCSVLoader()),
publisher=Neo4jCsvPublisher())
The text was updated successfully, but these errors were encountered:
chonyy
changed the title
Hive Metastore PostgreSQL compatibility in HiveTableLastUpdatedExtractor and PrestoViewMetadataExtractor
Hive Metastore PostgreSQL compatibility issue on extractors
Dec 24, 2021
HiveTableLastUpdatedExtractor and PrestoViewMetadataExtractor now are only workable with MySQL connection string.
It will show syntax error when using Postgres.
The same kind of compatibility is already implemented in HiveTableMetadataExtractor. It should also be applied to other hive-related extractors.
Expected Behavior
There should be no problem using PostgreSQL. It should work same as MySQL.
Current Behavior
Possible Solution
I have already written equivalent PostgreSQL statement for those syntax errors. Maybe I could make it both compatible following the approach in MetadataExtractor and submit a PR later.
HiveTableLastUpdatedExtractor
The commented out part is the original MySQL
PrestoView
Steps to Reproduce
Provide a PostgreSQL connection string to the extractor, it will fail and show the syntax error.
The text was updated successfully, but these errors were encountered: