[SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client
### What changes were proposed in this pull request?
Instantiate a new Hive client through `Hive.getWithoutRegisterFns(conf, false)` instead of `Hive.get(conf)` when the `Hive` version is >= 2.3.9 (the built-in version).
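For illustration, here is a minimal sketch (not the exact patch) of gating the client construction on the built-in Hive version. The helper name `newHiveClient` and the boolean flag are hypothetical, and the two-argument `getWithoutRegisterFns(conf, false)` form follows the description above:

```scala
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.metadata.Hive

// Hypothetical helper: pick the Hive entry point depending on whether the
// built-in Hive version is at least 2.3.9.
def newHiveClient(conf: HiveConf, builtinVersionAtLeast239: Boolean): Hive = {
  if (builtinVersionAtLeast239) {
    // Skips registerAllFunctionsOnce(), so get_all_functions is never invoked.
    Hive.getWithoutRegisterFns(conf, false)
  } else {
    Hive.get(conf)
  }
}
```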
### Why are the changes needed?
[HIVE-10319](https://issues.apache.org/jira/browse/HIVE-10319) introduced a new API `get_all_functions`, which is only supported in Hive 1.3.0/2.0.0 and up. As a result, when Spark 3.x talks to an HMS service of version 1.2 or lower, the following error occurs:
```
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: Invalid method name: 'get_all_functions'
at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3897)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
... 96 more
Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_all_functions'
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_all_functions(ThriftHiveMetastore.java:3845)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_all_functions(ThriftHiveMetastore.java:3833)
```
The `get_all_functions` call is made only when `doRegisterAllFns` is set to true:
```java
private Hive(HiveConf c, boolean doRegisterAllFns) throws HiveException {
  conf = c;
  if (doRegisterAllFns) {
    registerAllFunctionsOnce();
  }
}
```
This registers all Hive permanent functions defined in HMS in Hive's `FunctionRegistry` class by iterating through the results of `get_all_functions`. For Spark this seems unnecessary, since it loads Hive permanent (not built-in) UDFs by calling the HMS API directly, i.e., `get_function`. The `FunctionRegistry` is only used to load Hive built-in functions that Spark does not support; at this time, that only applies to `histogram_numeric`.
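To make "calling the HMS API directly" concrete, here is a hedged sketch of looking up a single permanent UDF on demand via `IMetaStoreClient.getFunction`, the client-side counterpart of the thrift call `get_function`. This is illustrative only and not Spark's actual code path, which goes through its own Hive client shim layer:

```scala
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.metastore.{HiveMetaStoreClient, IMetaStoreClient}
import org.apache.hadoop.hive.metastore.api.Function

// Illustrative only: fetch one permanent UDF's metadata on demand instead of
// pulling every function upfront with get_all_functions.
def lookupPermanentUdf(conf: HiveConf, db: String, name: String): Function = {
  val client: IMetaStoreClient = new HiveMetaStoreClient(conf)
  try {
    client.getFunction(db, name) // issues the thrift call get_function
  } finally {
    client.close()
  }
}
```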
[HIVE-21563](https://issues.apache.org/jira/browse/HIVE-21563) introduced a new API `getWithoutRegisterFns`, which skips the registration above and is available in Hive 2.3.9. Therefore, Spark should adopt it to avoid both the incompatible `get_all_functions` call and the cost of registering functions it does not need.
### Does this PR introduce _any_ user-facing change?
Yes. With this fix, Spark should now be able to talk to an HMS server running Hive 1.2.x or lower.
### How was this patch tested?
Manually started an HMS server running Hive 1.2.2. Without the PR, it failed with the above exception. With the PR, the error disappeared and I could successfully perform common operations such as creating tables, creating databases, listing tables, etc.
Closes apache#32887 from sunchao/SPARK-35321-new.
Authored-by: Chao Sun <[email protected]>
Signed-off-by: Yuming Wang <[email protected]>