Unable to use an existing Hive permanent UDF from Spark SQL

Posted 2019-03-12 01:13

I have previously registered a UDF with Hive. It is permanent, not TEMPORARY, and it works in beeline.

CREATE FUNCTION normaliseURL AS 'com.example.hive.udfs.NormaliseURL' USING JAR 'hdfs://udfs/hive-udfs.jar';

I have Spark configured to use the Hive metastore. The config is working, as I can query Hive tables. I can see the UDF:

In [9]: spark.sql('describe function normaliseURL').show(truncate=False)
+-------------------------------------------+
|function_desc                              |
+-------------------------------------------+
|Function: default.normaliseURL             |
|Class: com.example.hive.udfs.NormaliseURL  |
|Usage: N/A.                                |
+-------------------------------------------+
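For context, the Spark session is built roughly like this (a minimal sketch; it assumes the metastore connection details come from a hive-site.xml on the classpath):

from pyspark.sql import SparkSession

# enableHiveSupport() wires the session to the Hive metastore,
# which is what makes Hive tables and `describe function` resolve.
spark = (
    SparkSession.builder
    .appName("udf-test")  # app name is arbitrary
    .enableHiveSupport()
    .getOrCreate()
)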

However, I cannot use the UDF in a SQL statement:

spark.sql('SELECT normaliseURL("value")')
AnalysisException: "Undefined function: 'default.normaliseURL'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7"

If I attempt to register the UDF with Spark (bypassing the metastore), it fails, suggesting that it does already exist:

In [12]: spark.sql("create function normaliseURL as 'com.example.hive.udfs.NormaliseURL'")
AnalysisException: "Function 'default.normaliseURL' already exists in database 'default';"

I'm using Spark 2.0 and Hive metastore 1.1.0. The UDF is written in Scala; my Spark driver code is Python.

I'm stumped.

  • Am I correct in my assumption that Spark can utilise metastore-defined permanent UDFs?
  • Am I creating the function correctly in hive?

3 Answers
兄弟一词,经得起流年.
#2 · 2019-03-12 01:45

A function cannot be called directly in a SELECT the way it can in SQL Server.

You have to create a dummy table, as in Oracle.

CREATE TABLE dual (dummy STRING);

LOAD DATA LOCAL INPATH '/path/to/textfile/dual.txt' OVERWRITE INTO TABLE dual;

SELECT normaliseURL('value') FROM dual;

or

SELECT * FROM normaliseURL('value')
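Driven from the PySpark session in the question, the same workaround would look something like this (an untested sketch; it assumes the dual table above has already been created and loaded):

# Untested sketch of the dual-table approach from PySpark
spark.sql("CREATE TABLE IF NOT EXISTS dual (dummy STRING)")
spark.sql("SELECT normaliseURL('value') FROM dual").show()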
对你真心纯属浪费
#3 · 2019-03-12 01:49

It will work in a Spark-on-YARN environment. However, as suggested, you need to launch with spark-shell --jars <path-to-your-hive-udf>.jar, with the JAR on the local filesystem rather than on HDFS.
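Since the asker's driver is Python, the equivalent would be something along these lines (a sketch; the local JAR path is hypothetical):

# Launch the driver with a local copy of the UDF JAR, e.g.:
#   pyspark --jars /local/path/to/hive-udfs.jar
# With the class on the classpath, the metastore-registered
# function should then resolve:
spark.sql("SELECT normaliseURL('value')").show()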

ら.Afraid
#4 · 2019-03-12 01:52

The issue is that Spark 2.0 is not able to execute functions whose JARs are located on HDFS.

Related: Spark SQL: Thriftserver unable to run a registered Hive UDTF

One workaround is to define the function as a temporary function in the Spark job, with the JAR path pointing to a local edge-node path, and then call the function in that same Spark job:

CREATE TEMPORARY FUNCTION functionName AS 'com.test.HiveUDF' USING JAR '/user/home/dir1/functions.jar';
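From a PySpark driver, the whole workaround fits in one job (a sketch using the answer's placeholder class name and path):

# Register the Hive UDF for this session only; the JAR path points at
# a local edge-node copy, not HDFS (placeholders as in the answer).
spark.sql(
    "CREATE TEMPORARY FUNCTION functionName AS 'com.test.HiveUDF' "
    "USING JAR '/user/home/dir1/functions.jar'"
)
# Call it within the same Spark job.
spark.sql("SELECT functionName('value')").show()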