Here is the problem: If I declare a temporary function
in hive like this:
add jar /home/taobao/oplog/hivescript/my_udf.jar;
create temporary function getContentValue as 'com.my.udf.GetContentValue';
It works fine: the function getContentValue is available in this Hive session. But what I want is to not have to add the jar and create the temporary function every time I start a Hive session. That is to say, I want to make the function permanent.
Is there any solution to this problem?
As of 0.13.0 (HIVE-6047) you can make UDFs permanent through:
CREATE FUNCTION [db_name.]function_name AS class_name [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ]
An example for your UDF:
CREATE FUNCTION getContentValue AS 'com.my.udf.GetContentValue' USING JAR '/home/taobao/oplog/hivescript/my_udf.jar';
For more info check Hive Data Definition Language, quoting it:
This statement lets you create a function that is implemented by the class_name. Jars, files, or archives which need to be added to the environment can be specified with the USING clause; when the function is referenced for the first time by a Hive session, these resources will be added to the environment as if ADD JAR/FILE had been issued. If Hive is not in local mode, then the resource location must be a non-local URI such as an HDFS location.
The function will be added to the database specified, or to the current database at the time that the function was created. The function can be referenced by fully qualifying the function name (db_name.function_name), or can be referenced without qualification if the function is in the current database.
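As an illustration of the qualification rule, a sketch using the UDF from the question (the database name udf_db and the HDFS path are hypothetical, not from the original setup):

```sql
-- Register the function in a specific database (udf_db is a hypothetical name);
-- in non-local mode the jar must live at a non-local URI such as HDFS.
USE udf_db;
CREATE FUNCTION getContentValue AS 'com.my.udf.GetContentValue'
  USING JAR 'hdfs:///user/hive/udf/my_udf.jar';

-- From any other database, reference it with the fully qualified name:
SELECT udf_db.getContentValue(some_column) FROM some_db.some_table;
```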
If you only use Hive from the hive-cli tool, then adding those two lines to .hiverc should do it:
The CLI when invoked without the -i option will attempt to load $HIVE_HOME/bin/.hiverc and $HOME/.hiverc as initialization files.
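Concretely, for the question's UDF the .hiverc would contain just the two statements from the question (a sketch; this is a plain Hive script executed at session start):

```sql
-- $HOME/.hiverc: run automatically by hive-cli at the start of each session
add jar /home/taobao/oplog/hivescript/my_udf.jar;
create temporary function getContentValue as 'com.my.udf.GetContentValue';
```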
If you use JDBC/ODBC to connect to hive, then you must do it in app code, eg. have a connection factory that first issues the two commands in the connection before giving out the connection for app to use.
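A minimal sketch of such a connection factory using plain java.sql; the class name, JDBC URL, and the idea of a helper that builds the two statements are my assumptions, not part of any Hive API:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Sketch: hands out connections only after registering the UDF on them.
public class HiveConnectionFactory {
    private final String jdbcUrl;

    public HiveConnectionFactory(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    // The two initialization statements; jar path and names follow the question.
    static List<String> initStatements(String jarPath, String fnName, String className) {
        return List.of(
            "add jar " + jarPath,
            "create temporary function " + fnName + " as '" + className + "'");
    }

    public Connection getConnection() throws SQLException {
        Connection conn = DriverManager.getConnection(jdbcUrl);
        try (Statement stmt = conn.createStatement()) {
            for (String sql : initStatements(
                    "/home/taobao/oplog/hivescript/my_udf.jar",
                    "getContentValue",
                    "com.my.udf.GetContentValue")) {
                stmt.execute(sql);
            }
        }
        return conn;
    }
}
```

The application then only ever calls getConnection() and never needs to know about the UDF setup.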
Making a temporary UDF permanent (i.e., storing it in the metastore along with the JAR) is not possible at the moment. It is technically feasible, but the feature simply is not there yet.
Write a UDF and build its jar.
Having a separate user for such UDFs is good practice.
Upload the jar to this user's HDFS location:
hdfs dfs -put <hive-udf>.jar /user/udf/
hive> use udf;
hive> CREATE FUNCTION RemoveNewLine AS 'com.mycompany.hive.udf.RemoveNewLine' USING JAR 'hdfs://<defaultFS>/user/udf/<hive-udf>.jar';
Usage:
hive> select date, udf.RemoveNewLine(url) from abc.testTable limit 1;
1) Store the JAR file in an HDFS location instead of a local one. If we have already created a local function, we may have to use a new function name.
2) In the hive shell: hive> CREATE FUNCTION MASK AS 'hiveudf.PI' USING JAR 'hdfs://localhost:8020/user/cloudera/Mask.jar';
Note: When the function is used, the JAR file is copied from HDFS to the local system.
For more information you can visit http://www.prathapkudupublog.com/2018/01/permanent-user-defined-function-udf-in.html
This worked for me:
add jar hdfs:/user/UID/UDF_Library/defValUDF.jar;
CREATE TEMPORARY FUNCTION defVal as 'defValUDF.DefValUDF';
select defval(client_id,"UNK"),
client_id
from fin.ic_table limit 20;
You can store the jar in HDFS and modify the permissions, and then any user can use this UDF. The "add jar" and "create temporary function" commands execute in under a second.
Also found this: Adding hive jars permanently