So, I have HDFS and Hive working together. I also have the JDBC driver for Hive functioning, so that I can make remote JDBC calls.
Now, I have added a Hive User Defined Function (UDF). It works great in the CLI... I even load the jar and associated function automatically via the .hiverc file (sketched below). However, I cannot get this to work through the Hive JDBC driver. I thought it would also pick up the .hiverc file (by default located in /usr/lib/hive/bin/), but it does not seem to. I also tried adding the jar via an 'add jar' SQL command as the first statement, but no matter where I put the jar file, I get an error in hive.log saying the file cannot be found.
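For reference, the .hiverc is just a file of HiveQL statements that the CLI runs at startup; mine looks roughly like this (the jar path, function name, and class name here are placeholders for my actual ones):

    -- runs automatically when the Hive CLI starts
    ADD JAR /usr/lib/hive/lib/my-udfs.jar;
    CREATE TEMPORARY FUNCTION my_udf AS 'com.example.hive.MyUdf';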
Anyone know how to do this? I am using the Cloudera Distribution (CDH3u2), which uses Hive-0.7.1.
Thanks in advance.
I think that the JDBC driver uses Thrift, which would mean that the JAR probably needs to be on the Thrift server (the Hive server that you connect to in your connection string), and in the Hive classpath there.
According to the Hive developer mailing list, there is no fix for this in the current Hive version (0.9). To work around it, I used a connection factory class that registers the jars and functions every time a connection session is started. That approach works wonderfully:
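A minimal sketch of such a factory follows. The driver class and URL are those of the pre-HiveServer2 Thrift server; the server address, jar path (which must exist on the Hive server host), function name, and UDF class are placeholders for your own:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    /**
     * Hands out Hive JDBC connections with the UDF jar and function already
     * registered. ADD JAR and CREATE TEMPORARY FUNCTION are session-scoped,
     * so they must be re-issued on every new connection.
     */
    public class HiveConnectionFactory {

        // Placeholder values; substitute your server, jar path, and UDF class.
        private static final String URL = "jdbc:hive://localhost:10000/default";
        private static final String JAR_PATH = "/usr/lib/hive/lib/my-udfs.jar";
        private static final String FUNCTION_DDL =
                "CREATE TEMPORARY FUNCTION my_udf AS 'com.example.hive.MyUdf'";

        static {
            try {
                Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            } catch (ClassNotFoundException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        public static Connection getConnection() throws SQLException {
            Connection conn = DriverManager.getConnection(URL, "", "");
            Statement stmt = conn.createStatement();
            try {
                // Register the jar and the function for this session only.
                stmt.execute("ADD JAR " + JAR_PATH);
                stmt.execute(FUNCTION_DDL);
            } finally {
                stmt.close();
            }
            return conn;
        }
    }

Callers then obtain every connection through HiveConnectionFactory.getConnection() instead of DriverManager directly, so the UDF is always available in the session.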
I use the JDBC driver to connect to Hive as well. I scp my jar onto the master node of the cluster, which is also where Hive is installed, and then use the absolute path to the file (on the master node) in my add jar command. I issue the add jar command via the JDBC driver just like any other HQL command.
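In code, that boils down to something like this (the host, jar path, function name, and UDF class are hypothetical; the jar path is an absolute path on the master node where the Hive server runs):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class RegisterUdfOverJdbc {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive://master-node:10000/default", "", "");
            Statement stmt = conn.createStatement();
            // Absolute path to the jar as it exists on the master node.
            stmt.execute("ADD JAR /home/hadoop/udfs/my-udfs.jar");
            stmt.execute("CREATE TEMPORARY FUNCTION my_udf AS 'com.example.hive.MyUdf'");
            // From here on, my_udf can be used in queries on this connection.
            stmt.close();
            conn.close();
        }
    }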