Hadoop Hive - How can I 'add jar' for use with the Hive JDBC driver?

Posted 2019-05-11 05:54

Question:

So, I have HDFS and Hive working together. I also have the JDBC driver for Hive functioning, so I can make remote JDBC calls.

Now, I have added a Hive user-defined function (UDF). It works great in the CLI... I even load the jar and the associated function automatically via the .hiverc file. However, I cannot get this to work through the Hive JDBC driver. I thought it would also use the .hiverc file (by default located in /usr/lib/hive/bin/), but it does not seem to. I also tried adding it via an 'add jar' SQL command as the first statement, but no matter where I put the jar file, I get an error in hive.log saying the file cannot be found.

Anyone know how to do this? I am using the Cloudera distribution (CDH3u2), which ships Hive 0.7.1.

Thanks in advance.

Answer 1:

I use the JDBC driver to connect to Hive as well. I scp my jar onto the master node of the cluster (which is also where Hive is installed) and then use the absolute path to the file on the master node in my add jar command. I issue the add jar command via the JDBC driver just like any other HQL command.
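A minimal sketch of that flow, assuming the jar was copied to /home/hadoop/udfs/my-udfs.jar on the master node (the path, host name, and UDF class are illustrative, not from the answer):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class AddJarExample {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
            // Connect to the Hive server running on the master node.
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive://master-node:10000/default", "", "");
            Statement stmt = conn.createStatement();
            // Absolute path to the jar ON THE SERVER, not on the client machine.
            stmt.execute("add jar /home/hadoop/udfs/my-udfs.jar");
            // Register the UDF for this session (class name is illustrative).
            stmt.execute("create temporary function my_udf as 'com.example.MyUdf'");
            conn.close();
        }
    }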



Answer 2:

According to the Hive developer mailing list, the current Hive version (0.9) has no solution for this issue. To work around it, I used a connection factory class that registers the jars and functions every time a connection session is started. The code below works wonderfully:

    package com.rapidminer.operator.bigdata.runner.helpers;

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    /**
     * A Hive connection factory utility.
     * @author Marcelo Beckmann
     */
    public class ConnectionFactory {

        private static ConnectionFactory instance;

        /** Basic attributes to make the connection. */
        public String url = "jdbc:hive://localhost:10000/default";
        public final String DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";

        public static ConnectionFactory getInstance() {
            if (instance == null)
                instance = new ConnectionFactory();
            return instance;
        }

        private ConnectionFactory() {}

        /**
         * Obtains a Hive connection.
         * Warning! To use simultaneous connections through the Thrift server, you must
         * change the Hive metadata store from Derby to another database (MySQL, for example).
         */
        public Connection getConnection() throws Exception {
            Class.forName(DRIVER);
            Connection connection = DriverManager.getConnection(url, "", "");
            runInitializationQueries(connection);
            return connection;
        }

        /**
         * Runs initialization queries after the connection is obtained. This is done
         * to work around a known Hive bug (HIVE-657).
         * @throws SQLException
         */
        private void runInitializationQueries(Connection connection) throws SQLException {
            Statement stmt = null;
            try {
                // TODO Get the queries from a .hiverc file
                String[] args = new String[3];
                args[0] = "add jar /home/hadoop-user/hive-0.9.0-bin/lib/hive-beckmann-functions.jar";
                args[1] = "create temporary function row_number as 'com.beckmann.hive.RowNumber'";
                args[2] = "create temporary function sequence as 'com.beckmann.hive.Sequence'";
                // Fix: the statement must be created before use (it was never
                // initialized in the original, which would throw a NullPointerException).
                stmt = connection.createStatement();
                for (String query : args) {
                    stmt.execute(query);
                }
            } finally {
                if (stmt != null)
                    stmt.close();
            }
        }
    }
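Usage is then a one-liner: every connection handed out by the factory already has the jar and the temporary functions registered. A small sketch (the query and table name are illustrative):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ConnectionFactoryDemo {
        public static void main(String[] args) throws Exception {
            // The factory registers the jar and UDFs before returning the connection.
            Connection conn = ConnectionFactory.getInstance().getConnection();
            Statement stmt = conn.createStatement();
            // The temporary functions created at connection time are usable here.
            ResultSet rs = stmt.executeQuery("select sequence(), name from my_table");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getString(2));
            }
            conn.close();
        }
    }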


Answer 3:

I think the JDBC driver uses Thrift, which would mean that the jar probably needs to be on the Thrift server (the Hive server you connect to in your connection string) and on the Hive classpath there.
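If that is the case, one way to make a UDF jar visible server-side on a HiveServer1-era setup such as CDH3 is to place the jar on the server host and point HIVE_AUX_JARS_PATH at it before starting the Thrift service (the jar path below is illustrative):

    # On the host running the Hive Thrift server; the jar path is illustrative.
    export HIVE_AUX_JARS_PATH=/usr/lib/hive/aux/my-udfs.jar
    hive --service hiveserver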