How to include jars in Hive (Amazon Hadoop env)

Published 2019-07-22 15:00

Question:

I need to include a newer protobuf jar (newer than 2.5.0) in Hive. No matter where I put the jar, it ends up at the end of the classpath. How can I make sure the jar is at the beginning of Hive's classpath?

Answer 1:

To add your own jar at the beginning of the Hive classpath, so that it is not overridden by a jar bundled with Hadoop, set the following environment variable:

export HADOOP_USER_CLASSPATH_FIRST=true

This tells Hadoop to give the entries in HADOOP_CLASSPATH priority over its own bundled jars.

On Amazon EMR instances you can add this line to /home/hadoop/conf/hadoop-env.sh and set the classpath in the same file.

This is useful when you want to override jars such as protobuf that come with the general Hadoop classpath.
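
As a rough sketch, the lines added to hadoop-env.sh might look like the following. The jar path in PROTOBUF_JAR is a hypothetical placeholder, not something from the original answer; point it at wherever you actually copied the newer protobuf jar.

```shell
# Hypothetical location of the newer protobuf jar; adjust to your own path.
PROTOBUF_JAR="${PROTOBUF_JAR:-/home/hadoop/lib/protobuf-java-NEWER.jar}"

# Ask the Hadoop launcher scripts to put HADOOP_CLASSPATH ahead of the bundled jars.
export HADOOP_USER_CLASSPATH_FIRST=true

# Prepend the jar while keeping whatever HADOOP_CLASSPATH already contained.
export HADOOP_CLASSPATH="${PROTOBUF_JAR}${HADOOP_CLASSPATH:+:${HADOOP_CLASSPATH}}"
```

After starting a new shell, running `hadoop classpath` should let you check where the jar ends up in the resulting classpath.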



Answer 2:

Another thing you might consider is including the protobuf classes in your own jar. You would need to build your jar with the assembly plugin, which will bundle those classes. It's an option.
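
If you go this route, one way to confirm that the classes really were bundled is to scan the jar's listing. The helper below is only an illustrative sketch: the function name `has_protobuf_classes` and the jar name `my-udfs.jar` are hypothetical, and it assumes the JDK's `jar` tool is on your PATH.

```shell
# Reads a jar listing (one entry per line) on stdin and succeeds
# if at least one protobuf class file is present.
has_protobuf_classes() {
  grep -q '^com/google/protobuf/.*\.class$'
}

# Example use against an assembled jar (hypothetical file name):
#   jar tf my-udfs.jar | has_protobuf_classes && echo "protobuf classes are bundled"
```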