I am having a problem running a Spark job via spark-submit due to the following error:
16/11/16 11:41:12 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String, java.util.Map, boolean, int, boolean, boolean, boolean)
java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String, java.util.Map, boolean, int, boolean, boolean, boolean)
at java.lang.Class.getMethod(Class.java:1786)
at org.apache.spark.sql.hive.client.Shim.findMethod(HiveShim.scala:114)
at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitionsMethod$lzycompute(HiveShim.scala:404)
at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitionsMethod(HiveShim.scala:403)
at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitions(HiveShim.scala:455)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(ClientWrapper.scala:562)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:562)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:562)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:281)
at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:228)
at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:227)
at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:270)
...
I am using Spark 1.6.0 with Scala 2.10 and Hive 1.1.0, and the platform is CDH 5.7.1 with the same versions of Spark and Hive.
The hive-exec jar passed on the classpath to the Spark job is hive-exec-1.1.0-cdh5.7.1.jar. This jar contains the class org.apache.hadoop.hive.ql.metadata.Hive,
which I can see has the following method:
public java.util.Map<java.util.Map<java.lang.String, java.lang.String>, org.apache.hadoop.hive.ql.metadata.Partition> loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String, java.util.Map<java.lang.String, java.lang.String>, boolean, int, boolean, boolean, boolean) throws org.apache.hadoop.hive.ql.metadata.HiveException;
This is not the same signature that the org.apache.spark.sql.hive.client.ClientWrapper
class shipped with spark-hive_2.10-1.6.0.jar (the library I am using) expects; ClientWrapper looks the method up through the class org.apache.spark.sql.hive.client.HiveShim,
which declares it as:
private lazy val loadDynamicPartitionsMethod =
  findMethod(
    classOf[Hive],
    "loadDynamicPartitions",
    classOf[Path],
    classOf[String],
    classOf[JMap[String, String]],
    JBoolean.TYPE,
    JInteger.TYPE,
    JBoolean.TYPE,
    JBoolean.TYPE)
I also checked the history of the hive-exec jar, and it seems that the signature of this method in org.apache.hadoop.hive.ql.metadata.Hive
was changed after version 1.0.0.
I am new to Spark, but it seems to me that the spark-hive library is built against a different Hive implementation (the META-INF/DEPENDENCIES file inside the jar declares a dependency on org.spark-project.hive:hive-exec:jar:1.2.1.spark).
Does anyone know how to configure the Spark job to use the proper Hive library?
Make sure you have set the settings shown below.
In Spark you can set them on the HiveContext, as in the sketch that follows.
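The original snippet is not included in this answer, so as a hedged sketch I am assuming the settings meant here are the usual Hive dynamic-partition properties (a guess based on the loadDynamicPartitions error), set on a Spark 1.6 HiveContext:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Minimal sketch, assuming the settings in question are the dynamic-partition
// properties; both keys are set directly on the HiveContext.
val sc = new SparkContext(new SparkConf().setAppName("dynamic-partition-write"))
val hiveContext = new HiveContext(sc)

hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")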
If the problem still exists, I guess it means the Spark version you are building against doesn't match the environment where you run spark-submit. You can try running your program in spark-shell; if it works there, then align your Spark version with the environment's.
You can set the dependency in your sbt build (or pom) as shown below.
Please refer to https://mvnrepository.com/artifact/org.apache.spark for the available versions.
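For example, a minimal build.sbt sketch, assuming Spark 1.6.0 and Scala 2.10 to match the CDH 5.7.1 environment described in the question (check the versions against the link above):

// build.sbt -- versions are assumptions chosen to match the question's cluster;
// marking the Spark artifacts "provided" lets spark-submit use the cluster's jars.
scalaVersion := "2.10.5"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-hive" % "1.6.0" % "provided"
)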
You can get the environment settings by using the command below:
SPARK_PRINT_LAUNCH_COMMAND=true spark-shell
An alternative approach is to use Spark's partitionBy when saving the data, instead of relying on Hive's dynamic-partition load.
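A minimal sketch of that alternative, assuming some existing DataFrame df, a partition column named date, and a target path (all three are placeholders, not from the original answer):

// Hypothetical example: write a DataFrame partitioned by a column, which
// bypasses Hive's loadDynamicPartitions call entirely. "df", "date" and the
// target path below are placeholders.
df.write
  .mode("append")
  .partitionBy("date")
  .parquet("/user/hive/warehouse/mydb.db/mytable")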