Spark job fails due to java.lang.NoSuchMethodException

Published 2019-09-07 01:28

Question:

I am having a problem running a Spark job via spark-submit due to the following error:

16/11/16 11:41:12 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String, java.util.Map, boolean, int, boolean, boolean, boolean)
java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String, java.util.Map, boolean, int, boolean, boolean, boolean)
at java.lang.Class.getMethod(Class.java:1786)
at org.apache.spark.sql.hive.client.Shim.findMethod(HiveShim.scala:114)
at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitionsMethod$lzycompute(HiveShim.scala:404)
at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitionsMethod(HiveShim.scala:403)
at org.apache.spark.sql.hive.client.Shim_v0_14.loadDynamicPartitions(HiveShim.scala:455)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(ClientWrapper.scala:562)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:562)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadDynamicPartitions$1.apply(ClientWrapper.scala:562)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:281)
at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:228)
at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:227)
at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:270)
...

I am using Spark 1.6.0 with Scala 2.10 and Hive 1.1.0; the platform is CDH 5.7.1 with the same versions of Spark and Hive. The hive-exec jar passed on the classpath to the Spark job is hive-exec-1.1.0-cdh5.7.1.jar. This jar contains the class org.apache.hadoop.hive.ql.metadata.Hive, which I can see has the following method:

public java.util.Map<java.util.Map<java.lang.String, java.lang.String>, org.apache.hadoop.hive.ql.metadata.Partition> loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String, java.util.Map<java.lang.String, java.lang.String>, boolean, int, boolean, boolean, boolean) throws org.apache.hadoop.hive.ql.metadata.HiveException;

This is not the same signature as the one expected by the org.apache.spark.sql.hive.client.ClientWrapper class shipped with the spark-hive_2.10-1.6.0.jar library that I am using; ClientWrapper looks the method up through org.apache.spark.sql.hive.client.HiveShim, which declares it like this:

private lazy val loadDynamicPartitionsMethod =
  findMethod(
    classOf[Hive],
    "loadDynamicPartitions",
    classOf[Path],
    classOf[String],
    classOf[JMap[String, String]],
    JBoolean.TYPE,
    JInteger.TYPE,
    JBoolean.TYPE,
    JBoolean.TYPE)

I also checked the history of the hive-exec jar, and it seems that the signature of this method in org.apache.hadoop.hive.ql.metadata.Hive changed after version 1.0.0. I am new to Spark, but it looks to me like the spark-hive library is built against an old Hive implementation (the META-INF/DEPENDENCIES file inside the jar declares a dependency on org.spark-project.hive:hive-exec:jar:1.2.1.spark). Does anyone know how to make the Spark job use the proper Hive library?
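For what it's worth, a quick way I can double-check which overloads are actually visible at runtime is to list them via reflection from spark-shell on the same classpath (just a sketch):

classOf[org.apache.hadoop.hive.ql.metadata.Hive]
  .getMethods
  .filter(_.getName == "loadDynamicPartitions")
  .foreach(m => println(m.toGenericString))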

Answer 1:

Make sure you have set the following settings:

SET hive.exec.dynamic.partition=true; 
SET hive.exec.max.dynamic.partitions=2048;
SET hive.exec.dynamic.partition.mode=nonstrict;

In Spark you can set these on the HiveContext as below:

hiveCtx.setConf("hive.exec.dynamic.partition","true")
hiveCtx.setConf("hive.exec.max.dynamic.partitions","2048")
hiveCtx.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

If the problem still exists, I guess it means the Spark version you are using does not match the environment where you are running spark-submit. You can try to run your program in spark-shell; if it works there, then try to align your Spark version with the environment setting.

You can set the dependencies in your sbt build (or pom) as below:

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.3"
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.3"
libraryDependencies += "org.apache.spark" % "spark-hive_2.10" % "1.6.3"
libraryDependencies += "org.apache.hive" % "hive-exec" % "1.1.0"

Please refer to https://mvnrepository.com/artifact/org.apache.spark

You can see the environment settings by running the command below:

SPARK_PRINT_LAUNCH_COMMAND=true spark-shell

An alternative approach is to use Spark's partitionBy to save the data:

    dataframe.write.mode("overwrite").partitionBy("col1", "col2").json("//path")
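Note that writing with partitionBy like this goes through Spark's own data source writers rather than Hive's loadDynamicPartitions, so it should sidestep the failing shim call; the output is plain partitioned files, though, so to query them through Hive you would still need to point an external table at that path and add or repair the partitions.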