Setting Spark as default execution engine for Hive

Published 2019-04-02 00:32

I am running Hadoop 2.7.3, Spark 2.1.0, and Hive 2.1.1.

I am trying to set Spark as the default execution engine for Hive. I uploaded all the jars from $SPARK_HOME/jars to an HDFS folder and copied the scala-library, spark-core, and spark-network-common jars to $HIVE_HOME/lib. Roughly, the commands were as follows (a sketch of what I ran; the HDFS path matches the spark.yarn.jars value below):
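
hdfs dfs -mkdir -p /user/spark/spark-jars
hdfs dfs -put $SPARK_HOME/jars/*.jar /user/spark/spark-jars/
cp $SPARK_HOME/jars/scala-library-2.11.8.jar \
   $SPARK_HOME/jars/spark-core_2.11-2.1.0.jar \
   $SPARK_HOME/jars/spark-network-common_2.11-2.1.0.jar \
   $HIVE_HOME/lib/

Then I configured hive-site.xml with the following properties: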

  <property>
    <name>hive.execution.engine</name>
    <value>spark</value>
  </property>
  <property>
    <name>spark.master</name>
    <value>spark://master:7077</value>
    <description>Spark Master URL</description>
  </property>
  <property>
    <name>spark.eventLog.enabled</name>
    <value>true</value>
    <description>Spark Event Log</description>
  </property>
  <property>
    <name>spark.eventLog.dir</name>
    <value>hdfs://master:8020/user/spark/eventLogging</value>
    <description>Spark event log folder</description>
  </property>
  <property>
    <name>spark.executor.memory</name>
    <value>512m</value>
    <description>Spark executor memory</description>
  </property>
  <property>
    <name>spark.serializer</name>
    <value>org.apache.spark.serializer.KryoSerializer</value>
    <description>Spark serializer</description>
  </property>
  <property>
    <name>spark.yarn.jars</name>
    <value>hdfs://master:8020/user/spark/spark-jars/*</value>
  </property>
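
As a quick check that spark.yarn.jars resolves to the uploaded jars, listing the HDFS folder should show the Spark jars (using the path from the config above):

hdfs dfs -ls hdfs://master:8020/user/spark/spark-jars/ | head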

In the Hive shell, I did the following:

hive> add jar ${env:HIVE_HOME}/lib/scala-library-2.11.8.jar;
Added [/usr/local/hive/hive-2.1.1/lib/scala-library-2.11.8.jar] to class path
Added resources: [/usr/local/hive/hive-2.1.1/lib/scala-library-2.11.8.jar]
hive> add jar ${env:HIVE_HOME}/lib/spark-core_2.11-2.1.0.jar;
Added [/usr/local/hive/hive-2.1.1/lib/spark-core_2.11-2.1.0.jar] to class path
Added resources: [/usr/local/hive/hive-2.1.1/lib/spark-core_2.11-2.1.0.jar]
hive> add jar ${env:HIVE_HOME}/lib/spark-network-common_2.11-2.1.0.jar;
Added [/usr/local/hive/hive-2.1.1/lib/spark-network-common_2.11-2.1.0.jar] to class path
Added resources: [/usr/local/hive/hive-2.1.1/lib/spark-network-common_2.11-2.1.0.jar]
hive> set hive.execution.engine=spark;
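
As a sanity check, running set with only the property name echoes the current value:

hive> set hive.execution.engine;
hive.execution.engine=spark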

When I tried to execute

hive> select count(*) from tableName;

I got the following:

Query ID = hduser_20170130230014_6e23dacc-78e8-4bd6-9fad-1344f6d0569e
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

The Hive log shows a java.lang.NoClassDefFoundError for org/apache/spark/JavaSparkListener:

ERROR [main] client.SparkClientImpl: Error while waiting for client to connect.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client 'cc10915b-da97-4fd7-9960-49c03ea380d7'. Error: Child process exited before connecting back with error log Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=90000
Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8
Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000
Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256
Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800
java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:229)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:695)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.JavaSparkListener
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 19 more

    at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
    at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:106)
    at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:99)
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:95)
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:69)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:62)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114)
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:136)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:89)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: Cancel client 'cc10915b-da97-4fd7-9960-49c03ea380d7'. Error: Child process exited before connecting back with error log Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=90000
Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8
Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000
Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256
Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800
java.lang.NoClassDefFoundError: org/apache/spark/JavaSparkListener
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:229)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:695)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.JavaSparkListener
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 19 more

Please help me integrate Hive 2.1.1 with Spark 2.1.0.

1 Answer

再贱就再见 · 2019-04-02 01:00

This is a bug in Spark: the org.apache.spark.JavaSparkListener class was removed in Spark 2.0.0, and Hive's remote Spark client still references it, which is why spark-submit fails with the NoClassDefFoundError above. A fix has been submitted and is in review; if it is approved, it should be available in the next Spark release (possibly Spark 2.2.0):

https://issues.apache.org/jira/browse/SPARK-17563
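
To confirm this against a local installation, one quick check is to list the jar contents (a sketch, using the jar names from the question; unzip -l prints a jar's entries):

unzip -l $SPARK_HOME/jars/spark-core_2.11-2.1.0.jar | grep JavaSparkListener
# no output on Spark 2.x: the class is gone
# the same grep against a Spark 1.x assembly jar finds org/apache/spark/JavaSparkListener.class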
