I'm trying to get Hive 2.1.1 on Spark 2.1.0 working on a single machine. I'm not sure that's the right approach; at the moment I only have one machine, so I can't build a cluster.
When I run any insert query in hive, I get the error:
hive> insert into mcus (id, name) values (1, 'ARM');
Query ID = server_20170223121333_416506b4-13ba-45a4-a0a2-8417b187e8cc
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
I'm afraid I haven't configured things correctly, since I can't find any Spark event logs under hdfs dfs -ls /spark/eventlog (more on that right after the config). Here's the part of my hive-site.xml related to Spark and YARN:
<property>
  <name>hive.exec.stagingdir</name>
  <value>/tmp/hive-staging</value>
</property>
<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
</property>
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>spark.master</name>
  <value>spark://ThinkPad-W550s-Lab:7077</value>
</property>
<property>
  <name>spark.eventLog.enabled</name>
  <value>true</value>
</property>
<property>
  <name>spark.eventLog.dir</name>
  <value>hdfs://localhost:8020/spark/eventlog</value>
</property>
<property>
  <name>spark.executor.memory</name>
  <value>2g</value>
</property>
<property>
  <name>spark.serializer</name>
  <value>org.apache.spark.serializer.KryoSerializer</value>
</property>
<property>
  <name>spark.home</name>
  <value>/home/server/spark</value>
</property>
<property>
  <name>spark.yarn.jar</name>
  <value>hdfs://localhost:8020/spark-jars/*</value>
</property>
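Since nothing shows up under /spark/eventlog, I also created the directory by hand; as far as I know Spark expects the event log directory to exist already rather than creating it on first use. A minimal sketch of what I ran:

# spark.eventLog.dir is not created automatically, so create it ahead of time
hdfs dfs -mkdir -p /spark/eventlog
hdfs dfs -ls /spark/eventlog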
1) Since I didn't configure the fs.default.name value in Hadoop, can I just use hdfs://localhost:8020 as the filesystem path in the config file, or should I change the port to 9000? (I get the same error when I change 8020 to 9000.)
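For reference, here's the kind of core-site.xml entry I mean (a sketch; fs.defaultFS is the current name for the deprecated fs.default.name, and port 8020 is just what I've been assuming the NameNode listens on):

<!-- core-site.xml: default filesystem URI; must match the NameNode's actual RPC port -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:8020</value>
</property>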
2) I start Spark with start-master.sh and then start-slave.sh spark://ThinkPad-W550s-Lab:7077; is that correct?
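Concretely, this is what I run and how I've been checking the result (a sketch; I'm assuming SPARK_HOME points at /home/server/spark, the same path as spark.home above):

# bring up a standalone master and a single worker on the same machine
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-slave.sh spark://ThinkPad-W550s-Lab:7077

# the master web UI (default port 8080) should list the worker as ALIVE and
# show the same spark:// URL that hive-site.xml points at
curl -s http://localhost:8080 | grep -o 'spark://[^<]*'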
3) According to this thread, how could I check the value of Spark Executor Memory + Overhead in order to set the values of yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.memory-mb? The values of yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.memory-mb are much greater than spark.executor.memory.
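For what it's worth, my working assumption (based on the documented Spark-on-YARN defaults, which I haven't verified against this install) is that each executor container asks YARN for spark.executor.memory plus spark.yarn.executor.memoryOverhead, and the overhead defaults to max(384 MB, 10% of the executor memory). With spark.executor.memory = 2g that comes to roughly 2048 MB + 384 MB = 2432 MB, so both YARN properties above would need to be at least that.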
4) How could I fix the Failed to create spark client error?
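A sketch of where I'd expect the real stack trace to show up (assuming the default log location of ${java.io.tmpdir}/${user.name}, which for my user would be /tmp/server; adjust if your logging is configured differently):

# the CLI error hides the root cause; the full stack trace usually lands in hive.log
grep -A 30 'Failed to create spark client' /tmp/server/hive.log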
Thanks a lot!
In my case, setting the spark.yarn.appMasterEnv.JAVA_HOME property was a problem.
fix...
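For anyone else hitting this, here's a sketch of what such an entry looks like in hive-site.xml; the JDK path below is just an assumed example, point it at whatever Java your nodes actually have:

<!-- override the JVM used by the YARN application master (example path, adjust) -->
<property>
  <name>spark.yarn.appMasterEnv.JAVA_HOME</name>
  <value>/usr/lib/jvm/java-8-openjdk-amd64</value>
</property>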
For your 3rd question, you can find the default values of the yarn.scheduler.maximum-allocation-mb and yarn.nodemanager.resource.memory-mb properties in the yarn-default.xml file. Alternatively, if you have access to the YARN ResourceManager web UI, you can find the values under Tools -> Configuration (xml).
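A quick way to check what a cluster is actually using (a sketch; HADOOP_CONF_DIR pointing at your config directory and the ResourceManager web UI listening on the default port 8088 are assumptions):

# values explicitly set on this cluster; if a property is absent here it falls
# back to yarn-default.xml (the stock default for both of these is 8192 MB)
grep -A 1 -E 'yarn\.scheduler\.maximum-allocation-mb|yarn\.nodemanager\.resource\.memory-mb' $HADOOP_CONF_DIR/yarn-site.xml

# the full effective configuration can also be dumped from the running
# ResourceManager and searched by hand:
#   curl -s http://localhost:8088/conf
# or browsed in the RM web UI under Tools -> Configuration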