Can't seem to build hive for spark

I have been trying to run this code in pyspark.

from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)
datumDF = sqlContext.createDataFrame(datumX, schema)

But I keep getting this error:

Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o44))

I log in to AWS and spin up clusters with this code: /User/Downloads/spark-1.5.2-bin-hadoop2.6/ec2/spark-ec2 -k name -i /User/Desktop/pemfile.pem login clustername

However, all the docs I've found involve the commands below, which exist in the folder /users/downloads/spark-1.5.2/. I've run them anyway, and tried logging into AWS using the ec2 command in that folder after I did. Still, I just got the same error.

I run export SPARK_HIVE=TRUE before running these commands on my local machine, but I've seen messages saying it's deprecated and will be ignored anyway.

Build Hive with Maven:

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 \
    -Phive -Phive-thriftserver -DskipTests clean package

Build Hive with sbt:

build/sbt -Pyarn -Phadoop-2.3 assembly

And another I found:

./sbt/sbt -Phive assembly

I also took the hive-site.xml file and put it in both the /Users/Downloads/spark-1.5.2-bin-hadoop2.6/conf folder and the /Users/Downloads/spark-1.5.2/conf folder.

Still no luck.

I can't seem to run the Hive commands no matter what I build it with or how I log in. Is there anything obvious I'm missing?

Tags: amazon-ec2 apache-spark apache-spark-sql

1 answer

This account has been banned

#2 · 2019-09-10 14:22

I too had the same error when using a HiveContext on an EC2 cluster built with the ec2 scripts that come with the Spark package (v1.5.2 in my case). Through much trial and error, I found that building an EC2 cluster with the following options got the right version of Hadoop with Hive properly built, so that I can use a HiveContext in my PySpark jobs:

spark-ec2 -k <your key pair name> -i /path/to/identity-file.pem -r us-west-2 -s 2 --instance-type m3.medium --spark-version 1.5.2 --hadoop-major-version yarn  launch <your cluster name>

The key here is to set --spark-version to 1.5.2 and --hadoop-major-version to yarn, even if you aren't going to use YARN to submit jobs, because that forces the Hadoop build to be 2.4. Of course, adjust the other parameters as appropriate for your desired cluster.
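If you want to check whether a particular Spark install was built with Hive before running a job, one rough approach is to look inside its assembly jar for Hive classes. This is only a sketch: the helper name jar_has_hive is made up for illustration, and it assumes a Spark 1.x layout where the Hive support classes sit under org/apache/spark/sql/hive/ inside a single assembly jar. It uses nothing but the Python standard library.

```python
import zipfile

# Assumption: in a Spark 1.x assembly jar, Hive support classes
# (HiveContext among them) live under this package path.
HIVE_PREFIX = "org/apache/spark/sql/hive/"


def jar_has_hive(jar_path, prefix=HIVE_PREFIX):
    """Return True if the jar holds at least one entry under `prefix`.

    `jar_path` is the assembly jar to inspect; where it lives depends on
    how Spark was installed, so it is passed in rather than guessed.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return any(name.startswith(prefix) for name in jar.namelist())
```

Calling jar_has_hive("/path/to/spark-assembly.jar") (a placeholder path) returning False would suggest that build lacks Hive support, which would be consistent with the "You must build Spark with Hive" exception.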
