Hive 2.1.1 on Spark - Which version of Spark should I use?

Posted 2019-07-10 14:13

Question:

I'm running Hive 2.1.1 and Hadoop 2.7.3 on Ubuntu 16.04.

Hive on Spark: Getting Started says:

Install/build a compatible version. Hive root pom.xml's <spark.version> defines what version of Spark it was built/tested with.

I checked the pom.xml, and it shows that the Spark version is 1.6.0:

<spark.version>1.6.0</spark.version>
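For reference, the same check can be done from the command line. This is only a minimal sketch: it assumes the property sits on a single line, as in the snippet above, and `hive_spark_version` is a made-up helper name, not something shipped with Hive.

```shell
# Print the value of <spark.version> from a Hive root pom.xml.
# Assumption: the opening and closing tags are on the same line.
# hive_spark_version is a hypothetical helper, not part of Hive or Spark.
hive_spark_version() {
  # ':' is used as the sed delimiter because the pattern contains '/'.
  sed -n 's:.*<spark\.version>\(.*\)</spark\.version>.*:\1:p' "$1" | head -n 1
}
```

It would be called as `hive_spark_version pom.xml` from the root of the Hive source tree (the location is an assumption; adjust it to wherever the Hive sources are checked out).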

But Hive on Spark: Getting Started also says:

Prior to Spark 2.0.0: ./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.4,parquet-provided"

Since Spark 2.0.0: ./dev/make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided"

So now I'm confused, because I am running Hadoop 2.7.3. Do I have to downgrade Hadoop to 2.4?

Which version of Spark should I use? 1.6.0 or 2.0.0?

Thank you!

Answer 1:

I am currently using Spark 2.0.2 with Hadoop 2.7.3 and Hive 2.1, and it's working fine. I think Hive supports both Spark 1.6.x and 2.x, but I suggest going with Spark 2.x since it's the latest version.

Here is a link on why to use Spark 2.x: https://docs.cloud.databricks.com/docs/latest/sample_applications/04%20Apache%20Spark%202.0%20Examples/03%20Performance%20Apache%20(Spark%202.0%20vs%201.6).html

Apache Spark vs Apache Spark 2



Answer 2:

The current version of Spark 2.x is not compatible with Hive 2.1 and Hadoop 2.7. There is a major bug:

JavaSparkListener is not available, and Hive crashes on execution.

https://issues.apache.org/jira/browse/SPARK-17563

You can try to build Spark 1.6 for use with Hive 2.1 and Hadoop 2.7 with:

./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.7,parquet-provided" 

Compared with the command for Spark 2.0 and later, the difference is that make-distribution.sh lives inside the dev/ folder.

If it does not work for Hadoop 2.7.x, I can confirm that I was able to build it successfully with Hadoop 2.6, using:

./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided" 

and Scala 2.10.5.
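
To keep the variants from this thread straight, here is a rough sketch that prints (but does not run) the build command for a given Spark version and Hadoop profile. `spark_dist_cmd` is a hypothetical helper name. The script locations and flags are the ones quoted above, and whether a given Hadoop profile actually exists in a particular Spark release is not checked here.

```shell
# Print (do not run) the Spark build command for Hive on Spark.
#   $1: Spark version, e.g. 1.6.0 or 2.0.2
#   $2: Hadoop profile suffix, e.g. 2.4, 2.6 or 2.7 (defaults to 2.7)
# spark_dist_cmd is a hypothetical helper, not part of Spark or Hive.
spark_dist_cmd() {
  version="$1"
  hadoop="${2:-2.7}"
  major="${version%%.*}"   # text before the first dot, e.g. "1" from "1.6.0"

  # Since Spark 2.0.0 the script sits under dev/; before that it is in the source root.
  if [ "$major" -ge 2 ]; then
    script="./dev/make-distribution.sh"
  else
    script="./make-distribution.sh"
  fi

  printf '%s --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-%s,parquet-provided"\n' \
    "$script" "$hadoop"
}
```

For example, `spark_dist_cmd 1.6.0 2.6` prints the Hadoop 2.6 command shown above, and `spark_dist_cmd 2.0.2` prints the `./dev/make-distribution.sh` form with the hadoop-2.7 profile. The printed command is meant to be run from the root of the Spark source tree.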