Spark 2.1 - Error while instantiating HiveSessionState

Published 2020-02-01 02:26

With a fresh install of Spark 2.1, I am getting an error when executing the pyspark command.

Traceback (most recent call last):
File "/usr/local/spark/python/pyspark/shell.py", line 43, in <module>
spark = SparkSession.builder\
File "/usr/local/spark/python/pyspark/sql/session.py", line 179, in getOrCreate
session._jsparkSession.sessionState().conf().setConfString(key, value)
File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/usr/local/spark/python/pyspark/sql/utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"

I have Hadoop and Hive on the same machine. Hive is configured to use MySQL for the metastore. I did not get this error with Spark 2.0.2.

Can someone please point me in the right direction?

8 Answers
贪生不怕死
#2 · 2020-02-01 02:33

I was also struggling in cluster mode. Adding hive-site.xml to the Spark conf directory fixed it; if you have an HDP cluster, it should be at /usr/hdp/current/spark2-client/conf. It's working for me.
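If copying hive-site.xml around is not an option, a minimal sketch of the same idea is to point the session at the metastore explicitly from PySpark; "thrift://metastore-host:9083" below is a placeholder URI, not a value from this thread:

    from pyspark.sql import SparkSession

    # Pass the metastore location directly instead of relying on hive-site.xml.
    # "thrift://metastore-host:9083" is a placeholder; use your own metastore URI.
    spark = (SparkSession.builder
             .config("hive.metastore.uris", "thrift://metastore-host:9083")
             .enableHiveSupport()
             .getOrCreate())

    # If the session picks up the metastore, this lists its databases.
    spark.sql("SHOW DATABASES").show()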

做个烂人
#3 · 2020-02-01 02:38

The issue for me was solved by unsetting the HADOOP_CONF_DIR environment variable. It was pointing to the Hadoop configuration directory, so when the pyspark shell started, Spark tried to connect to a Hadoop cluster that wasn't running.

So if you have HADOOP_CONF_DIR set, either start the Hadoop cluster before using the Spark shells, or unset the variable.
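For a standalone PySpark script (as opposed to the interactive shell), a minimal sketch of the second option is to drop the variable from this process's environment before the session, and therefore the JVM, is created:

    import os

    # Remove HADOOP_CONF_DIR for this process only, so Spark does not try to
    # reach a Hadoop cluster that isn't running. Your shell environment is untouched.
    os.environ.pop("HADOOP_CONF_DIR", None)

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("local-no-hadoop-conf").getOrCreate()
    spark.range(3).show()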

我欲成王,谁敢阻挡
#4 · 2020-02-01 02:39

Spark 2.1.0 - When I run it in yarn-client mode I don't see this issue, but yarn-cluster mode gives "Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':".

Still looking for an answer.

混吃等死
#5 · 2020-02-01 02:41

You are missing the spark-hive jar.

For example, if you are running Spark 2.1 on Scala 2.11, you can use this jar:

https://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.11/2.1.0
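If the artifact really is missing from your installation, one hedged way to pull it in from PySpark is spark.jars.packages with the Maven coordinate from the link above; depending on your setup this may need to be passed to spark-submit rather than the builder:

    from pyspark.sql import SparkSession

    # Ask Spark to resolve the spark-hive artifact at startup.
    # Coordinate matches the Maven link above: Scala 2.11, Spark 2.1.0.
    spark = (SparkSession.builder
             .config("spark.jars.packages", "org.apache.spark:spark-hive_2.11:2.1.0")
             .enableHiveSupport()
             .getOrCreate())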

何必那么认真
#6 · 2020-02-01 02:42

I was getting this error when trying to run pyspark and spark-shell while my HDFS wasn't started.

趁早两清
#7 · 2020-02-01 02:52

I was getting the same error in a Windows environment, and the trick below worked for me.

In shell.py the Spark session is defined with .enableHiveSupport():

    spark = SparkSession.builder\
        .enableHiveSupport()\
        .getOrCreate()

Remove Hive support and redefine the Spark session as below:

    spark = SparkSession.builder\
        .getOrCreate()

You can find shell.py in your Spark installation folder; for me it's in "C:\spark-2.1.1-bin-hadoop2.7\python\pyspark".
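The same idea works in your own script without touching shell.py, as a rough sketch:

    from pyspark.sql import SparkSession

    # Build the session without Hive support; Spark falls back to its in-memory
    # catalog, so DataFrame work succeeds even if the Hive metastore is broken.
    spark = SparkSession.builder.appName("no-hive-session").getOrCreate()
    spark.range(5).show()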

Hope this helps.
