With a fresh install of Spark 2.1, I am getting an error when executing the pyspark command.
Traceback (most recent call last):
File "/usr/local/spark/python/pyspark/shell.py", line 43, in <module>
spark = SparkSession.builder\
File "/usr/local/spark/python/pyspark/sql/session.py", line 179, in getOrCreate
session._jsparkSession.sessionState().conf().setConfString(key, value)
File "/usr/local/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/usr/local/spark/python/pyspark/sql/utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"
I have Hadoop and Hive on the same machine. Hive is configured to use MySQL for the metastore. I did not get this error with Spark 2.0.2.
Can someone please point me in the right direction?
I too was struggling in cluster mode. I fixed it by adding hive-site.xml to the Spark conf directory; if you have an HDP cluster, that directory should be /usr/hdp/current/spark2-client/conf. It's working for me.
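Once the file is in place, a quick sanity check along these lines (the app name is just a placeholder) should list the databases from the metastore:

# Minimal sketch: assumes hive-site.xml is now on Spark's conf path,
# so enableHiveSupport() talks to the MySQL-backed metastore.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-metastore-check")   # placeholder app name
         .enableHiveSupport()
         .getOrCreate())

spark.sql("SHOW DATABASES").show()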
The issue for me was solved by unsetting the HADOOP_CONF_DIR environment variable. It was pointing to the Hadoop configuration directory, so when the pyspark shell started, Spark tried to talk to a Hadoop cluster that had not been started. If HADOOP_CONF_DIR is set, either start the Hadoop cluster before using the Spark shells, or unset the variable.
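If you build the session from a plain Python process instead of the pyspark launcher, a sketch like the one below clears the variable for that process only; with the pyspark script you would unset it in your shell before starting it.

# Minimal sketch: assumes pyspark is importable in this Python environment.
import os
os.environ.pop("HADOOP_CONF_DIR", None)  # drop the variable for this process only

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("local-no-hadoop-conf").getOrCreate()  # placeholder app name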
Spark 2.1.0 - When I run it in yarn client mode I don't see this issue, but yarn cluster mode gives "Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':".
Still looking for an answer.
You are missing the spark-hive jar.
For example, if you are running Spark 2.1 with Scala 2.11, you can use this jar:
https://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.11/2.1.0
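If you would rather pull it in at startup than place the jar manually, something like this should work (spark.jars.packages fetches the artifact from Maven Central, so it assumes network access):

# Minimal sketch: downloads the spark-hive artifact for Scala 2.11 / Spark 2.1.0
# via the standard spark.jars.packages setting.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.jars.packages", "org.apache.spark:spark-hive_2.11:2.1.0")
         .enableHiveSupport()
         .getOrCreate())

The same coordinate can also be passed on the command line with pyspark --packages org.apache.spark:spark-hive_2.11:2.1.0.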
I was getting this error trying to run pyspark and spark-shell when my HDFS wasn't started.
I was getting the same error in a Windows environment, and the trick below worked for me.
In shell.py the Spark session is defined with .enableHiveSupport().
Remove the Hive support and redefine the Spark session as below:
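Something like this (a sketch; the builder chain in your shell.py may look slightly different, so just drop the .enableHiveSupport() call from it):

# Before (roughly what shell.py does when Hive classes are available):
# spark = SparkSession.builder\
#     .enableHiveSupport()\
#     .getOrCreate()

# After: build the session without Hive support
spark = SparkSession.builder.getOrCreate()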
You can find shell.py in your Spark installation folder; for me it's in "C:\spark-2.1.1-bin-hadoop2.7\python\pyspark".
Hope this helps