I'm trying to run a notebook on Analytics for Apache Spark running on Bluemix, but I hit the following error:
Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and
run build/sbt assembly", Py4JJavaError(u'An error occurred while calling
None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o38))
The error is intermittent - it doesn't always happen. The line of code in question is:
df = sqlContext.read.format('jdbc').options(
url=url,
driver='com.ibm.db2.jcc.DB2Driver',
dbtable='SAMPLE.ASSETDATA'
).load()
There are a few similar questions on Stack Overflow, but they aren't asking about the Spark service on Bluemix.
Create a new SQLContext object before using sqlContext, and then run the code again (a sketch is shown below). This error happens if you have multiple notebooks using the out-of-the-box sqlContext.
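A minimal sketch of that suggestion, assuming the notebook's pre-created SparkContext is available as sc:

    from pyspark.sql import SQLContext

    # Build a fresh SQLContext from the notebook's SparkContext (assumed to be
    # available as sc) instead of relying on the pre-created sqlContext.
    sqlContext = SQLContext(sc)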
You can also solve this problem easily as follows:

1. Go to the top right corner and click on your username; you will get a list.
2. Choose Interpreter.
3. Scroll the page until you get to Spark.
4. On the right there is a list containing: spark ul, edit, ...; choose restart.

Go back to your notebook and run it again; it should work.
That statement initializes a HiveContext under the covers. The HiveContext then initializes a local Derby database to hold its metadata. The Derby database is created in the current directory by default. The reported problem occurs under these circumstances (among others):

1. Lock files from a previous run are left over in the Derby database directory, for example because the kernel was killed, so the database can no longer be opened.
2. Several notebooks share the same current directory and try to use the same Derby database concurrently, which embedded Derby does not allow.
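Before getting to the workarounds, you can confirm where this metastore ends up from the notebook itself; the metastore_db and derby.log names assumed here are Derby/Hive defaults, not something specific to Bluemix:

    import os

    # The Derby-backed metastore lands in the notebook kernel's current
    # working directory; metastore_db and derby.log are the default names.
    print(os.getcwd())
    print([name for name in os.listdir('.')
           if name in ('metastore_db', 'derby.log')])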
Until IBM changes the default setup to avoid this problem, possible workarounds are:
For case 1, delete the leftover lock files. From a Python notebook, that can be done by executing something like this:
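A minimal sketch, assuming the metastore sits in the default metastore_db directory under the notebook's current working directory and that Derby's standard *.lck files are what is left over:

    import glob
    import os

    # Remove Derby lock files (db.lck, dbex.lck) left behind by a previous,
    # abnormally terminated kernel. The metastore_db path is an assumption
    # based on the default setup described above.
    for lockfile in glob.glob('metastore_db/*.lck'):
        os.remove(lockfile)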
For case 2, change the current working directory before the Hive context is created. In a Python notebook, you can change into a newly created directory, for example like this:
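A minimal sketch using Python's standard tempfile module; nothing here is specific to Bluemix:

    import os
    import tempfile

    # Change into a freshly created temporary directory before the first
    # statement that triggers HiveContext initialization, so each run gets
    # its own Derby metastore.
    os.chdir(tempfile.mkdtemp())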
But beware, it will clutter the filesystem with a new directory and Derby database each time you run that notebook.
I happen to know that IBM is working on a fix. Please use these workarounds only if you encounter the problem, not proactively.
Unfortunately, the answer that says "Create a new SQLContext" is completely wrong.

It is a bad idea to replace sqlContext with a new instance of SQLContext, because you lose Hive support: by default, sqlContext is initialized as a HiveContext.
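A quick way to see what is lost, assuming the pre-created context is named sqlContext and the SparkContext is available as sc:

    from pyspark.sql import HiveContext, SQLContext

    # The pre-created sqlContext is a HiveContext (per the point above) ...
    print(isinstance(sqlContext, HiveContext))   # True

    # ... while a hand-made SQLContext is not, so Hive features are lost.
    plain = SQLContext(sc)
    print(isinstance(plain, HiveContext))        # False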
Second, the message "You must build Spark with Hive. Export 'SPARK_HIVE=true'..." comes from badly written code in PySpark (context.py) that fails to get the actual exception from the Java Spark driver and does not display it.
To find out what is going on, arrange for the Java driver's log to be written to a file. In my case, the customer runs Spark with DSE, and the conf directory contains some .xml files named logback-spark*.xml. Open the one named logback-spark.xml (without suffixes), add a file appender there, then reproduce the bug and read the exceptions and stack traces written by the Java driver.

For other Spark versions/builds, first find out how to get the Java driver's log, and set up the configuration so that it logs to a file.

In my Spark driver's log there was a clear message that Spark's login could not write to the Hive metastore's filesystem. In your case you may get a different message. But the problem on the Java side is the primary one; you should look at it first.