You must build Spark with Hive. Export 'SPARK_HIVE=true'

Posted 2019-06-24 08:03

I'm trying to run a notebook on Analytics for Apache Spark running on Bluemix, but I hit the following error:

Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and 
run build/sbt assembly", Py4JJavaError(u'An error occurred while calling 
None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o38))

The error is intermittent - it doesn't always happen. The line of code in question is:

df = sqlContext.read.format('jdbc').options(
            url=url, 
            driver='com.ibm.db2.jcc.DB2Driver', 
            dbtable='SAMPLE.ASSETDATA'
        ).load()

There are a few similar questions on Stack Overflow, but they aren't asking about the Spark service on Bluemix.

4 Answers
Emotional °昔
#2 · 2019-06-24 08:36

Create a new SQLContext object before using sqlContext:

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)

and then run the code again.

This error happens if you have multiple notebooks using the out-of-the-box sqlContext.

ら.Afraid
#3 · 2019-06-24 08:43

You can solve this problem easily as follows: go to the top-right corner and click on your username; you will get a menu.

  1. Choose Interpreter.
  2. Scroll the page until you get to Spark.
  3. On the right there is a list containing: spark ul, edit, ... Choose restart.

Go back to your notebook and run it again; it should work.

我欲成王,谁敢阻挡
#4 · 2019-06-24 08:46

That statement initializes a HiveContext under the covers. The HiveContext then initializes a local Derby database to hold its metadata. The Derby database is created in the current directory by default. The reported problem occurs under these circumstances (among others):

  1. The Derby database already exists, and there are leftover lock files because the notebook kernel that last used it didn't shut down properly.
  2. The Derby database already exists, and is currently in use by another notebook kernel that also initialized a HiveContext.
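
To see whether leftover lock files are present, you can inspect the metastore directory before deciding on a workaround. This is a minimal sketch assuming the default ./metastore_db location mentioned above; note that lock files also exist while another kernel is legitimately using the database, so their presence alone does not tell case 1 and case 2 apart.

    import glob
    import os

    # The default Derby metastore lives in ./metastore_db next to the notebook kernel
    lock_files = glob.glob(os.path.join(os.getcwd(), 'metastore_db', '*.lck'))
    if lock_files:
        print('Derby lock files found:', lock_files)
    else:
        print('No lock files in ./metastore_db')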

Until IBM changes the default setup to avoid this problem, possible workarounds are:

  • For case 1, delete the leftover lockfiles. From a Python notebook, this is done by executing:

    !rm -f ./metastore_db/*.lck
    
  • For case 2, change the current working directory before the Hive context is created. In a Python notebook, this will change into a newly created directory:

    import os
    import tempfile
    os.chdir(tempfile.mkdtemp())
    

    But beware, it will clutter the filesystem with a new directory and Derby database each time you run that notebook.

I happen to know that IBM is working on a fix. Please use these workarounds only if you encounter the problem, not proactively.

小情绪 Triste *
#5 · 2019-06-24 08:53

Unfortunately, the answer that says "Create a new SQLContext" is totally wrong.

Replacing sqlContext with a new instance of SQLContext is a bad idea, because you lose Hive support: by default, sqlContext is initialized as a HiveContext.
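
A quick way to verify this in a notebook (a minimal sketch, assuming the pre-created sqlContext variable) is to check the type of the context:

    from pyspark.sql import HiveContext

    # True means the pre-created sqlContext is a HiveContext, i.e. Hive support is available
    print(isinstance(sqlContext, HiveContext))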

Second, the message "You must build Spark with Hive. Export 'SPARK_HIVE=true'..." comes from badly written code in PySpark (context.py) that doesn't retrieve the actual exception from the Java Spark driver and doesn't display it.
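
To surface the underlying Java error directly in the notebook, one option is to catch the wrapper exception and dig the Py4JJavaError out of its arguments. This is a rough sketch, assuming the same sqlContext, url, driver and table as in the question; the exact structure of the wrapper exception may differ between Spark versions.

    from py4j.protocol import Py4JJavaError

    try:
        df = sqlContext.read.format('jdbc').options(
                url=url,
                driver='com.ibm.db2.jcc.DB2Driver',
                dbtable='SAMPLE.ASSETDATA'
            ).load()
    except Exception as e:
        # The wrapper exception carries the original Py4JJavaError in its args;
        # print the Java-side message instead of the misleading "build with Hive" hint
        for arg in e.args:
            if isinstance(arg, Py4JJavaError):
                print(arg.java_exception.toString())
        raise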

To find out what is going on, make sure the Java driver's log is written to a file. In my case, the customer has Spark with DSE, and in the conf directory there are some .xml files named logback-spark*.xml. Open the one named logback-spark.xml (without suffixes) and add a file appender there. Then reproduce the bug and read the exceptions and stack traces written by the Java driver.

For other Spark versions/builds, first find out how to get the log for the Java driver, and set up the configuration so that it logs to a file.
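
If you cannot edit the driver's logging configuration files, an alternative that is sometimes enough is to raise the JVM-side log level from the notebook through the py4j gateway. This is only a sketch: it assumes a log4j-based Spark build (not DSE's logback) and uses the private sc._jvm handle, so treat it purely as a debugging aid.

    # Raise the JVM root logger to INFO so metastore errors show up in the driver output
    log4j = sc._jvm.org.apache.log4j
    log4j.LogManager.getRootLogger().setLevel(log4j.Level.INFO)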

In my Spark driver's log I had a clear message that the Spark login user couldn't write to the Hive metastore's filesystem. In your case you may get a different message. But the Java-side problem is the primary one - you should look at it first.
