I am trying to run SparkSQL:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
But the error I am getting is below:
... 125 more
Caused by: java.sql.SQLException: Another instance of Derby may have already booted the database /root/spark/bin/metastore_db.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
... 122 more
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /root/spark/bin/metastore_db.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
I see there is a metastore_db folder that already exists.
My Hive metastore is configured with MySQL as the metastore, but I am not sure why the error shows a Derby exception.
Another case where you can see the same error is the Spark REPL of an AWS Glue dev endpoint, when you are trying to convert a DynamicFrame into a DataFrame (see the sketch after the exception list below).
There are actually several different exceptions like:
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':"
ERROR XSDB6: Another instance of Derby may have already booted the database /home/glue/metastore_db.
java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader
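For reference, the conversion that triggers it in the Glue REPL is typically a one-liner like the following (a sketch; dyf is a hypothetical DynamicFrame, and toDF() is the Glue DynamicFrame API):
df = dyf.toDF()  # converting a Glue DynamicFrame into a Spark DataFrame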
The solution is hard to find with Google, but eventually it is described here.
The loaded REPL contains an instantiated SparkSession in a variable spark, and you just need to stop it before creating a new SparkContext:
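spark.stop()  # stop the REPL-provided session so a new context can be created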
I was facing the same issue while creating a table.
I could see many entries for
ps -ef | grep spark-shell
so I killed all of them and restarted spark-shell. It worked for me.

If you're running in the spark shell, you shouldn't instantiate a HiveContext; there's one created automatically, called sqlContext (the name is misleading: if you compiled Spark with Hive, it will be a HiveContext). See the similar discussion here.
If you're not running in the shell, this exception means you've created more than one HiveContext in the same JVM, which seems to be impossible; you can only create one.
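In the shell case, using the provided context looks like this in pyspark (a minimal sketch):
sqlContext.sql("SHOW TABLES").show()  # reuse the shell-provided sqlContext instead of creating a HiveContext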
This happened when I was using pyspark ml Word2Vec. I was trying to load a previously built model. The trick is: just create an empty DataFrame in pyspark or Scala using sqlContext. Following is the python syntax (a minimal sketch, assuming a live sc and sqlContext in the pyspark shell):
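from pyspark.sql.types import StructType

# create an empty DataFrame through the existing sqlContext
empty_df = sqlContext.createDataFrame(sc.emptyRDD(), StructType([]))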
This is a workaround. My problem was fixed after using this block. Note: this only occurs when you instantiate sqlContext from HiveContext, not SQLContext.
It is very difficult to find where your Derby metastore_db is being accessed by another thread. If you are able to find the process, you can kill it using the kill command.
The best solution is to restart the system.
I got this error by running
sqlContext._get_hive_ctx()
This was caused by initially trying to load a pipelined RDD into a DataFrame. I got the error:
Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o29))
So you could run this before rebuilding it, but FYI I have seen others report that this did not help them.
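For reference, the kind of conversion that triggered it looked something like this (a sketch; the data and column names are illustrative):
rdd = sc.parallelize([(1, "a"), (2, "b")])  # a pipelined RDD in pyspark
df = rdd.toDF(["id", "value"])  # converting it initializes the (Hive) SQL context under the hood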