I'm launching a Spark-based HiveServer2 (the Spark Thrift Server) on Amazon EMR, and it has an extra classpath dependency. Due to this bug in Amazon EMR:
https://petz2000.wordpress.com/2015/08/18/get-blas-working-with-spark-on-amazon-emr/
my extra classpath cannot be passed through the "--driver-class-path" option.
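For reference, the failing launch looked roughly like this (the start script is Spark's standard sbin/start-thriftserver.sh; the install path and exact invocation are my assumptions). Per the blog post above, passing --driver-class-path here replaces EMR's default driver classpath instead of extending it, which breaks the EMR-provided libraries:

# Illustrative only: on EMR this overrides spark.driver.extraClassPath rather than appending to it
/usr/lib/spark/sbin/start-thriftserver.sh --driver-class-path "/home/hadoop/git/datapassport/*"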
So I'm forced to modify /etc/spark/conf/spark-env.conf to add the extra classpath instead:
# Add Hadoop libraries and my project JARs to the Spark classpath
SPARK_CLASSPATH="${SPARK_CLASSPATH}:${HADOOP_HOME}/*:${HADOOP_HOME}/../hadoop-hdfs/*:${HADOOP_HOME}/../hadoop-mapreduce/*:${HADOOP_HOME}/../hadoop-yarn/*:/home/hadoop/git/datapassport/*"
where "/home/hadoop/git/datapassport/*" is my classpath.
However, after launching the server successfully, the Spark environment page shows that my change had no effect:
spark.driver.extraClassPath :/usr/lib/hadoop/*:/usr/lib/hadoop/../hadoop-hdfs/*:/usr/lib/hadoop/../hadoop-mapreduce/*:/usr/lib/hadoop/../hadoop-yarn/*:/etc/hive/conf:/usr/lib/hadoop/../hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
Is this configuration file obsolete? Where is the new file, and how do I fix this problem?
Have you tried setting spark.driver.extraClassPath in spark-defaults? Something like this:
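(A sketch of what that entry could look like in /etc/spark/conf/spark-defaults.conf, assuming you append your project path to the EMR-built default value shown in your question:)

spark.driver.extraClassPath /usr/lib/hadoop/*:/usr/lib/hadoop/../hadoop-hdfs/*:/usr/lib/hadoop/../hadoop-mapreduce/*:/usr/lib/hadoop/../hadoop-yarn/*:/etc/hive/conf:/usr/lib/hadoop/../hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/home/hadoop/git/datapassport/*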
You can use --driver-class-path:
Start a spark-shell on the master node of a fresh EMR cluster and note the value of spark.driver.extraClassPath it reports (e.g. via sc.getConf).
Add your JAR files to the EMR cluster using a --bootstrap-action.
When you call spark-submit, prepend (or append) your JAR files to the value of extraClassPath you got from spark-shell, as in the sketch below.
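Something like this (a sketch: the project path is the one from your question, the placeholder stands for the value spark-shell reported, and the application JAR name is hypothetical):

# On the EMR master node: read the driver classpath EMR builds by default
spark-shell
scala> sc.getConf.get("spark.driver.extraClassPath")

# Hand that value back with your JARs prepended when you submit
spark-submit --driver-class-path "/home/hadoop/git/datapassport/*:<value from spark-shell>" your-application.jar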
This worked for me using EMR release builds 4.1 and 4.2.
The process for building spark.driver.extraClassPath may change between releases, which may be why the SPARK_CLASSPATH variable no longer works (Spark has deprecated SPARK_CLASSPATH in favor of the extraClassPath settings).