I have a Spark application (Spark 1.5.2) that streams data from Kafka to HDFS. My application contains two Typesafe config files to configure certain things like the Kafka topic.
Now I want to run my application with spark-submit (cluster mode) in a cluster. The jar file with all dependencies of my project is stored on HDFS. As long as my config files are included in the jar file, everything works fine. But this is impractical for testing purposes because I always have to rebuild the jar.
Therefore I excluded the config files from my project and added them via --driver-class-path. This worked in client mode, but if I now move the config files to HDFS and run my application in cluster mode, it can't find the settings. Below you can find my spark-submit command:
/usr/local/spark/bin/spark-submit \
--total-executor-cores 10 \
--executor-memory 15g \
--verbose \
--deploy-mode cluster \
--class com.hdp.speedlayer.SpeedLayerApp \
--driver-class-path hdfs://iot-master:8020/user/spark/config \
--master spark://spark-master:6066 \
hdfs://iot-master:8020/user/spark/speed-layer-CONFIG.jar
I already tried it with the --files parameter, but that didn't work either. Does anybody know how I can fix this?
Update:
I did some further research and figured out that it could be related to the HDFS path. I changed the HDFS path to "hdfs:///iot-master:8020//user//spark//config". But unfortunately that didn't work either. Maybe this could help you.
Below you can also see the error I get when I run the driver program in cluster mode:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.ExceptionInInitializerError
at com.speedlayer.SpeedLayerApp.main(SpeedLayerApp.scala)
... 6 more
Caused by: com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'application'
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:145)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
...
One option is to use the --files flag with the HDFS location, and to make sure you add it to your executor classpath using the spark.executor.extraClassPath flag together with -Dconfig.file. Also, you can see it when looking at the help documentation for spark-submit:

--files FILES    Comma-separated list of files to be placed in the working
                 directory of each executor.

Running with spark-submit:
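Something along these lines (a sketch; the file name application.conf is an assumption, and -Dconfig.file is passed here through the driver and executor extraJavaOptions, since it is a JVM system property rather than a classpath entry):

/usr/local/spark/bin/spark-submit \
--total-executor-cores 10 \
--executor-memory 15g \
--deploy-mode cluster \
--class com.hdp.speedlayer.SpeedLayerApp \
--master spark://spark-master:6066 \
--files hdfs://iot-master:8020/user/spark/config/application.conf \
--conf "spark.driver.extraJavaOptions=-Dconfig.file=application.conf" \
--conf "spark.executor.extraJavaOptions=-Dconfig.file=application.conf" \
hdfs://iot-master:8020/user/spark/speed-layer-CONFIG.jar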
Trying to achieve the same result, I found out the following:

1. conf.addFile(): HDFS files won't work unless you are able to run hdfs dfs -get <....> beforehand to retrieve the file (see the sketch after this list). In my case I want to run it from Oozie, so I don't know on which machine it's going to run, and I don't want to add a copy-file action to my workflow.
2. conf.addJar(): the same applies here.

So as far as I know there is no straightforward way to load a configuration file from HDFS.
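For illustration, the hdfs dfs -get workaround from point 1 would look roughly like this (a sketch; the paths are assumptions based on the question, and it assumes the driver runs on the machine where the file was fetched, i.e. client deploy mode):

# Fetch the config file from HDFS to the local filesystem first
hdfs dfs -get hdfs://iot-master:8020/user/spark/config/application.conf /tmp/application.conf
# Then pass the local copy to the application, e.g. as a program argument
/usr/local/spark/bin/spark-submit \
--deploy-mode client \
--class com.hdp.speedlayer.SpeedLayerApp \
--master spark://spark-master:7077 \
speed-layer-CONFIG.jar /tmp/application.conf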
My approach was to pass the path to my app, read the configuration file, and merge it into the reference file:
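A minimal sketch of that approach (assuming the external config path arrives as the first program argument; the key application.kafka.topic is hypothetical, chosen to match the 'application' root key from the error above):

import java.io.File
import com.typesafe.config.{Config, ConfigFactory}

object SpeedLayerApp {
  def main(args: Array[String]): Unit = {
    // Parse the external config file whose path was passed as the first argument
    val external: Config = ConfigFactory.parseFile(new File(args(0)))

    // Merge it into the reference config: external values win,
    // reference.conf (bundled in the jar) fills in the defaults
    val config: Config = external
      .withFallback(ConfigFactory.defaultReference())
      .resolve()

    // Hypothetical key, matching the 'application' root key from the error
    val topic = config.getString("application.kafka.topic")
    println(s"Consuming Kafka topic: $topic")
  }
}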
P.S. The error only means that ConfigFactory didn't find any configuration file, so it couldn't find the property you are looking for.