I am trying to run a Spark program that needs multiple jar files; with only one jar I am not able to run it. I want to add both jar files, which are in the same location. I have tried the command below, but it shows a dependency error:
spark-submit \
--class "max" maxjar.jar Book1.csv test \
--driver-class-path /usr/lib/spark/assembly/lib/hive-common-0.13.1-cdh5.3.0.jar
How can I add another jar file that is in the same directory? I want to add /usr/lib/spark/assembly/lib/hive-serde.jar.
Just use the --jars parameter. Spark will share those jars (comma-separated) with the executors. Specifying the full path for each additional jar works.
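A sketch of how that could look with the two jars from the question (paths are the ones given above; the options go before the application jar so spark-submit treats them as its own options rather than application arguments):
spark-submit \
  --class "max" \
  --jars /usr/lib/spark/assembly/lib/hive-common-0.13.1-cdh5.3.0.jar,/usr/lib/spark/assembly/lib/hive-serde.jar \
  maxjar.jar Book1.csv test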
Or add jars in conf/spark-defaults.conf by adding lines like:
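For example, with the two jars from the question (spark.driver.extraClassPath and spark.executor.extraClassPath are the settings that prepend entries to the driver and executor classpaths; entries are separated with :):
spark.driver.extraClassPath   /usr/lib/spark/assembly/lib/hive-common-0.13.1-cdh5.3.0.jar:/usr/lib/spark/assembly/lib/hive-serde.jar
spark.executor.extraClassPath /usr/lib/spark/assembly/lib/hive-common-0.13.1-cdh5.3.0.jar:/usr/lib/spark/assembly/lib/hive-serde.jar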
You can use * to include all the jars in a folder when adding the entry in conf/spark-defaults.conf.
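For instance, assuming all the jars sit in /usr/lib/spark/assembly/lib/:
spark.driver.extraClassPath   /usr/lib/spark/assembly/lib/*
spark.executor.extraClassPath /usr/lib/spark/assembly/lib/*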
For the --driver-class-path option you can use : as the delimiter to pass multiple jars. Below is an example with the spark-shell command, but the same should work with spark-submit as well. Spark version: 2.2.0
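A minimal sketch of that, with placeholder jar paths (not from the question):
spark-shell \
  --driver-class-path /fullpath/firstjar.jar:/fullpath/secondjar.jar
Note that --driver-class-path takes the classpath separator (: on Linux), while --jars expects a comma-separated list.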
I was trying to connect to MySQL from Python code that was executed using spark-submit. I was using the HDP sandbox with Ambari. I tried a lot of options such as --jars, --driver-class-path, etc., but none worked.
Solution
Copy the jar into /usr/local/miniconda/lib/python2.7/site-packages/pyspark/jars/
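For example, assuming the MySQL connector jar is named mysql-connector-java-5.1.45.jar (the exact file name depends on the version you downloaded):
cp mysql-connector-java-5.1.45.jar /usr/local/miniconda/lib/python2.7/site-packages/pyspark/jars/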
As of now I'm not sure if it's a solution or a quick hack, but since I'm working on a POC it kind of works for me.
For me the --jars option always works, but it's too verbose. To save some typing, you can put all the jars in a directory, say 'myJars', and then use this command to submit:
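One way to do that in a shell, reusing the class name and application jar from the question (the $(echo ... | tr ' ' ',') part just turns the directory listing into the comma-separated list that --jars expects):
spark-submit \
  --class "max" \
  --jars "$(echo myJars/*.jar | tr ' ' ',')" \
  maxjar.jar Book1.csv test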