I am using
df.write.mode("append").jdbc("jdbc:mysql://ip:port/database", "table_name", properties)
to insert into a table in MySQL.
Also, I have added Class.forName("com.mysql.jdbc.Driver")
in my code.
When I submit my Spark application:
spark-submit --class MY_MAIN_CLASS
--master yarn-client
--jars /path/to/mysql-connector-java-5.0.8-bin.jar
--driver-class-path /path/to/mysql-connector-java-5.0.8-bin.jar
MY_APPLICATION.jar
This yarn-client mode works for me.
But when I use yarn-cluster mode:
spark-submit --class MY_MAIN_CLASS
--master yarn-cluster
--jars /path/to/mysql-connector-java-5.0.8-bin.jar
--driver-class-path /path/to/mysql-connector-java-5.0.8-bin.jar
MY_APPLICATION.jar
It doens't work. I also tried setting "--conf":
spark-submit --class MY_MAIN_CLASS
--master yarn-cluster
--jars /path/to/mysql-connector-java-5.0.8-bin.jar
--driver-class-path /path/to/mysql-connector-java-5.0.8-bin.jar
--conf spark.executor.extraClassPath=/path/to/mysql-connector-java-5.0.8-bin.jar
MY_APPLICATION.jar
but still get the "No suitable driver found for jdbc" error.
There is 3 possible solutions,
- You might want to assembly you application with your build manager (Maven,SBT) thus you'll not need to add the dependecies in your
spark-submit
cli.
You can use the following option in your spark-submit
cli :
--jars $(echo ./lib/*.jar | tr ' ' ',')
Explanation : Supposing that you have all your jars in a lib
directory in your project root, this will read all the libraries and add them to the application submit.
You can also try to configure these 2 variables : spark.driver.extraClassPath
and spark.executor.extraClassPath
in SPARK_HOME/conf/spark-default.conf
file and specify the value of these variables as the path of the jar file. Ensure that the same path exists on worker nodes.
I tried the suggestions shown here which didn't work for me (with mysql). While debugging through the DriverManager code, I realized that I needed to register my driver since this was not happening automatically with "spark-submit". I therefore added
Driver driver = new Driver();
The constructor registers the driver with the DriverManager, which solved the SQLException problem for me.
I had to add the driver
option when using the sparkSession
's read
function.
.option("driver", "org.postgresql.Driver")
var jdbcDF - sparkSession.read
.option("driver", "org.postgresql.Driver")
.option("url", "jdbc:postgresql://<host>:<port>/<DBName>")
.option("dbtable", "<tableName>")
.option("user", "<user>")
.option("password", "<password>")
.load()
Depending on how your dependencies are setup, you'll notice that when you include something like compile group: 'org.postgresql', name: 'postgresql', version: '42.2.8'
in Gradle, for example, this will include the Driver class at org/postgresql/Driver.class
, and that's the one you want to instruct spark to load.