1. My Spark (standalone) cluster: spmaster, spslave1, spslave2
2. My simple Spark app, which selects some records from MySQL:
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    public class SparkApp {
        public static void main(String[] args) {
            // The MySQL connector jar was uploaded to the same path on all nodes.
            String connectorJar = "/usr/lib/spark-1.6.1-bin-hadoop2.6/lib/mysql-connector-java-5.1.24.jar";
            SparkConf conf = new SparkConf()
                    .setMaster("spark://spmaster:7077")
                    .setAppName("SparkApp")
                    .set("spark.driver.extraClassPath", connectorJar)
                    .set("spark.executor.extraClassPath", connectorJar);
            JavaSparkContext sc = new JavaSparkContext(conf);
            SQLContext sqlContext = new SQLContext(sc);

            // JDBC connection options for the source table.
            String url = "jdbc:mysql://192.168.31.43:3306/mytest";
            Map<String, String> options = new HashMap<String, String>();
            options.put("url", url);
            options.put("dbtable", "mytable");
            options.put("user", "root");
            options.put("password", "password");

            DataFrame jdbcDF = sqlContext.read().format("jdbc").options(options).load();
            // Register under the same name that the query below uses.
            jdbcDF.registerTempTable("mytable");
            DataFrame sql = sqlContext.sql("select * from mytable limit 10");
            sql.show(); // show the result in the Eclipse console
            sc.close();
        }
    }
3. My question: when I right-click -> Run As 'Java Application', it works successfully, and I can find the job on the web UI <spark://spmaster:7077>. I don't understand how it works, or what the difference is compared with using spark-submit.sh.
The spark-submit.sh script is just a wrapper around a ${JAVA_HOME}/bin/java execution command. It sets up the environment details and then runs that java command for you.
When you click Run As 'Java Application', you are also triggering a java execution command, but without all the environment settings done by spark-submit.sh, and with the differences mentioned by @Sheel.
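To make the "wrapper around a java command" idea concrete, here is a minimal sketch of how such a command line could be assembled. It is only an illustration, not the actual script: the class name WrapperCommandSketch, the JAVA_HOME value, the /path/to/app.jar path and the SparkApp main-class name are all assumptions made up for the example. org.apache.spark.deploy.SparkSubmit is the Spark class that handles the submit arguments.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: shows the shape of a "java -cp ... MainClass args" command.
final class WrapperCommandSketch {

    // Assembles the java executable, the classpath, the main class and its arguments.
    static List<String> buildCommand(String javaHome, List<String> classpath,
                                     String mainClass, List<String> appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaHome + File.separator + "bin" + File.separator + "java");
        cmd.add("-cp");
        cmd.add(String.join(File.pathSeparator, classpath));
        cmd.add(mainClass);
        cmd.addAll(appArgs);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
                "/usr/lib/jvm/java", // assumed JAVA_HOME, adjust for your machine
                Arrays.asList("/usr/lib/spark-1.6.1-bin-hadoop2.6/lib/*", "/path/to/app.jar"),
                "org.apache.spark.deploy.SparkSubmit",
                Arrays.asList("--master", "spark://spmaster:7077",
                        "--class", "SparkApp", "/path/to/app.jar"));
        // Print the command instead of running it; this is only a sketch.
        System.out.println(String.join(" ", cmd));
    }
}
```

When Eclipse runs the application, it builds a comparable java command from the project's build path, which is why the Spark jars must be on that build path.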
When we use spark-submit.sh to submit an application, spark-submit sets up and launches the Driver (the process that holds the Spark Context) for us. But when we use the Java API (JavaSparkContext) to connect to the master directly, the Java application itself becomes the Driver, and all applications/jobs are submitted to the master through this Driver.
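One practical consequence can be sketched as follows. A driver started from the IDE has to name the master itself, while a driver started through spark-submit normally receives it from outside as the spark.master property. The helper below is a hypothetical, plain-Java illustration of that fallback logic; MasterResolutionSketch and resolveMaster are names invented for this example and are not part of the Spark API.

```java
import java.util.Properties;

// Hypothetical helper: picks the master URL the driver would connect to.
final class MasterResolutionSketch {

    // Returns the "spark.master" property if it is set, otherwise the fallback.
    static String resolveMaster(Properties props, String fallback) {
        String master = props.getProperty("spark.master");
        return (master == null || master.isEmpty()) ? fallback : master;
    }

    public static void main(String[] args) {
        // From the IDE nothing sets spark.master, so the hard-coded URL is used.
        String master = resolveMaster(System.getProperties(), "spark://spmaster:7077");
        System.out.println("Driver would connect to: " + master);
    }
}
```

In the application from the question, the equivalent choice is whether to call setMaster("spark://spmaster:7077") in code or to leave it out and pass the master on the spark-submit command line.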