Loading Spark Config for testing Spark Application

Posted 2019-08-30 02:02

I've been trying to test a Spark application on my local laptop before deploying it to a cluster (to avoid having to package and deploy the entire application every time), but I'm struggling to load the Spark config file.

When I run the application on a cluster, I usually provide a Spark config file to the application (using spark-submit's --conf). This file has a lot of config options because the application interacts with Cassandra and HDFS. However, when I try to do the same on my local laptop, I'm not sure how to load this config file. I know I could write code that takes the config file's path, parses all the values, and sets them on the config, but I'm wondering if there is an easier way.

Current status:

  • I placed the desired config file in my SPARK_HOME/conf directory and named it spark-defaults.conf ---> It didn't get applied, yet the exact same file works fine when run through spark-submit
  • For local mode, I set the Spark master to "local[2]" when creating the Spark session, so I'm wondering whether it's possible to create that session with a specified config file (a rough sketch of what I have in mind is below).
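Roughly, this is what I'd like to end up with when running locally. The path, and the idea of parsing the file with java.util.Properties (which also accepts the whitespace-separated keys and values used by spark-defaults.conf), are just placeholders/assumptions on my part, not something I've confirmed is the right approach:

import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Placeholder path to the same file I pass to spark-submit on the cluster
val confPath = "src/test/resources/spark-defaults.conf"

// Parse the key/value pairs from the conf file
val props = new Properties()
val in = new java.io.FileInputStream(confPath)
try props.load(in) finally in.close()

// Apply every key from the file, plus the local master, before building the session
val sparkConf = new SparkConf().setMaster("local[2]")
for (k <- props.stringPropertyNames().asScala) {
  sparkConf.set(k, props.getProperty(k))
}

val spark = SparkSession.builder().config(sparkConf).getOrCreate()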

2 answers
家丑人穷心不美
#2 · 2019-08-30 02:37

Not sure if this will help anyone, but I ended up reading the conf file from a test resources directory and then setting all the values as system properties (copied from the Spark source code):

// _sparkConfs is just a Map[String, String] populated by parsing the conf file
for ((k, v) <- _sparkConfs) {
  System.setProperty(k, v)
}

This essentially emulates spark-submit's --properties-file option to a certain degree. By doing this, I was able to keep the logic in my test setup and avoid modifying the existing application code.
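In case it helps, here is a sketch of how that map could be populated from a file on the test classpath. The resource name and the use of java.util.Properties are assumptions for illustration, not necessarily what my original code does:

import java.util.Properties
import scala.collection.JavaConverters._

// Assumed helper: read a properties-style conf file from the test resources
// and turn it into the Map[String, String] used above
def loadSparkConfs(resource: String): Map[String, String] = {
  val in = getClass.getResourceAsStream(resource)
  require(in != null, s"$resource not found on the test classpath")
  val props = new Properties()
  try props.load(in) finally in.close()
  props.stringPropertyNames().asScala.map(k => k -> props.getProperty(k)).toMap
}

val _sparkConfs = loadSparkConfs("/spark-test.conf") // placeholder resource name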

做自己的国王
#3 · 2019-08-30 02:49

Did you add the --properties-file flag with the path to spark-defaults.conf as an argument in your IDE's run configuration?

The official documentation (https://spark.apache.org/docs/latest/configuration.html) repeatedly refers to 'your default properties file'. Some options cannot be set from inside your application, because the JVM has already started by that point. And since the conf directory is only read by spark-submit, I suppose you have to load the configuration file explicitly when running locally.
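To make that distinction concrete, here is a small sketch; the Cassandra key is just an assumed example of a runtime option:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[2]")
  // Runtime options, such as connector settings, can be set here and take effect
  .config("spark.cassandra.connection.host", "127.0.0.1") // assumed example key/value
  // JVM-level options like spark.driver.memory are read at JVM startup, so setting
  // them here in local mode has no effect; they belong in the properties file passed
  // to spark-submit, or in the IDE's run configuration
  .config("spark.driver.memory", "4g")
  .getOrCreate()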

This problem has been discussed here: How to use spark-submit's --properties-file option to launch Spark application in IntelliJ IDEA?
