How to add configuration file to classpath of all

2019-03-18 05:10发布

问题:

I'm using Typesafe Config, https://github.com/typesafehub/config, to parameterize a Spark job running in yarn-cluster mode with a configuration file. The default behavior of Typesafe Config is to search the classpath for resources with names matching a regex and to load them into your configuration class automatically with ConfigFactory.load() (for our purposes, assume the file it looks for is called application.conf).

I am able to load the configuration file into the driver using --driver-class-path <directory containing configuration file>, but using --conf spark.executor.extraClassPath=<directory containing configuration file> does not put the resource on the classpath of all executors like it should. The executors report that they can not find a certain configuration setting for a key that does exist in the configuration file that I'm attempting to add to their classpaths.

What is the correct way to add a file to the classpaths of all executor JVMs using Spark?

回答1:

It looks like the value of the spark.executor.extraClassPath property is relative to the working directory of the application ON THE EXECUTOR.

So, to use this property correctly, one should use --files <configuration file> to first direct Spark to copy the file to the working directory of all executors, then use spark.executor.extraClassPath=./ to add the executor's working directory to its classpath. This combination results in the executor being able to read values from the configuration file.



回答2:

I use the SparkContext addFile method

val sc: SparkContext = {
  val sparkConf = new SparkConf()
    .set("spark.storage.memoryFraction", "0.3")
    .set("spark.driver.maxResultSize", "10g")
    .set("spark.default.parallelism", "1024")
    .setAppName("myproject")
  sparkConf.setMaster(getMaster)
  val sc = new SparkContext(sparkConf)
  sc.addFile("application.conf")
  sc
}