I'm trying to use log4j2 logger in my Spark job. Essential requirement: log4j2 config is located outside classpath, so I need to specify its location explicitly. When I run my code directly within IDE without using spark-submit
, log4j2 works well. However when I submit the same code to Spark cluster using spark-submit
, it fails to find log42 configuration and falls back to default old log4j.
Launcher command
${SPARK_HOME}/bin/spark-submit \
--class my.app.JobDriver \
--verbose \
--master 'local[*]' \
--files "log4j2.xml" \
--conf spark.executor.extraJavaOptions="-Dlog4j.configurationFile=log4j2.xml" \
--conf spark.driver.extraJavaOptions="-Dlog4j.configurationFile=log4j2.xml" \
myapp-SNAPSHOT.jar
Log4j2 dependencies in maven
<dependencies>
. . .
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>${log4j2.version}</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>${log4j2.version}</version>
</dependency>
<!-- Bridge log4j to log4j2 -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-1.2-api</artifactId>
<version>${log4j2.version}</version>
</dependency>
<!-- Bridge slf4j to log4j2 -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
<version>${log4j2.version}</version>
</dependency>
<dependencies>
Any ideas what I could miss?
If log4j2 is being used in one of your own dependencies, it's quite easy to bipass all configuration files and use programmatic configuration for one or two high level loggers IF and only IF the configuration file is not found.
The code below does the trick. Just name the logger to your top level logger.
Try using the --driver-java-options
Spark falls back to log4j because it probably cannot initialize logging system during startup (your application code is not added to classpath).
If you are permitted to place new files on your cluster nodes then create directory on all of them (for example
/opt/spark_extras
), place there all log4j2 jars and add two configuration options to spark-submit:Then libraries will be added to classpath.
If you have no access to modify files on cluster you can try another approach. Add all log4j2 jars to spark-submit parameters using
--jars
. According to the documentation all these libries will be added to driver's and executor's classpath so it should work in the same way.Apparently at the moment there is no official support official for log4j2 in Spark. Here is detailed discussion on the subject: https://issues.apache.org/jira/browse/SPARK-6305
On practical side that means:
If you have access to Spark configs and jars and can modify them, you still can use
log4j2
after manually adding log4j2 jars to SPARK_CLASSPATH, and providinglog4j2
configuration file to Spark.If you run on managed Spark cluster and have no access to Spark jars/configs, then you still can use
log4j2
, however its use will be limited to the code executed at driver side. Any code part running by executors will use Spark executors logger (which is old log4j)