I'm building an Apache Spark Streaming application and cannot make it log to a file on the local filesystem when running it on YARN. How can I achieve this?
I've set up my log4j.properties
file so that it successfully writes to a log file in the /tmp
directory on the local file system (shown partially below):
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/tmp/application.log
log4j.appender.file.append=false
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
When I run my Spark application locally by using the following command:
spark-submit --class myModule.myClass --master local[2] --deploy-mode client myApp.jar
It runs fine and I can see that log messages are written to /tmp/application.log
on my local file system.
But when I run the same application via YARN, e.g.
spark-submit --class myModule.myClass --master yarn-client --name "myModule" --total-executor-cores 1 --executor-memory 1g myApp.jar
or
spark-submit --class myModule.myClass --master yarn-cluster --name "myModule" --total-executor-cores 1 --executor-memory 1g myApp.jar
I cannot see any /tmp/application.log
on the local file system of the machine that runs YARN.
What am I missing?
Alternatively, you can use PropertyConfigurator of log4j to define your custom log properties.
For example, your properties file should have the following props:
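A sketch of what such a file could look like (log4j 1.x syntax; the appender name and output path here are placeholders to adapt to your environment):

```properties
# Hypothetical log4j 1.x properties file; adjust the path for your nodes
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/tmp/custom.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
```

You would then point `PropertyConfigurator.configure(...)` at this file from your driver code at startup, before any logging happens.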
EDIT: Updating link to log4j docs. Spark uses log4j 2, not v1.2
Ref: http://logging.apache.org/log4j/2.x/
[Edited to avoid confusion]
It looks like you'll need to append to the JVM arguments used when launching your tasks/jobs.
Try editing
conf/spark-defaults.conf
as described here.

Alternatively, try editing
conf/spark-env.sh
as described here to add the same JVM argument, although the entries in conf/spark-defaults.conf should work.

If you are still not getting any joy, you can explicitly pass the location of your log4j.properties file on the command line along with your
spark-submit
invocation: reference it by name if the file is contained within your JAR file and in the root directory of your classpath; if the file is not on your classpath, use the
file:
prefix and the full path.

Note that the above options of specifying the log4j.properties via spark.executor.extraJavaOptions and spark.driver.extraJavaOptions only log locally, and the log4j.properties file must also be present locally on each node.
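Hedged sketches of those two invocations, reusing the question's class and JAR names (the path in the second variant is a placeholder; -Dlog4j.configuration is the log4j 1.x system property):

```shell
# log4j.properties bundled in the JAR, at the root of the classpath
spark-submit --class myModule.myClass --master yarn-client \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  myApp.jar

# log4j.properties not on the classpath: use the file: prefix and a full path
# that exists locally on every node
spark-submit --class myModule.myClass --master yarn-client \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties" \
  myApp.jar
```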
As specified in the https://spark.apache.org/docs/1.2.1/running-on-yarn.html documentation, you could alternatively upload log4j.properties along with your application using the --files option. This enables YARN aggregate logging on HDFS, and you can access the logs with the yarn logs command.
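The aggregated logs can then be fetched with the standard YARN CLI once the application has finished (fill in your own application ID):

```shell
yarn logs -applicationId <application ID>
```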
1) To debug how Spark on YARN is interpreting your log4j settings, use the
log4j.debug
flag.

2) Spark will create two kinds of YARN containers, the driver and the workers. You want to share a file from the machine where you submit the application with all containers (you can't use a file inside the JAR, since that is not the JAR that actually runs), so you must use the
--files
spark-submit directive (this shares the file with all workers).
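A sketch of such a submit command, assuming the question's class and JAR names; -Dlog4j.debug=true makes log4j print to the console which configuration file it picks up:

```shell
spark-submit --class myModule.myClass --master yarn-cluster \
  --files src/main/resources/config/log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties -Dlog4j.debug=true" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties -Dlog4j.debug=true" \
  myApp.jar
```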
Here log4j.properties is a project file inside the
src/main/resources/config
folder. I can see in the console output that the file is picked up, so it is taken into account; you can check on the Spark web UI too.
In your log4j.properties file, you should also modify the
log4j.rootCategory
from
INFO,console
to
INFO,file
.
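That is, keeping the asker's file appender definition, the top of the properties file would change roughly like this:

```properties
# before: root logging went to the console appender
# log4j.rootCategory=INFO,console

# after: route root logging to the file appender defined in the same file
log4j.rootCategory=INFO,file
```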