I have a Scala Maven project that uses Spark, and I am trying to implement logging using Logback. I am compiling my application to a JAR and deploying it to an EC2 instance where the Spark distribution is installed. My pom.xml includes dependencies for Spark and Logback as follows:
<dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
    <version>1.1.7</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>log4j-over-slf4j</artifactId>
    <version>1.7.7</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>
When I submit my Spark application, I print out the SLF4J binding on the command line. If I execute the jar's code using java, the binding is to Logback. If I use Spark (i.e. spark-submit), however, the binding is to log4j:
import org.apache.spark.SparkContext
import org.slf4j.{Logger, LoggerFactory}
import org.slf4j.impl.StaticLoggerBinder

val logger: Logger = LoggerFactory.getLogger(this.getClass)
val sc: SparkContext = new SparkContext()
val rdd = sc.textFile("myFile.txt")

// Print which SLF4J binding was actually selected at runtime
val slb: StaticLoggerBinder = StaticLoggerBinder.getSingleton
System.out.println("Logger Instance: " + slb.getLoggerFactory)
System.out.println("Logger Class Type: " + slb.getLoggerFactoryClassStr)
yields
    Logger Instance: org.slf4j.impl.Log4jLoggerFactory@a64e035
    Logger Class Type: org.slf4j.impl.Log4jLoggerFactory
I understand that both log4j-1.2.17.jar and slf4j-log4j12-1.7.16.jar are in /usr/local/spark/jars, and that Spark is most likely referencing these jars despite the exclusions in my pom.xml, because if I delete them I get a ClassNotFoundException at spark-submit runtime.
My question is: is there a way to implement native logging in my application using Logback while preserving Spark's internal logging capabilities? Ideally, I'd like to write my Logback application logs to a file and allow Spark's logs to still be shown at STDOUT.
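For reference, this is the kind of logback.xml I have in mind (a minimal sketch; the appender name, log file path, and pattern are placeholders):

<configuration>
    <!-- Application logs go to a file; Spark's own log4j logging is untouched -->
    <appender name="FILE" class="ch.qos.logback.core.FileAppender">
        <file>application.log</file>
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>
    <root level="INFO">
        <appender-ref ref="FILE"/>
    </root>
</configuration>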
I packed Logback and log4j-to-slf4j along with my other dependencies, plus src/main/resources/logback.xml, in a fat jar.
When I run spark-submit with the flags shown below, all logging is handled by Logback.
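The exact command is missing from this copy; presumably it looked something like this (the main class and jar names are placeholders):

# Assumed flags: userClassPathFirst makes the fat jar's Logback binding win
# over the slf4j-log4j12 binding shipped in Spark's jars directory.
spark-submit \
  --conf "spark.driver.userClassPathFirst=true" \
  --conf "spark.executor.userClassPathFirst=true" \
  --class com.example.MyApp \
  my-app-assembly.jar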
I had the same problem: I was trying to use a Logback config file. I tried many permutations, but I could not get it to work.
I was accessing Logback through grizzled-slf4j, using an SBT dependency along these lines:
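(The original coordinates are missing from this copy; this is the standard grizzled-slf4j artifact, with the version being a guess:)

// Version is an assumption; the answer's exact coordinates did not survive
libraryDependencies += "org.clapper" %% "grizzled-slf4j" % "1.3.0"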
Once I added the log4j config file:
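(The file contents are missing here; a minimal log4j 1.x properties file along these lines would do, with the layout pattern being a guess:)

# src/main/resources/log4j.properties -- a sketch, not the answer's exact file
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n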
my logging worked fine.
I had encountered a very similar problem.
Our build was similar to yours (but we used sbt) and is described in detail here: https://stackoverflow.com/a/45479379/1549135. Running this solution locally works fine, but then spark-submit would ignore all the exclusions and the new logging framework (Logback), because Spark's classpath has priority over the deployed jar. And since it contains log4j 1.2.xx, it would simply load it and ignore our setup.

Solution
I have used several sources, but the key settings come from the Spark 1.6.1 docs (and apply to the latest Spark / 2.2.0 as well): spark.driver.extraClassPath and spark.executor.extraClassPath, which add extra classpath entries to the driver's and executors' classpaths.
What is not written there, though, is that extraClassPath takes precedence over Spark's default classpath! So now the solution should be quite obvious.
1. Download those jars:
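(The list itself is missing from this copy; given the question's pom.xml, it would presumably be the Logback binding plus the log4j bridge, with versions mirroring the ones above:)

logback-core-1.1.7.jar
logback-classic-1.1.7.jar
log4j-over-slf4j-1.7.7.jar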
2. Run spark-submit with those jars on the extra classpath (a sketch follows). I am just not yet sure if you can put those jars on HDFS; we have them locally, next to the application jar.
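The original command did not survive in this copy; assuming the jars sit next to the application jar, the invocation would look something like this (the class and jar names are placeholders):

# Paths are assumptions; what matters is prepending the logging jars
# via extraClassPath on both the driver and the executors.
LOGGING_JARS="logback-core-1.1.7.jar:logback-classic-1.1.7.jar:log4j-over-slf4j-1.7.7.jar"
spark-submit \
  --conf "spark.driver.extraClassPath=$LOGGING_JARS" \
  --conf "spark.executor.extraClassPath=$LOGGING_JARS" \
  --class com.example.MyApp \
  my-app.jar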
userClassPathFirst
Strangely enough, using Spark 1.6.1 I have also found this option in the docs: spark.driver.userClassPathFirst and spark.executor.userClassPathFirst, an experimental feature that gives user-added jars precedence over Spark's own jars when loading classes.
But simply setting:
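(The conf lines are missing from this copy; presumably they were the userClassPathFirst flags:)

# Presumed fragment of the spark-submit invocation
--conf "spark.driver.userClassPathFirst=true" \
--conf "spark.executor.userClassPathFirst=true"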
did not work for me. So I am gladly using extraClassPath! Cheers!
Loading logback.xml

If you face any problems loading logback.xml in Spark, my question here might help you out: Pass system property to spark-submit and read file from classpath or custom path.

After much struggle I've found another solution: library shading. After I shaded org.slf4j, my application logs are separated from Spark's logs. Furthermore, logback.xml in my application jar is honored. Here you can find information on library shading in sbt; in this case it comes down to putting a shade rule in your build.sbt settings:
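(The rule itself is missing from this copy; a sketch assuming the sbt-assembly plugin, with the prefix chosen to match the side note below:)

// build.sbt sketch (assumes sbt-assembly); "your_favourite_prefix" is
// whatever package prefix you want the shaded slf4j classes moved under
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.slf4j.**" -> "your_favourite_prefix.@0").inAll
)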
Side note: if you are not sure whether shading actually happened, open your jar in some archive browser and check whether the directory structure reflects the shaded one; in this case your jar should contain the path /your_favourite_prefix/org/slf4j, but not /org/slf4j.