可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I'm running Spark on EMR as described in Run Spark and Spark SQL on Amazon Elastic MapReduce:

This tutorial walks you through installing and operating Spark, a fast and general engine for large-scale data processing, on an Amazon EMR cluster. You will also create and query a dataset in Amazon S3 using Spark SQL, and learn how to monitor Spark on an Amazon EMR cluster with Amazon CloudWatch.

I'm trying to suppress the INFO logs by editing $HOME/spark/conf/log4j.properties to no avail.

Output looks like:

$ ./spark/bin/spark-sql
Spark assembly has been built with Hive, including Datanucleus jars on classpath
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/.versions/spark-1.1.1.e/lib/spark-assembly-1.1.1-hadoop2.4.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2014-12-14 20:59:01,819 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
2014-12-14 20:59:01,825 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
2014-12-14 20:59:01,825 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
2014-12-14 20:59:01,825 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack

How to suppress the INFO messages above?

回答1:

You can also just add the configuration option at cluster creation, if you know you want to suppress logging for a new EMR cluster.

EMR accepts configuration options as JSON, which you can enter directly into the AWS console, or pass in via a file when using the CLI.

In this case, in order to change the log level to WARN, here's the JSON:

[
  {
    "classification": "spark-log4j",
    "properties": {"log4j.rootCategory": "WARN, console"}
  }
]

In the console, you'd add this in the first creation step:

Or if you're creating the cluster using the CLI:

aws emr create-cluster <options here> --configurations config_file.json

You can read more in the EMR documentation.

回答2:

I was able to do this by editing $HOME/spark/conf/log4j.properties as desired, and calling spark-sql with --driver-java-options as follows:

./spark/bin/spark-sql --driver-java-options "-Dlog4j.configuration=file:///home/hadoop/spark/conf/log4j.properties"

I could verify that the correct file was being used by adding -Dlog4j.debug to the options:

./spark/bin/spark-sql --driver-java-options "-Dlog4j.debug -Dlog4j.configuration=file:///home/hadoop/spark/conf/log4j.properties"

回答3:

spark-sql --driver-java-options "-Dlog4j.configuration=file:///home/hadoop/conf/log4j.properties"

cat conf/log4j.properties

# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=WARN
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=WARN