How to suppress parquet log messages in Spark?

Posted 2019-02-11 17:38

How do I stop messages like the following from appearing on my spark-shell console?

5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 89213 records.
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 2 ms. row count = 120141
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 2 ms. row count = 89213
5 May, 2015 5:14:30 PM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutp
[Stage 12:=================================================>    (184 + 4) / 200]

Thanks

6 Answers
#2 · 2019-02-11 18:00

This works for Spark 2.0. Edit the file spark/log4j.properties and add:

log4j.logger.org.apache.spark.sql.execution.datasources.parquet=ERROR
log4j.logger.org.apache.spark.sql.execution.datasources.FileScanRDD=ERROR
log4j.logger.org.apache.hadoop.io.compress.CodecPool=ERROR

The FileScanRDD and CodecPool lines silence a couple of other very verbose loggers as well.
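
If editing the properties file is not an option, the same loggers can also be silenced from inside a spark-shell session with the log4j 1.x API that Spark bundles; a minimal sketch (this affects the driver JVM only -- executors still read log4j.properties):

import org.apache.log4j.{Level, Logger}

// Raise the threshold on the same three loggers for the current JVM.
Logger.getLogger("org.apache.spark.sql.execution.datasources.parquet").setLevel(Level.ERROR)
Logger.getLogger("org.apache.spark.sql.execution.datasources.FileScanRDD").setLevel(Level.ERROR)
Logger.getLogger("org.apache.hadoop.io.compress.CodecPool").setLevel(Level.ERROR)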

#3 · 2019-02-11 18:09

I believe this has regressed -- there are some large merges/changes being made to the Parquet integration: https://issues.apache.org/jira/browse/SPARK-4412

贪生不怕死
#4 · 2019-02-11 18:09

To turn off all messages except ERROR, edit your conf/log4j.properties file, changing the following line:

log4j.rootCategory=INFO, console

into

log4j.rootCategory=ERROR, console

Hope it helps!
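
If you only want to quiet an interactive session rather than change the file, SparkContext also exposes setLogLevel (Spark 1.4+), which overrides the root level at runtime; a quick sketch for spark-shell:

// In spark-shell, `sc` is the pre-built SparkContext.
sc.setLogLevel("ERROR")  // same effect as rootCategory=ERROR, for this session only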

可以哭但决不认输i
#5 · 2019-02-11 18:10

The solution from a SPARK-8118 issue comment seems to work:

You can disable the chatty output by creating a properties file with these contents:

org.apache.parquet.handlers=java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level=SEVERE

Then pass the path of the file to Spark when the application is submitted, assuming the file lives at /tmp/parquet.logging.properties (of course, it needs to be available on all worker nodes):

spark-submit \
  --conf spark.driver.extraJavaOptions="-Djava.util.logging.config.file=/tmp/parquet.logging.properties" \
  --conf spark.executor.extraJavaOptions="-Djava.util.logging.config.file=/tmp/parquet.logging.properties" \
  ...

Credits go to Justin Bailey.
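
For a quick driver-side test without the config file, the same java.util.logging levels can be set programmatically; a minimal sketch (executors still need the extraJavaOptions approach above; note that older Parquet releases log under the parquet package, newer ones under org.apache.parquet):

import java.util.logging.{Level, Logger => JulLogger}

// Hold references in vals: JUL keeps only weak references to loggers,
// so an unreferenced logger (and its level) can be garbage collected.
val oldParquetLogger = JulLogger.getLogger("parquet")
val newParquetLogger = JulLogger.getLogger("org.apache.parquet")
oldParquetLogger.setLevel(Level.SEVERE)
newParquetLogger.setLevel(Level.SEVERE)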

一夜七次
#6 · 2019-02-11 18:12

Not a solution, but if you build your own Spark, this file: https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/ParquetFileReader.java contains most of the log-message generation, which you can comment out for now.

欢心
#7 · 2019-02-11 18:27

I know this question was WRT Spark, but I recently had this issue when using Parquet with Hive in CDH 5.x and found a work-around. Details are here: https://issues.apache.org/jira/browse/SPARK-4412?focusedCommentId=16118403

Contents of my comment from that JIRA ticket below:

This is also an issue in the version of parquet distributed in CDH 5.x. In this case, I am using parquet-1.5.0-cdh5.8.4 (sources available here: http://archive.cloudera.com/cdh5/cdh/5)

However, I've found a work-around for mapreduce jobs submitted via Hive. I'm sure this can be adapted for use with Spark as well.

  • Add the following properties to your job's configuration (in my case, I added them to hive-site.xml, since adding them to mapred-site.xml didn't work):

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
</property>
<property>
  <name>mapreduce.child.java.opts</name>
  <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
</property>
  • Create a file named parquet-logging.properties with the following contents:

# Note: I'm certain not every line here is necessary. I just added them to cover all
# possible class/facility names. You will want to tailor this to your needs.
.level=WARNING
java.util.logging.ConsoleHandler.level=WARNING

parquet.handlers=java.util.logging.ConsoleHandler
parquet.hadoop.handlers=java.util.logging.ConsoleHandler
org.apache.parquet.handlers=java.util.logging.ConsoleHandler
org.apache.parquet.hadoop.handlers=java.util.logging.ConsoleHandler

parquet.level=WARNING
parquet.hadoop.level=WARNING
org.apache.parquet.level=WARNING
org.apache.parquet.hadoop.level=WARNING
  • Add the file to the job. In Hive, this is most easily done like so:
    ADD FILE /path/to/parquet-logging.properties;

With this done, when you run your Hive queries, Parquet should only log WARNING-level (and higher) messages to the stdout container logs.
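
For the Spark adaptation hinted at above, one plausible sketch (untested; the app name and paths are placeholders) is to ship the same parquet-logging.properties with the job and point the executor JVMs at it. On YARN the shipped file is localized before the executor JVM starts; on other cluster managers you may need an absolute path that exists on every node, as in the earlier answer. The driver JVM is already running by the time the session is built, so it needs the flag on the command line instead (e.g. --driver-java-options).

import org.apache.spark.sql.SparkSession

// Sketch: ship the logging config to executors and load it at executor JVM startup.
val spark = SparkSession.builder()
  .appName("quiet-parquet")  // placeholder name
  .config("spark.files", "/path/to/parquet-logging.properties")
  .config("spark.executor.extraJavaOptions",
    "-Djava.util.logging.config.file=parquet-logging.properties")
  .getOrCreate()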
