How can I stop messages like these from appearing on my spark-shell console?
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 89213 records.
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 2 ms. row count = 120141
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
5 May, 2015 5:14:30 PM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 2 ms. row count = 89213
5 May, 2015 5:14:30 PM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutp
[Stage 12:=================================================> (184 + 4) / 200]
Thanks
This works for Spark 2.0. Edit the file spark/log4j.properties and add:
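A minimal sketch of such entries, assuming log4j 1.x properties syntax. The logger names below are assumptions based on the Spark 2.0 and Hadoop package layout, not a verbatim copy of any particular configuration:

```properties
# Assumed logger names; adjust to the packages that are noisy in your build.
# Spark's Parquet data source
log4j.logger.org.apache.spark.sql.execution.datasources.parquet=ERROR
# Per-file scan messages
log4j.logger.org.apache.spark.sql.execution.datasources.FileScanRDD=ERROR
# Hadoop compression codec pool messages
log4j.logger.org.apache.hadoop.io.compress.CodecPool=ERROR
```

Setting a logger to ERROR this way leaves the root logger level untouched, so other INFO output is still shown.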
The lines for FileScanRDD and CodecPool also silence a couple of other very verbose loggers.
I believe this regressed; there are some large merges/changes being made to the Parquet integration. See https://issues.apache.org/jira/browse/SPARK-4412
To turn off all messages except ERROR, you should edit your conf/log4j.properties file, changing the following line:
into
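As a sketch, assuming the file was copied from Spark's conf/log4j.properties.template, where the root logger line reads `log4j.rootCategory=INFO, console`, the edit would look like this:

```properties
# Before (assumed default from log4j.properties.template):
# log4j.rootCategory=INFO, console

# After: only ERROR and above reach the console appender
log4j.rootCategory=ERROR, console
```

This changes the level for every log4j logger that does not set its own level, so it also hides INFO and WARN messages from Spark itself.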
Hope it could help!
The solution from a comment on the SPARK-8118 issue seems to work:
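A sketch of one approach along those lines, assuming the Parquet messages are emitted through java.util.logging (which log4j.properties does not control) and are silenced with a separate JUL properties file. The file name is hypothetical, and the logger name `org.apache.parquet` is an assumption that depends on the Parquet version; older releases use the `parquet` package, as in the log output in the question:

```properties
# parquet.logging.properties (hypothetical file name)
# Attach a console handler to the Parquet logger and raise the handler's
# threshold so that only SEVERE messages are printed.
org.apache.parquet.handlers=java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level=SEVERE

# Older package name, as seen in the question's log output (assumption).
parquet.handlers=java.util.logging.ConsoleHandler
```

The file would then be passed to the JVM through the standard `java.util.logging.config.file` system property, for example by setting `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` to `-Djava.util.logging.config.file=/path/to/parquet.logging.properties` (the path is a placeholder).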
Credits go to Justin Bailey.
Not a solution, but if you build your own Spark, this file generates most of the log messages, which you can comment out for now: https://github.com/Parquet/parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/ParquetFileReader.java
I know this question was WRT Spark, but I recently had this issue when using Parquet with Hive in CDH 5.x and found a work-around. Details are here: https://issues.apache.org/jira/browse/SPARK-4412?focusedCommentId=16118403
Contents of my comment from that JIRA ticket below: