OutOfMemory error when using Apache Commons lineIt

Posted 2019-07-04 06:06

I'm trying to iterate line by line over a 1.2 GB file using Apache Commons FileUtils.lineIterator. However, as soon as the LineIterator calls hasNext(), I get a java.lang.OutOfMemoryError: Java heap space. I've already allocated 1 GB to the Java heap.

What am I doing wrong here? From what I read in the docs, isn't LineIterator supposed to read the file from the file system one line at a time rather than load it all into memory?

Note the code is in Scala:

  val file = new java.io.File("data_export.dat")
  val it = org.apache.commons.io.FileUtils.lineIterator(file, "UTF-8")
  var successCount = 0L
  var totalCount = 0L
  try {
    while (it.hasNext()) {
      try {
        val legacy = parse[LegacyEvent](it.nextLine())
        BehaviorEvent(legacy)
        successCount += 1L
      } catch {
        case e: Exception => println("Parse error")
      }
      totalCount += 1
    }
  } finally {
    it.close()
  }

Thanks for your help here!

1 Answer
仙女界的扛把子
#2 · 2019-07-04 06:39

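The code looks good. LineIterator does stream the file, but it has to hold one complete line in memory at a time. Most likely it never finds a line terminator in the file, so it tries to read a single very long line, larger than the 1 GB heap, into memory.

Run wc -l on the file in Unix and see how many lines you get. A count of 0 or a very small number would confirm this.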
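
If wc is not available, the same check can be done on the JVM. The following is a minimal sketch, not part of Commons IO: LineStats and scan are hypothetical names chosen for illustration. It streams the file through a fixed-size buffer, counts '\n' bytes the way wc -l does, and also reports the longest run of bytes between newlines, so it never holds a whole line in memory.

```scala
import java.io.{BufferedInputStream, File, FileInputStream}

// Hypothetical helper for illustration; not part of Apache Commons IO.
object LineStats {
  /** Returns (number of '\n' bytes, longest line length in bytes).
    * Only a 64 KB buffer is held in memory, regardless of line length. */
  def scan(file: File): (Long, Long) = {
    val in = new BufferedInputStream(new FileInputStream(file))
    try {
      val LF = '\n'.toByte
      val buf = new Array[Byte](64 * 1024)
      var newlines = 0L
      var current = 0L // bytes seen since the last newline
      var longest = 0L
      var n = in.read(buf)
      while (n != -1) {
        var i = 0
        while (i < n) {
          if (buf(i) == LF) {
            newlines += 1
            if (current > longest) longest = current
            current = 0L
          } else {
            current += 1
          }
          i += 1
        }
        n = in.read(buf)
      }
      // Account for a final line that has no trailing newline.
      if (current > longest) longest = current
      (newlines, longest)
    } finally {
      in.close()
    }
  }
}
```

Calling LineStats.scan(new java.io.File("data_export.dat")) on the file from the question should show whether the longest line approaches the heap size. Note that the sketch counts only '\n', so in a file with Windows line endings each '\r' is included in the line length.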
