Reading a huge single-line string from a text file

Posted 2019-01-15 11:15

I have a large text file that doesn't contain any line breaks. It is one long string (a single huge line of ASCII characters). So far everything works, because I can read the whole line into memory in Java. But I am wondering whether there could be a memory leak issue once the file grows to something like 5 GB+ and the program can no longer read the whole file into memory at once. In that case, what is the best way to read such a file? Can we break the huge line into two parts, or into multiple chunks?

Here's how I read the file:

   BufferedReader buf = new BufferedReader(new FileReader("input.txt"));
   String line;
   while((line = buf.readLine()) != null){

   }

6 answers
Bombasti
Reply #2 · 2019-01-15 11:56

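A single String can hold at most about 2 billion characters (Integer.MAX_VALUE), and a char takes 2 bytes in memory (Java 9 and later can store Latin-1-only strings at 1 byte per character). So even if you could read a 5 GB line, it would need up to 10 GB of memory.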

I suggest you read the text in blocks.

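// needs: import java.io.FileReader; import java.io.Reader;
try (Reader reader = new FileReader("input.txt")) {
    char[] chars = new char[8192];
    // read() returns the number of chars read, or -1 at end of file
    for (int len; (len = reader.read(chars)) != -1;) {
        // process chars[0..len)
    }
}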

The buffer uses about 16 KB (8192 chars at 2 bytes each), regardless of the size of the file.
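As one illustration of what the "process chars" step might look like, here is a minimal sketch that counts how often a character occurs in the file while holding only one block in memory. The class name `ChunkedCount`, the method `countChar`, and the choice of counting commas are assumptions made for the example, not part of the answer above.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ChunkedCount {
    /** Counts occurrences of target, reading the input in fixed-size blocks. */
    static long countChar(Reader reader, char target, int chunkSize) throws IOException {
        char[] chunk = new char[chunkSize];
        long count = 0;
        int len;
        // read() returns the number of chars placed in the buffer, or -1 at end of stream
        while ((len = reader.read(chunk)) != -1) {
            for (int i = 0; i < len; i++) { // only the first len chars are valid
                if (chunk[i] == target) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is the file name from the question; US_ASCII assumes the file
        // really is pure ASCII, as the question states
        try (Reader reader = Files.newBufferedReader(Paths.get("input.txt"), StandardCharsets.US_ASCII)) {
            System.out.println(countChar(reader, ',', 8192));
        }
    }
}
```

This sketch only looks at single characters, so nothing is lost at block boundaries. If you need to recognise tokens that can straddle two blocks, the unfinished tail of one block has to be carried over into the next.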

Luminary・发光体
Reply #3 · 2019-01-15 12:02

To read a file in chunks, or to copy those chunks to another file, something like this could be used:

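// needs: import java.io.*;
try (Reader in = new FileReader("input.txt");
     Writer out = new FileWriter("output.txt")) {
    char[] buffer = new char[1024];
    int l;
    // read() returns the number of chars read, or -1 at end of file
    while ((l = in.read(buffer)) != -1) {
        out.write(buffer, 0, l); // write only the chars that were actually read
    }
}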
虎瘦雄心在
Reply #4 · 2019-01-15 12:04

There won't be a memory leak as such, since the JVM has its own garbage collector. However, you will probably run out of heap space.

In cases like this, it is always best to import and process the stream in manageable pieces. Read in 64MB or so and repeat.

You also might find it useful to add the -Xmx parameter to your java call, in order to increase the maximum heap space available within the JVM.
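To see what limit the JVM is actually running with, something like the following sketch may help. It uses `Runtime.getRuntime().maxMemory()`, which reflects the `-Xmx` setting. The class name `HeapCheck`, the helper `fitsInHeap`, and the "use at most half the heap" margin are assumptions chosen for illustration.

```java
public class HeapCheck {
    /** True if payloadBytes fits in at most half of maxHeapBytes (an arbitrary safety margin). */
    static boolean fitsInHeap(long payloadBytes, long maxHeapBytes) {
        return payloadBytes <= maxHeapBytes / 2;
    }

    public static void main(String[] args) {
        // maxMemory() reports the most memory the JVM will try to use; -Xmx changes it
        long maxHeap = Runtime.getRuntime().maxMemory();
        // 5 GB of ASCII text held as 2-byte chars needs roughly twice the file size
        long fiveGbAsChars = 5L * 1024 * 1024 * 1024 * 2;
        System.out.println("Max heap (MB): " + maxHeap / (1024 * 1024));
        System.out.println("Whole file fits: " + fitsInHeap(fiveGbAsChars, maxHeap));
    }
}
```

Running it as `java -Xmx2g HeapCheck` and again with a larger value shows how the reported limit changes. Raising the limit only postpones the problem, though, so reading in pieces remains the more robust option.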

疯言疯语
Reply #5 · 2019-01-15 12:06

You won't run into memory leaks, but you may run out of heap space.

It all depends on how you are currently reading the line. Reading through a fixed-size buffer keeps memory use bounded and avoids the heap issues.

public void readLongString(int size, BufferedReader in) throws IOException {
  char[] buffer = new char[size];
  int len;
  // read() fills buffer from index 0 and returns the count read, or -1 at end of stream
  while ((len = in.read(buffer, 0, size)) != -1) {
       // do stuff with buffer[0..len)
  }
}
甜甜的少女心
Reply #6 · 2019-01-15 12:12

It's better to read the file in chunks and then concatenate the chunks, or do whatever you want with them, because reading a big file all at once will give you heap space issues.

An easy way to do it is shown below:

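  // needs: import java.io.*; "input.txt" is the file from the question
  try (InputStream is = new FileInputStream("input.txt")) {
      byte[] buffer = new byte[1024];
      int read;
      // read() returns the number of bytes read, or -1 at end of file
      while ((read = is.read(buffer)) != -1) {
          // do whatever you need with buffer[0..read)
      }
  }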
ゆ 、 Hurt°
Reply #7 · 2019-01-15 12:15

In addition to the idea of reading in chunks, you could also look at memory mapping areas of the file using java.nio.MappedByteBuffer. You would still be limited to a maximum buffer size of Integer.MAX_VALUE bytes per mapping, so a 5 GB file needs more than one mapping. This may be better than explicitly reading chunks if you will be making scattered accesses within a chunk.
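A minimal sketch of the memory-mapping approach, assuming the goal is simply to scan the whole file: it maps one window at a time with `FileChannel.map` and counts a byte value. The class name `MappedWindows`, the method `countByte`, and the window size are assumptions for illustration. Each window must be no larger than Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedWindows {
    /** Counts occurrences of target by mapping the file one window at a time. */
    static long countByte(Path file, byte target, long windowSize) throws IOException {
        long count = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long fileSize = channel.size();
            for (long pos = 0; pos < fileSize; pos += windowSize) {
                // The last window may be shorter than windowSize
                long size = Math.min(windowSize, fileSize - pos);
                MappedByteBuffer window = channel.map(FileChannel.MapMode.READ_ONLY, pos, size);
                while (window.hasRemaining()) {
                    if (window.get() == target) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is the file name from the question; 1 GiB windows are an arbitrary choice
        long oneGiB = 1L << 30;
        System.out.println(countByte(Paths.get("input.txt"), (byte) ',', oneGiB));
    }
}
```

For a purely sequential scan like this one, mapping offers little over a plain buffered read. It becomes more attractive when you jump around inside a window, for example with `window.get(index)`.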
