Question:
I have a large text file that has no line breaks. It contains one long string (a single huge line of ASCII characters). So far everything works fine because I can read the whole line into memory in Java, but I am wondering whether there could be a memory leak issue once the file grows very large, say 5 GB+, and the program can no longer read the whole file into memory at once. In that case, what is the best way to read such a file? Can we break the huge line into 2 parts or even multiple chunks?
Here's how I read the file:
BufferedReader buf = new BufferedReader(new FileReader("input.txt"));
String line;
while((line = buf.readLine()) != null){
}
Answer 1:
A single String can hold at most about 2 billion characters (Integer.MAX_VALUE), so a 5 GB line cannot fit in one String at all. Even if it could, at 2 bytes per character it would use 10 GB of memory.
I suggest you read the text in blocks.
// try-with-resources closes the reader even if processing throws
try (Reader reader = new FileReader("input.txt")) {
    char[] chars = new char[8192];
    for (int len; (len = reader.read(chars)) > 0;) {
        // process chars[0..len); only the first len entries are valid
    }
}
This will use about 16 KB regardless of the size of the file.
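As a rough illustration of what "process chars" might look like, here is a minimal sketch that counts how often one character occurs while reading in blocks. The class name ChunkedCount and the method countChar are made up for this example, and a StringReader stands in for the file so the snippet runs on its own.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ChunkedCount {
    // Counts how often 'target' occurs, reading at most bufferSize chars at a time.
    static long countChar(Reader reader, char target, int bufferSize) throws IOException {
        char[] chunk = new char[bufferSize];
        long count = 0;
        int len;
        while ((len = reader.read(chunk)) != -1) {
            // Only chunk[0..len) holds fresh data; the rest is left over from earlier reads.
            for (int i = 0; i < len; i++) {
                if (chunk[i] == target) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the file; a tiny buffer forces several reads.
        try (Reader reader = new StringReader("abcabcab")) {
            System.out.println(countChar(reader, 'a', 3)); // prints 3
        }
    }
}
```

For a real file, the reader could be opened with Files.newBufferedReader(Paths.get("input.txt"), StandardCharsets.US_ASCII), which also makes the charset explicit instead of relying on the platform default.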
Answer 2:
There won't be a memory leak as such, since the JVM's garbage collector reclaims objects that are no longer referenced. However, you will probably run out of heap space.
In cases like this, it is always best to import and process the stream in manageable pieces. Read in 64MB or so and repeat.
You also might find it useful to add the -Xmx parameter to your java call, in order to increase the maximum heap space available within the JVM.
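To see what heap limit the JVM is actually running with, a small check like the following may help. This is only a sketch: the class name HeapCheck and the helper fitsInHeap are invented for illustration, and the comparison is a crude upper bound, since the heap also has to hold everything else the program allocates.

```java
class HeapCheck {
    // True if 'bytes' is smaller than the maximum heap the JVM will attempt to use.
    static boolean fitsInHeap(long bytes) {
        return bytes < Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap (MB): " + maxHeap / (1024 * 1024));

        // 10 GB is what the 5 GB line would need at 2 bytes per character.
        long tenGb = 10L * 1024 * 1024 * 1024;
        System.out.println("10 GB fits: " + fitsInHeap(tenGb));
    }
}
```

Running it as, for example, java -Xmx2g HeapCheck should report a maximum heap close to 2048 MB; the exact figure can differ slightly between JVMs.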
Answer 3:
It's better to read the file in chunks and then concatenate the chunks, or do whatever you want with them, because reading a big file all at once will cause heap space issues.
An easy way to do it is shown below:
try (InputStream is = new FileInputStream("input.txt")) {
    byte[] buffer = new byte[1024];
    int read;
    while ((read = is.read(buffer)) != -1)
    {
        // only buffer[0..read) holds data from this read; do whatever you need with it
    }
}
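One thing to watch when processing chunk by chunk is that a logical unit of data can straddle a chunk boundary. The sketch below assumes, purely for illustration, that the long line is made of records separated by a delimiter character; the question does not say this. It uses a Reader rather than an InputStream for simplicity, and the names ChunkedRecords and readRecords are hypothetical. A partial record is carried over in a StringBuilder until its delimiter arrives.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ChunkedRecords {
    // Splits the stream on 'delimiter', carrying a partial record over to the next chunk.
    static List<String> readRecords(Reader reader, char delimiter, int bufferSize) throws IOException {
        List<String> records = new ArrayList<>();
        StringBuilder partial = new StringBuilder();
        char[] chunk = new char[bufferSize];
        int len;
        while ((len = reader.read(chunk)) != -1) {
            for (int i = 0; i < len; i++) {
                if (chunk[i] == delimiter) {
                    // The record is complete, even if it started in an earlier chunk.
                    records.add(partial.toString());
                    partial.setLength(0);
                } else {
                    partial.append(chunk[i]);
                }
            }
        }
        if (partial.length() > 0) {
            records.add(partial.toString()); // trailing record without a delimiter
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data; a buffer of 4 chars splits "alpha" across two reads.
        System.out.println(readRecords(new StringReader("alpha;beta;gamma"), ';', 4));
        // prints [alpha, beta, gamma]
    }
}
```

For a multi-gigabyte file, collecting every record in a list would bring the heap problem back, so a real program would more likely handle each record as soon as it is complete and then discard it.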
Answer 4:
In addition to the idea of reading in chunks, you could also look at memory mapping areas of the file using java.nio.MappedByteBuffer. Each mapping is still limited to a maximum size of Integer.MAX_VALUE bytes, so a 5 GB file needs more than one mapping. This may be better than explicitly reading chunks if you will be making scattered accesses within a chunk.
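A minimal sketch of the memory-mapping approach, walking the file in fixed-size windows so that no single mapping exceeds the size limit. The class name MappedWindows, the method countByte, and the tiny window size are assumptions made for this example; a real program would likely use windows of many megabytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedWindows {
    // Counts bytes equal to 'target', mapping at most windowSize bytes at a time.
    static long countByte(Path file, byte target, int windowSize) throws IOException {
        long count = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long pos = 0; pos < size; pos += windowSize) {
                // The last window may be shorter than windowSize.
                long len = Math.min(windowSize, size - pos);
                MappedByteBuffer window = channel.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (window.hasRemaining()) {
                    if (window.get() == target) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // A small temporary file stands in for the real input.
        Path file = Files.createTempFile("mapped", ".txt");
        file.toFile().deleteOnExit();
        Files.write(file, "abcabcab".getBytes(StandardCharsets.US_ASCII));
        System.out.println(countByte(file, (byte) 'a', 3)); // prints 3
    }
}
```

This sketch scans each window sequentially only to keep it short; the benefit mentioned above shows up when the code jumps around inside a window using absolute get(int index) calls.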
Answer 5:
To read chunks from one file and write them to another, something like this could be used:
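// try-with-resources closes both streams, flushing the writer
try (Reader in = new FileReader("input.txt");
     Writer out = new FileWriter("output.txt")) {
    char[] buffer = new char[1024];
    int l;
    while ((l = in.read(buffer)) > 0) {
        // write only the l chars that were actually read
        out.write(buffer, 0, l);
    }
}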
Answer 6:
You won't run into any memory leak issues, but you may run into heap space issues.
It all depends on how you are currently reading the line. You can avoid heap issues by reading through a fixed-size buffer.
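// The long string itself is never passed in; it is consumed piece by piece from the reader.
public void readLongString(int size, BufferedReader in) throws IOException {
    char[] buffer = new char[size];
    int len;
    // The offset argument indexes into the buffer, not the file, so it stays 0.
    while ((len = in.read(buffer, 0, size)) != -1) {
        // do stuff with buffer[0..len)
    }
}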