I currently have 2 BufferedReaders initialized on the same text file. When I'm done reading the text file with the first BufferedReader, I use the second one to make another pass through the file from the top. Multiple passes through the same file are necessary.
I know about reset(), but it needs to be preceded by a call to mark(), and mark() needs to know the size of the file, something I don't think I should have to bother with.
Ideas? Packages? Libs? Code?
Thanks TJ
The best way to proceed is to change your algorithm so that you do NOT need the second pass. I have used this approach a couple of times when I had to deal with huge (but not enormous, i.e. a few GB) files that didn't fit in the available memory.
It might be hard, but the performance gain is usually worth the effort.
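As a rough illustration of folding two passes into one, here is a minimal sketch. The class name SinglePassStats and the two statistics (line count and longest line length) are made up for the example; whether your real passes can be merged this way depends on what they compute.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example: two values that could each be a separate pass
// are accumulated together while reading the file once.
class SinglePassStats {

    // Returns {lineCount, longestLineLength}, both computed in a single pass.
    static int[] scan(Path file) throws IOException {
        int count = 0;
        int longest = 0;
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                count++;                                      // work of "pass 1"
                longest = Math.max(longest, line.length());   // work of "pass 2"
            }
        }
        return new int[] { count, longest };
    }
}
```

This only works when the second computation does not need the finished result of the first; if it does, a second pass (or keeping intermediate state in memory) is hard to avoid.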
What's the disadvantage of just creating a new BufferedReader to read from the top? I'd expect the operating system to cache the file if it's small enough.
If you're concerned about performance, have you proved it to be a bottleneck? I'd just do the simplest thing and not worry about it until you have a specific reason to. I mean, you could just read the whole thing into memory and then do the two passes on the result, but again that's going to be more complicated than just reading from the start again with a new reader.
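For what it's worth, the "new reader per pass" approach can be sketched like this. The class name FreshReaderPerPass and the two passes (counting lines, finding the longest line) are placeholders I made up; substitute whatever your passes actually do.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical example class: each pass opens its own reader and closes it when done.
class FreshReaderPerPass {

    // Pass 1: count the lines, using a reader opened just for this pass.
    static int countLines(Path file) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            int count = 0;
            while (in.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    // Pass 2: find the longest line, starting from the top again with a new reader.
    static int longestLineLength(Path file) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            int longest = 0;
            String line;
            while ((line = in.readLine()) != null) {
                longest = Math.max(longest, line.length());
            }
            return longest;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        System.out.println("lines: " + countLines(file));
        System.out.println("longest: " + longestLineLength(file));
    }
}
```

Because each pass owns its reader inside a try-with-resources block, there is no shared position to keep track of and nothing to reset.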
About mark/reset:
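The mark method in BufferedReader takes a readAheadLimit parameter, which limits how far you can read after a mark before reset becomes impossible. Resetting doesn't actually mean a file system seek(0); it just repositions inside the buffer. The Javadoc for mark() says as much: once you have read up to the limit or beyond, an attempt to reset may fail, and a limit larger than the input buffer forces a new buffer of at least that size to be allocated, so large values should be used with care.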
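A minimal sketch of the mark/reset behaviour, assuming the part you want to re-read is known to be short. The class name MarkResetDemo and the method readFirstLineTwice are hypothetical, and a StringReader stands in for the file so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical example: re-read the first line by marking the start and resetting.
class MarkResetDemo {

    // readAheadLimit must cover every character read between mark() and reset().
    static String[] readFirstLineTwice(Reader source, int readAheadLimit) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            in.mark(readAheadLimit);      // remember the current position
            String first = in.readLine();
            in.reset();                   // jump back inside the buffer, no file seek
            String again = in.readLine();
            return new String[] { first, again };
        }
    }

    public static void main(String[] args) throws IOException {
        String[] lines = readFirstLineTwice(new StringReader("first\nsecond\n"), 64);
        System.out.println(lines[0] + " / " + lines[1]);
    }
}
```

To rewind a whole file this way, readAheadLimit would have to be at least the file's length in characters, which means buffering the entire file in memory. That is why mark/reset is a poor fit for a full second pass.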
"The whole business about mark() and reset() in BufferedReader smacks of poor design."
Why don't you extend this class, have it call mark() in the constructor, and then do a seek(0) in a topOfFile() method?
BR,
~A
BufferedReaders are meant to read a file sequentially. What you are looking for is java.io.RandomAccessFile; you can then use seek() to go wherever you want in the file.
The random access reader is implemented like so:
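The original snippet did not survive here, so the following is only a sketch of how this might look. The class name SeekToTop and the line-counting passes are my own placeholders; RandomAccessFile, seek() and readLine() are the real java.io API.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical example: two passes over one open file, rewinding with seek(0).
class SeekToTop {

    // Returns the line count seen on the first and on the second pass.
    static int[] countLinesTwice(String path) throws IOException {
        // "rw" opens the file for reading and writing, as in the text;
        // "r" is enough if you only read.
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            int first = countLines(file);
            file.seek(0);                 // back to the top of the file
            int second = countLines(file);
            return new int[] { first, second };
        }
    }

    // Reads from the current file position to the end, counting lines.
    private static int countLines(RandomAccessFile file) throws IOException {
        int count = 0;
        while (file.readLine() != null) {
            count++;
        }
        return count;
    }
}
```

Two caveats: RandomAccessFile does no buffering of its own, so readLine() can be slow on large files, and readLine() turns each byte directly into a char, so it does not decode multi-byte encodings such as UTF-8 correctly.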
The "rw" is a mode string, which is detailed in the documentation for the RandomAccessFile constructor.
The reason the sequential access readers are set up like this is so that they can implement their buffers and so that things cannot be changed beneath their feet. For example, the file reader that is given to the buffered reader should only be operated on by that buffered reader. If something else could also affect it, you could get inconsistent behaviour: one reader would advance the position in the file reader while the other expected it to stay the same, so when you switched to the other reader it would be at an undetermined location.