I'd like to iterate through a text file one line at a time, operate on the contents, and stream the result to a separate file. Textbook case for BufferedReader.readLine().
But: I need to glue my lines together with newlines, and what if the original file didn't have the "right" newlines for my platform (DOS files on Linux or vice versa)? I guess I could read ahead a bit in the stream and see what kind of line endings I find, even though that's really hacky.
But: suppose my input file doesn't have a trailing newline. I'd like to keep things how they were. Now I need to peek ahead to the next line ending before reading every line. At this point why am I using a class that gives me readLine() at all?
This seems like it should be a solved problem. Is there a library (or even better, core Java 7 class!) that will just let me call a method similar to readLine() that returns one line of text from a stream, with the EOL character(s) intact?
Here's an implementation that reads char by char until it finds a line terminator. The reader passed in must support mark(), so if yours doesn't, wrap it in a BufferedReader.
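    // Requires java.io.IOException and java.io.Reader.
    // Returns the next line including its terminator ("\n", "\r" or "\r\n"),
    // or null once the end of the stream has been reached.
    public static String readLineWithTerm(Reader reader) throws IOException {
        if (!reader.markSupported()) {
            throw new IllegalArgumentException("reader must support mark()");
        }
        int code;
        StringBuilder line = new StringBuilder();
        while ((code = reader.read()) != -1) {
            char ch = (char) code;
            line.append(ch);
            if (ch == '\n') {
                break;
            } else if (ch == '\r') {
                // Peek one char ahead: a following '\n' belongs to this line.
                reader.mark(1);
                int next = reader.read();
                if (next == '\n') {
                    line.append('\n');
                } else {
                    // Lone '\r' (or end of stream): undo the peek.
                    reader.reset();
                }
                break;
            }
        }
        return (line.length() == 0 ? null : line.toString());
    }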
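If wrapping the input in a BufferedReader just to get mark() is not convenient, the same idea can be sketched with a PushbackReader, which lets a single look-ahead character be pushed back. This is only an illustrative variant: the class name LineWithTerminatorDemo and the sample input are made up for the example.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

class LineWithTerminatorDemo {
    // Returns the next line including its terminator ("\n", "\r" or "\r\n"),
    // or null at end of stream. Uses unread() instead of mark()/reset().
    static String readLineWithTerm(PushbackReader in) throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            line.append((char) c);
            if (c == '\n') {
                break;
            }
            if (c == '\r') {
                int next = in.read();
                if (next == '\n') {
                    line.append('\n');
                } else if (next != -1) {
                    in.unread(next); // belongs to the following line
                }
                break;
            }
        }
        return line.length() == 0 ? null : line.toString();
    }

    public static void main(String[] args) throws IOException {
        String sample = "dos\r\nunix\nold mac\rno trailing newline";
        try (PushbackReader in = new PushbackReader(new StringReader(sample))) {
            String line;
            while ((line = readLineWithTerm(in)) != null) {
                // Show the terminators visibly instead of printing them raw.
                System.out.println(line.replace("\r", "\\r").replace("\n", "\\n"));
            }
        }
    }
}
```

Because each returned string carries its own terminator, writing the processed lines back out reproduces the original line endings, including a missing trailing newline on the last line.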
Update:
But: I need to glue my lines together with newlines, and what if the original file didn't have the "right" newlines for my platform (DOS files on Linux or vice versa)? I guess I could read ahead a bit in the stream and see what kind of line endings I find, even though that's really hacky.
You can create a BufferedReader with a specified charset, so if the file uses an unusual encoding, supply that charset via Files.newBufferedReader(Path path, Charset cs).
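As a rough sketch of that call, the example below writes a small ISO-8859-1 file with DOS line endings and reads it back with the matching charset. The class name CharsetReaderDemo, the helper readLines, and the sample text are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class CharsetReaderDemo {
    // Reads every line of the file, decoding bytes with the given charset.
    static List<String> readLines(Path path, Charset cs) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, cs)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line); // terminator already stripped by readLine()
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("charset-demo", ".txt");
        try {
            byte[] bytes = "caf\u00e9\r\nna\u00efve\r\n".getBytes(StandardCharsets.ISO_8859_1);
            Files.write(tmp, bytes);
            System.out.println(readLines(tmp, StandardCharsets.ISO_8859_1));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Note that the charset only controls how bytes are decoded into characters; readLine() still drops the terminator, whichever of "\n", "\r" or "\r\n" the file used.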
Is there a library (or even better, core Java7 class!) that will just let me call a method similar to readLine() that returns one line of text from a stream, with the EOL character(s) intact?
If you're going to read a file, you have to know what charset it is. If you know what charset it is, then you don't need the EOL character to be "intact" since you can just add it on yourself.
From BufferedReader.readLine:
Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
Returns:
A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached
So BufferedReader.readLine does not return any line-termination characters. If you want to preserve these characters, you can use the read method instead.
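    int size = 1000; // buffer size in chars; a single read() may return fewer
    char[] buf = new char[size];
    try (BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
        int n = br.read(buf, 0, size); // chars actually read, or -1 at end of stream
        // buf[0..n) now holds the raw text, including any '\r' and '\n'
    }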
That is just a simple example, but if the file has line termination then it will show up in the buffer.
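A single read() call is not guaranteed to fill the buffer or reach the end of the input, so in practice the call goes in a loop. A minimal sketch, using in-memory readers and writers so it runs without a file (the class name RawCopyDemo and the helper copy are made up for the example):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class RawCopyDemo {
    // Copies all characters from in to out in chunks. Nothing is interpreted
    // as a line, so every '\r' and '\n' passes through untouched.
    static void copy(Reader in, Writer out) throws IOException {
        char[] buf = new char[1000];
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "dos\r\nunix\nno trailing newline";
        StringWriter out = new StringWriter();
        copy(new StringReader(text), out);
        System.out.println(text.equals(out.toString()));
    }
}
```

This preserves line endings exactly, but it hands you arbitrary chunks rather than lines, so any per-line processing still has to find the terminators itself.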
You should be using StreamTokenizer to get more detailed control over input parsing.
http://docs.oracle.com/javase/7/docs/api/java/io/StreamTokenizer.html
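For a feel of what StreamTokenizer gives you, here is a small sketch with end-of-line reporting switched on via eolIsSignificant(true). The class name TokenizerDemo, the helper tokens, and the "<EOL>" marker are made up for the example; the tokenizer itself uses its default syntax table.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class TokenizerDemo {
    // Collects tokens as strings, using "<EOL>" to mark each end of line.
    static List<String> tokens(Reader in) throws IOException {
        StreamTokenizer st = new StreamTokenizer(in);
        st.eolIsSignificant(true); // report line ends as TT_EOL tokens
        List<String> out = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_EOL) {
                out.add("<EOL>");
            } else if (st.ttype == StreamTokenizer.TT_WORD) {
                out.add(st.sval);
            } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                out.add(String.valueOf(st.nval));
            } else {
                out.add(String.valueOf((char) st.ttype)); // ordinary character
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokens(new StringReader("foo bar\r\nbaz\n")));
    }
}
```

One caveat for the original question: TT_EOL only says that a line ended, not whether the terminator was "\n", "\r" or "\r\n", so on its own this does not let you reproduce the input's exact line endings.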