I can't find the documentation that specifies how a Scanner treats newline patterns by default. I want to read a file line by line and have the scanner be able to handle \r, \n or \r\n line endings regardless of the system the program is actually running on.
If I declare a scanner like so:
Scanner scanner = new Scanner(reader);
what is the default behaviour? Will it handle all three kinds as described above or do I have to tell it explicitly to do it?
Looking at the source code for Sun JDK 1.6, the pattern used is "\r\n|[\n\r\u2028\u2029\u0085]"
which matches "\r\n", or any single one of \n, \r, or the Unicode characters for "line separator" (U+2028), "paragraph separator" (U+2029) and "next line" (U+0085), respectively.
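A minimal sketch of what that means for a default-constructed Scanner: no delimiter or line pattern is configured, and nextLine() still splits a string that mixes \r, \n and \r\n endings. It assumes a StringReader in place of the file reader from the question; the class and method names are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerLineEndings {
    // Collects every line via nextLine(), relying only on Scanner's
    // built-in line-separator handling (nothing is configured explicitly).
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(new StringReader(text))) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // One line ends with \r, one with \n, one with \r\n; the last has no terminator.
        String mixed = "mac\runix\nwindows\r\nlast";
        System.out.println(readLines(mixed)); // [mac, unix, windows, last]
    }
}
```

The separator characters are stripped from the returned strings, so the caller never sees which ending a given line used.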
It is not documented (in Java 1.6), but the JDK code uses this regex to match a line break:
"\r\n|[\n\r\u2028\u2029\u0085]"
Here's a link to the source code: http://cr.openjdk.java.net/~briangoetz/7012540/webrev/src/share/classes/java/util/Scanner.java.html
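To see why "\r\n" counts as one line break rather than two, the regex can be exercised on its own with java.util.regex. This is a sketch that simply reuses the pattern string quoted from the JDK source; it is not Scanner's actual code path, and the class name is made up for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LineSeparatorPattern {
    // Same regex string as quoted from the JDK 1.6 Scanner source.
    static final Pattern LINE_SEPARATOR =
            Pattern.compile("\r\n|[\n\r\u2028\u2029\u0085]");

    public static void main(String[] args) {
        String text = "a\r\nb\rc\nd\u2028e\u2029f\u0085g";
        // "\r\n" is the first alternative, so CRLF is consumed as a single
        // separator instead of CR then LF (which would leave an empty field).
        System.out.println(Arrays.toString(LINE_SEPARATOR.split(text)));
        // prints [a, b, c, d, e, f, g]
    }
}
```

Dropping the "\r\n|" alternative and keeping only the character class would split "a\r\nb" into three pieces, one of them empty.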
IMO, this ought to be specified, since Scanner's behavior with respect to line separators is different from (for example) BufferedReader's, whose readLine() only recognizes \n, \r and \r\n. (I've lodged a bug report ...)
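A small sketch of that difference, assuming input that contains U+2028 (LINE SEPARATOR): BufferedReader.readLine() treats it as ordinary text, while Scanner.nextLine() treats it as a line break. Both agree on plain \r\n. The class and method names are hypothetical, chosen for the illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ReaderVsScanner {
    // Splits with BufferedReader.readLine(), which only ends lines on \n, \r or \r\n.
    static List<String> withBufferedReader(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // StringReader does not really throw, but readLine() declares IOException.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Splits with Scanner.nextLine(), which also ends lines on U+2028, U+2029 and U+0085.
    static List<String> withScanner(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(new StringReader(text))) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "first\u2028second";
        System.out.println("BufferedReader: " + withBufferedReader(text).size()); // BufferedReader: 1
        System.out.println("Scanner: " + withScanner(text).size());               // Scanner: 2
    }
}
```

For files that only ever contain \r, \n or \r\n, the two approaches produce the same lines; the Unicode separators are where they diverge.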