I am seeing weird behavior with Scanner. With a particular set of files, it works when I use the Scanner(FileInputStream) constructor, but not when I use the Scanner(File) constructor.
Case 1: Scanner(File)
Scanner s = new Scanner(new File("file"));
while (s.hasNextLine()) {
    System.out.println(s.nextLine());
}
Result: no output
Case 2: Scanner(FileInputStream)
Scanner s = new Scanner(new FileInputStream(new File("file")));
while (s.hasNextLine()) {
    System.out.println(s.nextLine());
}
Result: the file content outputs to the console.
The input file is a Java source file containing a single class.
I double-checked programmatically (in Java) that the file:
- exists,
- is readable,
- and has a non-zero size.
Typically Scanner(File) works for me in this case; I am not sure why it doesn't now.
hasNextLine() calls findWithinHorizon(), which in turn calls findPatternInBuffer(), searching for a match of the line-terminator pattern defined as .*(\r\n|[\n\r\u2028\u2029\u0085])|.+$
The strange thing is that with both ways of constructing a Scanner (from a FileInputStream or from a File), findPatternInBuffer() returns a positive match if the file contains, for instance, the 0x0A line terminator, regardless of file size. But if the file contains a byte outside the ASCII range (i.e. > 0x7F), using FileInputStream returns true while using File returns false.
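To check that the pattern itself is not what distinguishes the two cases, it can be tried in isolation on already-decoded text. This is only a sketch: the class name LinePatternDemo and the helper firstLine are made up for illustration, and the pattern is copied from the description above rather than taken from the JDK source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the JDK or of the original test.
public class LinePatternDemo {
    // Pattern as quoted above. Backslashes are doubled so that the regex
    // engine, not the Java compiler, interprets the escapes.
    static final Pattern LINE =
        Pattern.compile(".*(\\r\\n|[\\n\\r\\u2028\\u2029\\u0085])|.+$");

    // Returns the first match of the pattern in the input, or null if none.
    static String firstLine(String input) {
        Matcher m = LINE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstLine("a\n") != null);       // true
        System.out.println(firstLine("a\n\u0080") != null); // true
        System.out.println(firstLine("") != null);          // false
    }
}
```

Once the characters are in the buffer, a trailing non-ASCII character does not stop the match, which suggests looking at how the bytes get decoded rather than at the pattern.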
Very simple test case:
Create a file that contains just the character "a":
# hexedit file
00000000 61 0A a.
# java Test.java
using File: true
using FileInputStream: true
Now edit the file with hexedit to:
# hexedit file
00000000 61 0A 80 a..
# java Test.java
using File: false
using FileInputStream: true
The test Java code contains nothing beyond what is already in the question:
import java.io.*;
import java.util.*;

public class Test {
    public static void main(String[] args) {
        try {
            // Same file, opened through each of the two constructors.
            File file1 = new File("file");
            Scanner s1 = new Scanner(file1);
            System.out.println("using File: " + s1.hasNextLine());
            File file2 = new File("file");
            Scanner s2 = new Scanner(new FileInputStream(file2));
            System.out.println("using FileInputStream: " + s2.hasNextLine());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
So, it turns out this is a charset issue. In fact, changing the test to:
Scanner s1 = new Scanner(file1, "latin1");
we get:
# java Test
using File: true
using FileInputStream: true
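A slightly more defensive version of that fix is sketched below: pass the charset explicitly and, when hasNextLine() returns false, ask the Scanner whether it swallowed an I/O error via ioException(). The class name ScannerCharsetCheck and the helpers firstLine and writeSample are hypothetical, and the temp file simply recreates the three bytes 61 0A 80 from the test case. It assumes Java 8 or later (try-with-resources, UncheckedIOException).

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Scanner;

// Hypothetical helper class for illustration only.
public class ScannerCharsetCheck {
    // Reads the first line using an explicit charset.
    // Returns null if the scanner reports no line at all.
    static String firstLine(File file, String charsetName) {
        try (Scanner s = new Scanner(file, charsetName)) {
            if (s.hasNextLine()) {
                return s.nextLine();
            }
            // Scanner does not throw on read errors; it stores the last one.
            IOException hidden = s.ioException();
            if (hidden != null) {
                System.err.println("Scanner stopped early: " + hidden);
            }
            return null;
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Recreates the test file (bytes 61 0A 80) in a temporary location.
    static File writeSample() {
        try {
            File f = File.createTempFile("scanner-sample", ".bin");
            f.deleteOnExit();
            try (FileOutputStream out = new FileOutputStream(f)) {
                out.write(new byte[] {0x61, 0x0A, (byte) 0x80});
            }
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File f = writeSample();
        // ISO-8859-1 maps every byte to a character, so decoding cannot fail.
        System.out.println("ISO-8859-1: " + firstLine(f, "ISO-8859-1"));
        // With UTF-8 the 0x80 byte is malformed; on JDKs where the behavior
        // above reproduces, this is where a hidden exception would show up.
        System.out.println("UTF-8: " + firstLine(f, "UTF-8"));
    }
}
```

Checking ioException() is useful in general when a Scanner yields nothing from a file that is known to be non-empty, since the Scanner otherwise gives no hint that reading failed.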
From looking at the Oracle/Sun JDK 1.6.0_23 implementation of Scanner, the Scanner(File) constructor creates a FileInputStream, which is meant for raw binary data.
This points to a difference in the buffering and parsing technique used depending on which constructor is invoked, which directly affects your code at the call to hasNextLine().
Scanner(InputStream) uses an InputStreamReader, while Scanner(File) uses an InputStream passed to a ByteChannel (and probably reads the whole file in one jump, thus advancing the cursor, in your case).
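One way such a difference between two read paths can show up, independent of Scanner, is in how a charset decoder treats malformed bytes. The sketch below is an assumption about the mechanism, not something confirmed by the JDK source quoted here: it only shows that the same UTF-8 decoder either rejects the bytes 61 0A 80 or substitutes a replacement character, depending on its configured error action. DecoderModes, decodeStrict and decodeLenient are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not reproduce Scanner's internals.
public class DecoderModes {
    // The bytes from the test case: 'a', line feed, and a lone 0x80.
    static final byte[] SAMPLE = {0x61, 0x0A, (byte) 0x80};

    // A freshly created decoder reports malformed input by default.
    // Returns null when decoding fails.
    static String decodeStrict(byte[] bytes) {
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder();
        try {
            return strict.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // The same decoder, configured to substitute U+FFFD for bad bytes.
    static String decodeLenient(byte[] bytes) {
        CharsetDecoder lenient = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return lenient.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalStateException("unexpected with REPLACE", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("strict decodes:  " + (decodeStrict(SAMPLE) != null)); // false
        System.out.println("lenient length:  " + decodeLenient(SAMPLE).length()); // 3
    }
}
```

If one constructor ends up with a reporting decoder and the other with a replacing one, the first would see no characters at all from this file while the second would see "a", a line feed and a replacement character, which matches the false/true results in the test case.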