Weird behavior with Java Scanner reading files

Published 2020-08-24 11:38

Question:

I ran into an interesting problem while using the Scanner class to read the contents of files. I'm trying to read several output files, produced by a parsing application, from a directory in order to compute some accuracy metrics.

My code walks through each file in the directory and opens it with a Scanner to process the contents. For some reason, a few of the files (all UTF-8 encoded) were not being read by the Scanner. Even though the files were not empty, scanner.hasNextLine() returned false on its first call (I confirmed this in the debugger). I was initializing the Scanner directly with the File objects each time (the File objects were created successfully), i.e.:

    File file = new File(pathName);
    ...
    Scanner scanner = new Scanner(file);

I tried a couple of things, and was eventually able to fix this problem by initializing the scanner in the following way:

    Scanner scanner = new Scanner(new FileInputStream(file));

Though I'm happy to have solved the problem, I'm still curious as to what might have been happening to cause the problem before. Any ideas? Thanks so much!

Answer 1:

According to the Scanner.java source in Java 6u23, a new line is detected by:

    private static final String LINE_SEPARATOR_PATTERN =
                                           "\r\n|[\n\r\u2028\u2029\u0085]";
    private static final String LINE_PATTERN = ".*("+LINE_SEPARATOR_PATTERN+")|.+$";

So you could check whether you can match the following regex to the content in the files that were not read.

    .*(\r\n|[\n\r\u2028\u2029\u0085])|.+$

I would also check whether any exception was raised.
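
Note that a Scanner does not propagate an IOException thrown by its underlying source; it treats it as end of input, so hasNextLine() just returns false. The last such exception is available through ioException(). A minimal sketch of checking it, where the class name `ScannerErrorCheck` and the method `readLines` are hypothetical names chosen for illustration:

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerErrorCheck {

    // Hypothetical helper: reads all lines, then rethrows any
    // I/O error that the Scanner swallowed while reading.
    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
            // Non-null only if the underlying source threw an IOException.
            IOException swallowed = scanner.ioException();
            if (swallowed != null) {
                throw swallowed;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Temporary ASCII-only file, used purely as demo input.
        File file = File.createTempFile("scanner-demo", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(), "first\nsecond\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readLines(file));
    }
}
```

If a file comes back empty without an exception here, the problem is probably not an I/O or decoding error.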

UPDATE: This made me curious, so I looked for answers. It seems your question has already been asked and answered here: "Java Scanner(File) misbehaving, but Scanner(FIleInputStream) always works with the same file"

To summarize: the issue involves non-ASCII characters, which are handled differently depending on whether you initialize your Scanner with a File or with a FileInputStream.
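
If the cause is a mismatch between the file's encoding and the platform default charset, one way to rule that out is to name the encoding when constructing the Scanner. The following is a minimal sketch, assuming the files really are UTF-8 as the question states; `Utf8ScannerDemo` and `readUtf8Lines` are hypothetical names used only for illustration:

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class Utf8ScannerDemo {

    // Hypothetical helper: reads every line of a file, decoding it as UTF-8
    // instead of relying on the platform default charset.
    static List<String> readUtf8Lines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Temporary file containing non-ASCII characters, written as UTF-8.
        File file = File.createTempFile("utf8-demo", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(),
                "na\u00efve\nr\u00e9sum\u00e9\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readUtf8Lines(file).size() + " lines read");
    }
}
```

The same charset name can be passed alongside a FileInputStream, so either constructor can be made independent of the machine's default encoding.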



Answer 2:

I would check whether you always close the scanner after reading each file. Also, do you call only hasNextLine() and nextLine(), or do you call other nextXXX() methods on that scanner?
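
The reason mixing matters: token methods such as nextInt() stop before the line separator, so a following nextLine() returns only the (possibly empty) rest of the current line. A small sketch of that effect, using an in-memory string instead of a file; `MixedTokenDemo` and `lineAfterInt` are hypothetical names for illustration:

```java
import java.util.Scanner;

public class MixedTokenDemo {

    // Hypothetical helper: returns what nextLine() yields
    // immediately after a call to nextInt() on the same scanner.
    static String lineAfterInt(String input) {
        try (Scanner scanner = new Scanner(input)) {
            scanner.nextInt();          // consumes the digits, not the line break
            return scanner.nextLine();  // returns the remainder of that line
        }
    }

    public static void main(String[] args) {
        // The remainder of the first line is empty, so this prints "[]".
        System.out.println("[" + lineAfterInt("42\nhello\n") + "]");
    }
}
```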



Tags: java