
Question:

I'm using java.util.Scanner to read a file's contents from the classpath with this code:

String path1 = getClass().getResource("/myfile.html").getFile();

System.out.println(new File(path1).length()); // 22244 (correct)

String file1 = new Scanner(new File(path1)).useDelimiter("\\Z").next();
System.out.println(file1.length()); // 2048 (first 2k only)

The code runs from IntelliJ IDEA with this command (maven test):

/Library/Java/JavaVirtualMachines/jdk1.7.0_25.jdk/Contents/Home/bin/java -Dmaven.home=/usr/share/java/maven-3.0.4 -Dclassworlds.conf=/usr/share/java/maven-3.0.4/bin/m2.conf -Didea.launcher.port=7533 "-Didea.launcher.bin.path=/Applications/IntelliJ IDEA 12 CE.app/bin" -Dfile.encoding=UTF-8 -classpath "/usr/share/java/maven-3.0.4/boot/plexus-classworlds-2.4.jar:/Applications/IntelliJ IDEA 12 CE.app/lib/idea_rt.jar" com.intellij.rt.execution.application.AppMain org.codehaus.classworlds.Launcher --fail-fast --strict-checksums test

It ran perfectly on my Windows 7 machine, but after I moved to a Mac the same tests fail. I tried to google it but didn't find much =(

Why does Scanner with the delimiter \Z read my whole file into a string on Windows 7 but not on the Mac? I know there are other ways to read a file, but I like this one-liner and want to understand why it's not working. Thanks.

Answer 1:

Here is what the Java documentation says about it:

http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

\Z The end of the input but for the final terminator, if any

\z The end of the input

Line terminators

A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators:

A newline (line feed) character ('\n'), a carriage-return character followed immediately by a newline character ("\r\n"), a standalone carriage-return character ('\r'), a next-line character ('\u0085'), a line-separator character ('\u2028'), or a paragraph-separator character ('\u2029').

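So use \z instead of \Z: \Z can match just before a final line terminator, while \z matches only at the very end of the input.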
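
To make the difference between the two anchors concrete, here is a minimal sketch that uses java.util.regex directly, without Scanner. The class name EndAnchorDemo and the helper firstMatchIndex are made up for illustration. It only shows how the two patterns match on a string ending in a newline; it does not by itself show that this is what causes the truncation in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answers.
final class EndAnchorDemo {

    /** Returns the index of the first match of regex in input, or -1 if there is none. */
    static int firstMatchIndex(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String input = "abc\n"; // four characters, the last one is a line terminator

        // \Z matches before the final line terminator
        System.out.println(firstMatchIndex("\\Z", input)); // prints 3

        // \z matches only at the very end of the input
        System.out.println(firstMatchIndex("\\z", input)); // prints 4
    }
}
```

With \Z as a Scanner delimiter, a single next() on such input would therefore leave the trailing line terminator out of the returned token.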

Answer 2:

There is a good article about this way of reading an entire file with Scanner:

http://closingbraces.net/2011/12/17/scanner-with-z-regex/

In brief:

Because a single read with “\z” as the delimiter should read everything until “end of input”, it’s tempting to just do a single read and leave it at that, as the examples listed above all do.

In most cases that’s OK, but I’ve found at least one situation where reading to “end of input” doesn’t read the entire input – when the input is a SequenceInputStream, each of the constituent InputStreams appears to give a separate “end of input” of its own. As a result, if you do a single read with a delimiter of “\z” it returns the content of the first of the SequenceInputStream’s constituent streams, but doesn’t read into the rest of the constituent streams.

Be careful with this approach. It is better to read the file line by line, or to keep calling hasNext() until it really returns false.

Update: In other words, try this code:

StringBuilder file1 = new StringBuilder();
Scanner scanner = new Scanner(new File(path1)).useDelimiter("\\Z");

while (scanner.hasNext()) {
   file1.append(scanner.next());
}
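
The same loop can be wrapped in a small helper and tried against a SequenceInputStream, the case the quoted article mentions. This is only a sketch: the class ScannerReadAll and its methods are hypothetical names, it uses "\\z" as in the article quote rather than "\\Z", and whether a single next() actually stops early may depend on the stream and the JDK version. The loop gives the same result either way because it keeps going until hasNext() is false.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical helper class, not taken from the article or the answers.
final class ScannerReadAll {

    /** Reads the whole stream as UTF-8 text by appending tokens until hasNext() is false. */
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\z")) {
            while (scanner.hasNext()) {
                sb.append(scanner.next());
            }
        }
        return sb.toString();
    }

    /** Builds a SequenceInputStream from two in-memory parts (illustration only). */
    static InputStream twoPartStream(String first, String second) {
        return new SequenceInputStream(
                new ByteArrayInputStream(first.getBytes(StandardCharsets.UTF_8)),
                new ByteArrayInputStream(second.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        System.out.println(readAll(twoPartStream("Hello, ", "world"))); // prints Hello, world
    }
}
```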

Answer 3:

I encountered this as well when using nextLine() on a Mac with Java 7 update 45. Worse, after a line that is longer than 2048 bytes, the rest of the file is ignored and the Scanner behaves as if it has already reached the end of the file.

I changed the code to explicitly give the Scanner a larger buffer, and it works:

Scanner sc = new Scanner(new BufferedInputStream(new FileInputStream(nf), 20*1024*1024), "utf-8");
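
A fuller version of that idea might look like the sketch below. The class LargeBufferScanner, its method names, and the 64 KB buffer size in main are assumptions for illustration (the line above uses 20 MB). The check of ioException() is an addition that is not in the answer: Scanner treats an IOException from the underlying source as end of input and only records it, so checking it afterwards can reveal a read error that would otherwise look like a short file.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, written only to illustrate the answer above.
final class LargeBufferScanner {

    /** Reads all lines of a UTF-8 file through a BufferedInputStream of the given size. */
    static List<String> readLines(Path file, int bufferSize) {
        List<String> lines = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file), bufferSize);
             Scanner sc = new Scanner(in, "UTF-8")) {
            while (sc.hasNextLine()) {
                lines.add(sc.nextLine());
            }
            // Scanner swallows read errors and reports them here instead of throwing.
            IOException swallowed = sc.ioException();
            if (swallowed != null) {
                throw swallowed;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Writes content to a temporary UTF-8 file (used only by the demo). */
    static Path writeTempFile(String content) {
        try {
            Path file = Files.createTempFile("scanner-demo", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        char[] chars = new char[5000];
        Arrays.fill(chars, 'a');
        String longLine = new String(chars); // one line well over 2048 bytes

        Path file = writeTempFile(longLine + "\nsecond\n");
        List<String> lines = readLines(file, 64 * 1024);
        System.out.println(lines.size());          // prints 2
        System.out.println(lines.get(0).length()); // prints 5000
    }
}
```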
