Java doesn't see space in string [closed]

Posted 2019-01-20 18:37

Question:

So, I'm trying to parse a text file that contains multiple lines of text. My job is to go through all the words and write them to a file.

So, I read all the lines, loop through them, and split every line on whitespace, like this:

line.split("\\s+");

Now, the problem is that in some cases Java does not see the space between two words...

I also tried looping through a string that contains a space Java doesn't see, and Character.isSpaceChar(char) returned true for that character...

And now I'm completely confused...

Here is the code:

public void createMap(String inputPath, String outputPath)
        throws IOException {
    File f = new File(inputPath);
    FileWriter fw = new FileWriter(outputPath);
    List<String> lines = Files.readAllLines(f.toPath(),
            StandardCharsets.UTF_8);
    for (String l : lines) {
        for (String w : l.split("\\s+")) {
            if (isNotRubbish(w.trim())) {
                fw.write(w.trim() + "\n");
            }
        }
    }
    fw.close();
}

private boolean isNotRubbish(String w) {
    Pattern p = Pattern.compile("@?\\p{L}+",
            Pattern.UNICODE_CHARACTER_CLASS);
    Matcher m = p.matcher(w);
    return m.matches();
}

Answer 1:

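I suspect your text contains a character such as a non-breaking space (U+00A0). By default, Java's \s matches only the ASCII whitespace characters [ \t\n\x0B\f\r], so it cannot match that character, even though Character.isSpaceChar returns true for it.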
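One way to check this suspicion is to print the code point of the character that refuses to split. The sketch below is only an illustration: the class name SpaceProbe, the helper describe, and the sample string are hypothetical and not part of the original code.

```java
class SpaceProbe {
    // Formats one code point together with the results of the two
    // java.lang.Character checks that disagree for non-breaking spaces.
    static String describe(int codePoint) {
        return String.format("U+%04X isWhitespace=%b isSpaceChar=%b",
                codePoint,
                Character.isWhitespace(codePoint),
                Character.isSpaceChar(codePoint));
    }

    public static void main(String[] args) {
        // Hypothetical sample: two words separated by U+00A0 (no-break space).
        String sample = "foo\u00A0bar";
        sample.codePoints()
              .filter(cp -> !Character.isLetterOrDigit(cp))
              .forEach(cp -> System.out.println(describe(cp)));
    }
}
```

If the suspect character turns out to be something like U+00A0, with isWhitespace=false and isSpaceChar=true, that would explain why the default \s does not split on it.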

In that case, try \p{Zs} instead of \s (written "\\p{Zs}" inside a Java string literal).

As mentioned at http://www.regular-expressions.info/unicode.html:

\p{Zs} will match any kind of space character

By the way, if you also want to split on separators other than spaces, such as tabs (\t) or line breaks (\r, \n), you can combine \p{Zs} with \s in a single character class: [\p{Zs}\s]
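As a rough sketch of how the combined class could be used for splitting (the class name SplitDemo, the helper words, and the sample line are made up for illustration and are not from the question):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {
    // One or more Unicode space separators (\p{Zs}) or ASCII whitespace (\s).
    // Compiled once so it can be reused for every line.
    private static final Pattern SEPARATORS = Pattern.compile("[\\p{Zs}\\s]+");

    // Hypothetical helper: splits a line into words on any of those separators.
    static String[] words(String line) {
        return SEPARATORS.split(line);
    }

    public static void main(String[] args) {
        // Sample line with a no-break space, a tab, and an ordinary space.
        String line = "one\u00A0two\tthree four";

        // Default \s leaves "one" and "two" glued together by U+00A0.
        System.out.println(Arrays.toString(line.split("\\s+")));

        // The combined class separates all four words.
        System.out.println(Arrays.toString(words(line)));
    }
}
```

In the question's code, this would amount to replacing l.split("\\s+") with a split on the combined pattern. A line that starts with a separator still yields an empty first element, but a filter such as isNotRubbish already rejects empty strings.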