Java doesn't see space in string [closed]

Posted 2019-01-20 18:06

I'm trying to parse a text file that contains multiple lines of text. My job is to go through all the words and write them out to a file.

I read all the lines, loop through them, and split every line on whitespace, like this:

line.split("\\s+");

The problem is that in some cases Java does not see the space between two words, so they are not split apart.

I also tried looping over the characters of a string containing one of the spaces that Java doesn't split on, and Character.isSpaceChar(char) returned true for it.

Now I'm completely confused.

Here is the code:

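import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Reads the input file and writes every accepted word to the output file, one per line.
public void createMap(String inputPath, String outputPath)
        throws IOException {
    File f = new File(inputPath);
    FileWriter fw = new FileWriter(outputPath);
    List<String> lines = Files.readAllLines(f.toPath(),
            StandardCharsets.UTF_8);
    for (String l : lines) {
        for (String w : l.split("\\s+")) {
            if (isNotRubbish(w.trim())) {
                fw.write(w.trim() + "\n");
            }
        }
    }
    fw.close();
}

// Accepts tokens made only of letters, optionally preceded by '@'.
private boolean isNotRubbish(String w) {
    Pattern p = Pattern.compile("@?\\p{L}+",
            Pattern.UNICODE_CHARACTER_CLASS);
    Matcher m = p.matcher(w);
    return m.matches();
}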

1 answer
女痞
#2 · 2019-01-20 18:53

I suspect your text contains a character such as a non-breaking space (U+00A0). It looks like a space, but by default Java's \s matches only [ \t\n\x0B\f\r], so split("\\s+") does not treat it as whitespace.
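To check that suspicion, it can help to dump the code points of a problematic line. The following is only a sketch: the class name SpaceInspector, the method describeSpaces, and the sample string are made up for illustration, and the sample assumes the offending character is U+00A0.

```java
import java.util.ArrayList;
import java.util.List;

class SpaceInspector {
    // Describes every code point that counts as a space under either
    // Character.isSpaceChar or Character.isWhitespace.
    static List<String> describeSpaces(String line) {
        List<String> out = new ArrayList<>();
        line.codePoints().forEach(cp -> {
            if (Character.isSpaceChar(cp) || Character.isWhitespace(cp)) {
                String asString = new String(Character.toChars(cp));
                out.add(String.format(
                        "U+%04X isSpaceChar=%b isWhitespace=%b regexS=%b",
                        cp,
                        Character.isSpaceChar(cp),
                        Character.isWhitespace(cp),
                        asString.matches("\\s")));
            }
        });
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample: "foo", a no-break space (U+00A0), "bar",
        // an ordinary space (U+0020), "baz".
        describeSpaces("foo\u00A0bar baz").forEach(System.out::println);
        // Prints:
        // U+00A0 isSpaceChar=true isWhitespace=false regexS=false
        // U+0020 isSpaceChar=true isWhitespace=true regexS=true
    }
}
```

A line that reports isSpaceChar=true but regexS=false would match what the question describes: Character.isSpaceChar accepts the character, while the default \s does not.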

In that case, try using \p{Zs} instead of \s.

As described at http://www.regular-expressions.info/unicode.html, \p{Zs} matches any character in the Unicode "space separator" category, which includes the ordinary space and the non-breaking space.

BTW, \p{Zs} does not cover separators such as tabs (\t) or line breaks (\r, \n). If you also want to split on those, combine \p{Zs} with \s in one character class: [\p{Zs}\s]
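A minimal sketch of splitting with the combined class follows. The class name UnicodeSplit, the method words, and the sample line are invented for the example; note that the backslashes are doubled inside a Java string literal.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class UnicodeSplit {
    // \p{Zs} covers Unicode space separators (including U+00A0);
    // \s adds tabs and line breaks. Compiled once and reused.
    private static final Pattern SEPARATORS = Pattern.compile("[\\p{Zs}\\s]+");

    static List<String> words(String line) {
        return Arrays.asList(SEPARATORS.split(line));
    }

    public static void main(String[] args) {
        // Hypothetical sample: no-break space, tab, and ordinary space as separators.
        String line = "foo\u00A0bar\tbaz qux";

        // Plain \s+ leaves "foo" and "bar" glued together by the no-break space.
        System.out.println(line.split("\\s+").length); // prints 3

        // The combined class splits on all three separators.
        System.out.println(words(line)); // prints [foo, bar, baz, qux]
    }
}
```

One caveat: if a line starts with a separator, split returns an empty first element, and trim() will not remove a leading U+00A0, so a word filter like isNotRubbish in the question is still useful.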
