So, I'm trying to parse a text file that has multiple lines of text. My job is to go through all the words and write each one out to a file.
I read all the lines, loop through them, and split every line on whitespace, like this:

    line.split("\\s+");
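For reference, here is a minimal runnable sketch of what that call does on ordinary ASCII whitespace (the class and method names are illustrative, not from my project). Note that a line starting with whitespace yields an empty first token, which the isNotRubbish filter below rejects because it requires at least one letter.

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits a line on runs of ASCII whitespace, exactly as line.split("\\s+") does.
    static String[] splitWords(String line) {
        return line.split("\\s+");
    }

    public static void main(String[] args) {
        // Leading whitespace produces an empty first element;
        // trailing empty strings are discarded by String.split.
        String[] parts = splitWords("  foo\tbar\n");
        System.out.println(Arrays.toString(parts)); // prints [, foo, bar]
    }
}
```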
Now, the problem is that in some cases Java doesn't recognize the space between two words, so the split leaves them together as one token.
I also tried looping through the characters of a string that contains such a space. The split still doesn't treat it as whitespace, yet Character.isSpaceChar(char) returns true for that character.
And now I'm completely confused.
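One likely explanation, assuming the "invisible" separator is U+00A0 (NO-BREAK SPACE): by default, Java's regex \s matches only [ \t\n\x0B\f\r], while Character.isSpaceChar returns true for any Unicode space separator, NBSP included. Character.isWhitespace, by contrast, explicitly excludes NBSP. The sketch below (class name hypothetical) shows the mismatch, and shows that the inline (?U) flag, the embedded form of Pattern.UNICODE_CHARACTER_CLASS, makes \s Unicode-aware.

```java
public class NbspDemo {
    // Counts the tokens produced by splitting line on the given regex.
    static int countTokens(String line, String regex) {
        return line.split(regex).length;
    }

    public static void main(String[] args) {
        char nbsp = '\u00A0'; // assumption: this is the character hiding in the file
        System.out.println(Character.isSpaceChar(nbsp));   // true: NBSP is in category Zs
        System.out.println(Character.isWhitespace(nbsp));  // false: NBSP is explicitly excluded
        String line = "foo\u00A0bar";
        System.out.println(countTokens(line, "\\s+"));     // 1: default \s is ASCII-only
        System.out.println(countTokens(line, "(?U)\\s+")); // 2: Unicode-aware \s
    }
}
```

If this is the cause, it also explains why w.trim() doesn't help: trim() only strips characters up to U+0020. Interestingly, isNotRubbish already compiles its pattern with UNICODE_CHARACTER_CLASS, but the split regex does not.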
Here is the code:
    public void createMap(String inputPath, String outputPath)
            throws IOException {
        File f = new File(inputPath);
        FileWriter fw = new FileWriter(outputPath);
        List<String> lines = Files.readAllLines(f.toPath(),
                StandardCharsets.UTF_8);
        for (String l : lines) {
            for (String w : l.split("\\s+")) {
                if (isNotRubbish(w.trim())) {
                    fw.write(w.trim() + "\n");
                }
            }
        }
        fw.close();
    }

    private boolean isNotRubbish(String w) {
        Pattern p = Pattern.compile("@?\\p{L}+",
                Pattern.UNICODE_CHARACTER_CLASS);
        Matcher m = p.matcher(w);
        return m.matches();
    }
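A sketch of the same method under the NBSP assumption, with hypothetical names (WordExtractor, extractWords). The separator regex is compiled with UNICODE_CHARACTER_CLASS, matching the word pattern. Both patterns are compiled once instead of on every call, and the output is written as UTF-8 inside try-with-resources. The original FileWriter uses the platform default charset, which may not match the UTF-8 used for reading.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class WordExtractor {
    // Unicode-aware \s: also matches U+00A0 and other White_Space characters.
    private static final Pattern SEPARATOR =
            Pattern.compile("\\s+", Pattern.UNICODE_CHARACTER_CLASS);
    // Same "word" rule as isNotRubbish, compiled once instead of per call.
    private static final Pattern WORD =
            Pattern.compile("@?\\p{L}+", Pattern.UNICODE_CHARACTER_CLASS);

    // Returns the tokens of one line that look like words.
    static List<String> extractWords(String line) {
        List<String> words = new ArrayList<>();
        for (String w : SEPARATOR.split(line)) {
            // Empty leading tokens and tokens with digits/punctuation fail here.
            if (WORD.matcher(w).matches()) {
                words.add(w);
            }
        }
        return words;
    }

    public void createMap(String inputPath, String outputPath) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(inputPath), StandardCharsets.UTF_8);
        // try-with-resources closes the writer even if an exception is thrown.
        try (BufferedWriter out = Files.newBufferedWriter(Paths.get(outputPath),
                StandardCharsets.UTF_8)) {
            for (String line : lines) {
                for (String w : extractWords(line)) {
                    out.write(w);
                    out.write("\n");
                }
            }
        }
    }
}
```

For example, extractWords(" foo\u00A0bar  @baz x1") keeps foo, bar and @baz, and drops the empty leading token and x1.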