I am trying to read a sentence from user input in Java, and I need to make it lowercase and remove all punctuation. Here is my code:
String[] words = instring.split("\\s+");
for (int i = 0; i < words.length; i++) {
    words[i] = words[i].toLowerCase();
}
String[] wordsout = new String[50];
Arrays.fill(wordsout, "");
int e = 0;
for (int i = 0; i < words.length; i++) {
    if (words[i] != "") {
        wordsout[e] = words[e];
        wordsout[e] = wordsout[e].replaceAll(" ", "");
        e++;
    }
}
return wordsout;
I can't seem to find any way to remove all non-letter characters. I have tried using regexes and iterators with no luck. Thanks for any help.
You can use the following regular expression construct:
You may try this:
[^\w] matches a non-word character, so the above regular expression will match and remove all non-word characters. Note that \w covers letters, digits, and the underscore, so those are all kept.

This first removes all non-letter characters, folds to lowercase, then splits the input, doing all the work in a single line:
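A minimal sketch of such a one-liner, assuming the input is in a variable named instring as in the question. The class name SentenceWords, the helper name toWords, and the character class [^a-zA-Z ] (ASCII letters and the space character only) are illustrative assumptions, not necessarily the exact expression this answer used:

```java
import java.util.Arrays;

class SentenceWords {
    // Hypothetical helper: drop everything except ASCII letters and spaces,
    // fold to lowercase, then split on runs of whitespace.
    static String[] toWords(String instring) {
        return instring.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");
    }

    public static void main(String[] args) {
        // prints [hello, world, its, oclock]
        System.out.println(Arrays.toString(toWords("Hello, World! It's 5 o'clock.")));
    }
}
```

If the input can start with a space, the first array element will be an empty string; calling trim() before split avoids that.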
Spaces are initially left in the input so the split will still work.
By removing the rubbish characters before splitting, you avoid having to loop through the elements.
If you don't want to use RegEx (which seems highly unnecessary given your problem), perhaps you should try something like this:
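A minimal sketch of such a loop, under the assumption that it is applied to one word at a time after splitting. The class name LetterDigitFilter and the helper name clean are hypothetical:

```java
class LetterDigitFilter {
    // Hypothetical helper: keeps only letters and digits, lowercased.
    // Every other character, including spaces, is dropped.
    static String clean(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                sb.append(Character.toLowerCase(c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Split first, then clean each word in place.
        String[] words = "Hello, World!".split("\\s+");
        for (int i = 0; i < words.length; i++) {
            words[i] = clean(words[i]);
        }
        System.out.println(String.join(" ", words)); // prints hello world
    }
}
```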
It loops through the underlying char[] of the String and, if the char is a letter or digit (filtering out all symbols, which I am assuming is what you are trying to accomplish), appends its lowercase version.

I don't like to use regex, so here is another simple solution.
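One regex-free possibility, sketched here as an assumption about what such a solution could look like, is to build the words in a single pass over the characters. The class name WordSplitter and the method name toWords are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordSplitter {
    // Hypothetical helper: returns the lowercased words of a sentence,
    // keeping letters and digits and discarding all other characters.
    static String[] toWords(String sentence) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < sentence.length(); i++) {
            char c = sentence.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (Character.isWhitespace(c) && current.length() > 0) {
                // Whitespace ends the current word, if there is one.
                words.add(current.toString());
                current.setLength(0);
            }
            // Punctuation, symbols, and repeated whitespace are skipped.
        }
        if (current.length() > 0) {
            words.add(current.toString()); // flush the last word
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // prints [hello, world, its, 5, oclock]
        System.out.println(Arrays.toString(toWords("  Hello, World! It's 5 o'clock. ")));
    }
}
```

Because the result is collected in a list, there is no need for a fixed-size array of 50 entries or for filtering out empty strings afterwards.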
Note: This will include both letters and digits.