Remove all special characters from a string not in

Posted 2019-05-07 17:28

Question:

I want to remove all the special characters from a string except numbers and normal a-z characters.

I am doing it like this:

text = text.replaceAll("[^a-zA-Z0-9 ]+", "");

The problem with this approach is that it also removes all non-Latin characters such as è, é, ê, ë and many others.

By non-special characters (the ones I want to keep) I mean all numbers and all alphabetic characters in every language, or at least in as many as possible.

How do I remove only the special characters?

Answer 1:

You can try \p{L} for all letters and \p{N} for all numbers:

text = text.replaceAll("[^\\p{L}\\p{N} ]+", "");
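For illustration, here is a minimal, self-contained sketch of that replacement. The class and method names are made up for the example, and the accented characters are written as Unicode escapes so the result does not depend on the source file encoding. One caveat: an accent stored as a separate combining mark (category \p{M}) is not covered by \p{L}, so decomposed text may lose its accents unless \p{M} is added to the character class.

```java
public class StripSpecialChars {

    // Hypothetical helper: keeps Unicode letters (\p{L}), Unicode numbers (\p{N})
    // and plain spaces, and removes every other character.
    static String stripSpecial(String text) {
        return text.replaceAll("[^\\p{L}\\p{N} ]+", "");
    }

    public static void main(String[] args) {
        // Input is "Crème brûlée #1: 50% off!" written with Unicode escapes.
        String input = "Cr\u00e8me br\u00fbl\u00e9e #1: 50% off!";
        // Prints the text with '#', ':', '%' and '!' removed; accented letters stay.
        System.out.println(stripSpecial(input));
    }
}
```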


Answer 2:

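I know you asked for a regex, but if Guava is an option:

CharMatcher.JAVA_LETTER_OR_DIGIT.retainFrom("èêAAAGRt123")

Newer Guava releases expose this matcher through the factory method CharMatcher.javaLetterOrDigit() rather than the constant.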
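If adding Guava is not an option, roughly the same filtering can be sketched with the standard library alone. This is an assumption-laden stand-in, not Guava's implementation: the class and method names are invented for the example, and it works on code points via Character.isLetterOrDigit(int). Note that, unlike the regex answer, this approach also drops spaces, since a space is neither a letter nor a digit.

```java
public class RetainLettersAndDigits {

    // Hypothetical helper: keeps only the code points that
    // Character.isLetterOrDigit accepts and discards everything else.
    static String retainLettersAndDigits(String text) {
        StringBuilder kept = new StringBuilder();
        text.codePoints()
            .filter(Character::isLetterOrDigit)
            .forEach(kept::appendCodePoint);
        return kept.toString();
    }

    public static void main(String[] args) {
        // Input is "èê AAA-GRt_123!" written with Unicode escapes.
        String input = "\u00e8\u00ea AAA-GRt_123!";
        // Prints the letters and digits only; space, '-', '_' and '!' are dropped.
        System.out.println(retainLettersAndDigits(input));
    }
}
```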