Remove all control characters from a Java string

Posted 2020-06-09 03:46

I have a string coming from a UI that contains control characters such as line feeds and carriage returns.

I would like to do something like this:

String input = uiString.replaceAll(<regex for all control characters> , "")

Surely this has been done before!?

Tags: java regex
4 Answers
够拽才男人
#2 · 2020-06-09 04:28

Something like this should do the trick (it covers U+0000 through U+001F, so DEL, U+007F, is left in place):

String newString = oldString.replaceAll("[\u0000-\u001f]", "");
淡お忘
#3 · 2020-06-09 04:36

Using Guava is probably more efficient than using the full regex engine, and certainly more readable:

return CharMatcher.JAVA_ISO_CONTROL.removeFrom(string);

Alternatively, using only a regex, which is not quite as readable or efficient:

return string.replaceAll("\\p{Cntrl}", "");
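If neither Guava nor a regex is wanted, the same idea can be sketched with the JDK alone. This is a minimal sketch, not the Guava implementation: it relies on Character.isISOControl, which covers U+0000 to U+001F and U+007F to U+009F, and the class and method names (IsoControlStripper, stripIsoControl) are made up for illustration.

```java
// Hypothetical helper; the class and method names are illustrative only.
class IsoControlStripper {

    // Copies every char except those that Character.isISOControl reports
    // as ISO control characters (U+0000-U+001F and U+007F-U+009F).
    static String stripIsoControl(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (!Character.isISOControl(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "line one\r\nline two\ttabbed";
        // CR, LF and TAB are dropped; everything else is kept.
        System.out.println(stripIsoControl(raw));
    }
}
```

Iterating over char values is enough here because every ISO control character sits in the Basic Multilingual Plane, so surrogate pairs pass through untouched.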
聊天终结者
#4 · 2020-06-09 04:41

To remove just ASCII control characters, use the Cntrl character class:

String newString = string.replaceAll("\\p{Cntrl}", "");

To remove all 65 of the characters that Unicode refers to as "control characters", use the Cntrl character class in UNICODE_CHARACTER_CLASS mode, with the (?U) flag:

String newString = string.replaceAll("(?U)\\p{Cntrl}", "");

To additionally remove Unicode "format" characters - things like the control characters for making text go right-to-left, or the soft hyphen - also nuke the Cf character class:

String newString = string.replaceAll("(?U)\\p{Cntrl}|\\p{Gc=Cf}", "");
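To see how the three patterns differ, here is a small sketch that runs each one over the same sample string. The class name, constant names and the sample are made up for illustration; the sample contains a tab (ASCII control), U+0085 (a non-ASCII control character) and U+00AD (the soft hyphen, a format character).

```java
// Hypothetical demo class; names and sample text are illustrative only.
class ControlCharDemo {

    // The three patterns from this answer.
    static final String ASCII_ONLY = "\\p{Cntrl}";
    static final String UNICODE_CC = "(?U)\\p{Cntrl}";
    static final String UNICODE_CC_AND_CF = "(?U)\\p{Cntrl}|\\p{Gc=Cf}";

    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // 7 chars: 'a', TAB, 'b', U+0085, 'c', U+00AD, 'd'
        String sample = "a\tb" + (char) 0x85 + "c" + (char) 0xAD + "d";

        // Only TAB is removed: 6 chars remain.
        System.out.println(strip(sample, ASCII_ONLY).length());
        // TAB and U+0085 are removed: 5 chars remain.
        System.out.println(strip(sample, UNICODE_CC).length());
        // TAB, U+0085 and U+00AD are removed: 4 chars remain ("abcd").
        System.out.println(strip(sample, UNICODE_CC_AND_CF).length());
    }
}
```

The lengths are printed rather than the strings themselves because the leftover characters are invisible in most terminals.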
Animai°情兽
#5 · 2020-06-09 04:41

The Guava constant CharMatcher.JAVA_ISO_CONTROL is deprecated; use CharMatcher.javaIsoControl() instead:

CharMatcher.javaIsoControl().removeFrom(string);