UTF 8 String remove all invisible characters excep

Posted 2019-04-07 11:14

Question:

I'm using the following regex to remove all invisible characters from a UTF-8 string:

$string = preg_replace('/\p{C}+/u', '', $string);

This works fine, but how do I alter it so that it removes all invisible characters EXCEPT newlines? I tried some stuff using [^\n] etc. but it doesn't work.

Thanks for helping out!

Edit: newline character is '\n'

Answer 1:

Use a "double negation":

$string = preg_replace('/[^\P{C}\n]+/u', '', $string);

Explanation:

  • \P{C} is the same as [^\p{C}].
  • Therefore [^\P{C}] is the same as \p{C}.
  • Since we now have a negated character class, we can subtract other characters like \n from it by adding them to the class.
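
To make the double negation concrete, here is a minimal sketch run on a made-up sample string (the sample and the variable $cleaned are assumptions for illustration, not from the original post). It assumes PHP 7 or later for the \u{...} escape. Note that \p{C} also covers tab and carriage return, so those are removed as well.

```php
<?php
// Hypothetical sample: "a", ZERO WIDTH SPACE (U+200B), newline, BEL (0x07), "b", tab.
$string = "a\u{200B}\n\x07b\t";

// [^\P{C}\n] matches anything that is neither a non-C character nor a newline,
// i.e. every \p{C} character except "\n".
$cleaned = preg_replace('/[^\P{C}\n]+/u', '', $string);

// The newline survives; U+200B, BEL and the tab are stripped.
var_dump($cleaned === "a\nb");
```

If other characters should survive too, they can be listed next to \n inside the class, for example [^\P{C}\n\t].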


Answer 2:

By using a negative assertion you can match a character class except what the assertion matches, so:

$res = preg_replace('/(?!\n)\p{C}/u', '', $input);

(PHP's dialect of regular expressions doesn't support character class subtraction which would, otherwise, be another approach: [\p{C}-[\n]].)
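
The lookahead variant could be wrapped up roughly as follows. This is a sketch under assumptions: the function name strip_invisible_keep_newlines is hypothetical, the u modifier is included because the question concerns UTF-8 input, and falling back to the original string on failure is just one possible choice.

```php
<?php
// Hypothetical helper name, not part of the original answer.
function strip_invisible_keep_newlines(string $input): string
{
    // (?!\n) fails at a newline, so \p{C} is only tried on other characters.
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    $result = preg_replace('/(?!\n)\p{C}/u', '', $input);

    // preg_replace returns null on failure (e.g. malformed UTF-8);
    // this sketch assumes returning the input unchanged is acceptable then.
    return $result ?? $input;
}

// Made-up sample: U+FEFF (zero width no-break space) and a NUL byte.
$input = "line 1\u{FEFF}\nline\x00 2";
$res = strip_invisible_keep_newlines($input);

var_dump($res === "line 1\nline 2");
```

Ordinary spaces are in category Zs rather than C, so they are left in place.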



Answer 3:

Before you do it, replace newlines (I suppose you are using something like \n) with a random string like ++++++++ (any string that will not be removed by your regular expression and does not naturally occur in your string in the first place), then run your preg_replace, then replace ++++++++ with \n again.

$string = str_replace("\n", '++++++++', $string); // Replace newlines with the placeholder (double quotes, so "\n" is a real newline)
$string = preg_replace('/\p{C}+/u', '', $string); // Use your regexp
$string = str_replace('++++++++', "\n", $string); // Put the newlines back

That should do it. If you are using <br/> instead of \n, simply use nl2br to preserve the line breaks and replace <br/> instead of \n.
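
A slightly more defensive sketch of the placeholder approach is shown below. The sample string, the $placeholder variable and the collision check are assumptions added for illustration. Note that the newline is written in double quotes, since '\n' in single quotes is a literal backslash followed by n in PHP.

```php
<?php
// Made-up sample containing U+200B and BEL (0x07) plus one newline.
$string = "first\u{200B} line\nsecond\x07 line";
$placeholder = '++++++++';

// Assumption of this sketch: refuse to continue if the placeholder
// already occurs in the input, because it could not be told apart later.
if (strpos($string, $placeholder) !== false) {
    throw new RuntimeException('Placeholder already occurs in the input');
}

$string = str_replace("\n", $placeholder, $string);  // protect newlines
$string = preg_replace('/\p{C}+/u', '', $string);    // strip \p{C} characters
$string = str_replace($placeholder, "\n", $string);  // restore newlines

var_dump($string === "first line\nsecond line");
```

This takes three passes over the string, so the single-regex answers above are usually simpler when they fit.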