PHP - Need to remove duplicate characters within a string

Posted 2019-02-19 12:38

I have been searching all over the internet for a solution, but could not find one.

I need to remove duplicate characters within a string, but I would also like to make an exception that allows a given number of repeated characters to remain in the string.

For example, I tried the following:

$str = 'This ----------is******** a bbbb 999-999-9999 ******** 8888888888 test 4444444444 ********##########Sammy!!!!!! ###### hello !!!!!!';

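$t1 = preg_replace('/(.)\1{3,}/','',$str);        // deletes every run of 4+ identical characters
$t2 = preg_replace('/(\S)\1{3,}/','',$str);       // same, but only for non-whitespace characters
$t3 = preg_replace('{(.)\1+}','$1',$str);         // collapses every run of 2+ identical characters to one
$t4 = preg_replace("/[;,:\s]+/",',',$str);        // replaces runs of ; , : or whitespace with a comma
$t5 = preg_replace('/\W/', '', $str);             // deletes all non-word characters
$t6 = preg_replace( "/[^a-z]/i", "", $str);       // deletes everything except ASCII letters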

echo '$t1 = '.$t1.'<br>';
echo '$t2 = '.$t2.'<br>';
echo '$t3 = '.$t3.'<br>';
echo '$t4 = '.$t4.'<br>';
echo '$t5 = '.$t5.'<br>';
echo '$t6 = '.$t6.'<br>';

Results:

$t1 = This is a 999-999- test Sammy hello 
$t2 = This is a 999-999- test Sammy hello 
$t3 = This -is* a b 9-9-9 * 8 test 4 *#Samy! # helo !
$t4 = This,----------is********,a,bbbb,999-999-9999,********,8888888888,test,4444444444,********##########Sammy!!!!!!,######,hello,!!!!!!
$t5 = Thisisabbbb99999999998888888888test4444444444Sammyhello
$t6 = ThisisabbbbtestSammyhello

The desired output would be:

This ---is*** a bbbb 999-999-9999 *** 8888888888 test 4444444444 ***###Sammy!!! ### hello !!!

As you can see, the desired output leaves the numbers alone and keeps only 3 of each repeated character, e.g. --- ### *** !!!

If possible, I would like to be able to change that limit from 3 to any other integer.
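For comparison, the requirement as stated (keep at most a configurable number of consecutive copies of each character, never shorten runs of digits) can also be expressed without a regex. This is a minimal sketch, not something from the thread: the function name squeezeRepeats and its $max parameter are hypothetical, and the byte-by-byte loop assumes single-byte (ASCII) input.

```php
<?php
// Hypothetical helper (not from the thread): keep at most $max consecutive
// copies of any non-digit character; digits are always copied unchanged.
// Works byte by byte, so it assumes single-byte (ASCII) input.
function squeezeRepeats(string $s, int $max = 3): string
{
    if ($max < 1) {
        throw new InvalidArgumentException('$max must be at least 1');
    }
    $out = '';
    $prev = null; // previous character seen
    $run = 0;     // length of the current run of $prev
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $c = $s[$i];
        $run = ($c === $prev) ? $run + 1 : 1;
        $prev = $c;
        // Digits are always kept; other characters only up to $max per run.
        if (ctype_digit($c) || $run <= $max) {
            $out .= $c;
        }
    }
    return $out;
}

$str = 'This ----------is******** a bbbb 999-999-9999 ******** 8888888888 test 4444444444 ********##########Sammy!!!!!! ###### hello !!!!!!';
echo squeezeRepeats($str), "\n";    // limit of 3
echo squeezeRepeats($str, 2), "\n"; // limit of 2
```

Because only digits are exempt here, "bbbb" is shortened to "bbb"; letters would need their own check in the if condition to be left alone as well.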

Thanks in advance.

2 answers
We Are One
#2 · 2019-02-19 13:13

The regex you are looking for is /((.)\2{2})\2*/, used with the replacement $1. If you want to keep n characters instead of 3, put n-1 in the curly braces: /((.)\2{n-1})\2*/

EDIT: to leave digits (or any other characters) alone, replace . with a narrower character class, for example [^\d]: /(([^\d])\2{2})\2*/
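The "put n-1 in the curly braces" rule can be wrapped in a small function so the limit and the character class become parameters. This is a sketch under assumptions: the name limitRepeats and its parameters are hypothetical, $charClass is inserted into the pattern verbatim (so it must be a valid regex fragment), and $n must be at least 1 (n = 1 gives \2{0}, which collapses every run to a single character).

```php
<?php
// Hypothetical wrapper around this answer's pattern:
// /((X)\2{n-1})\2*/ replaced with $1 keeps the first $n characters of every
// run of a character matched by X and drops the rest of the run.
// $charClass is a raw regex fragment, inserted as-is (not escaped).
function limitRepeats(string $str, int $n = 3, string $charClass = '[^\d]'): string
{
    if ($n < 1) {
        throw new InvalidArgumentException('$n must be at least 1');
    }
    $pattern = '/((' . $charClass . ')\2{' . ($n - 1) . '})\2*/';
    return preg_replace($pattern, '$1', $str);
}

$str = 'This ----------is******** a bbbb 999-999-9999 ******** 8888888888 test 4444444444 ********##########Sammy!!!!!! ###### hello !!!!!!';
echo limitRepeats($str), "\n";         // non-digits limited to 3, digits untouched
echo limitRepeats($str, 3, '.'), "\n"; // the original /((.)\2{2})\2*/: digit runs shortened too
```

With '.' as the class, runs such as 8888888888 are cut to 888, which is why the [^\d] variant is the one that matches the question's "leave the numbers alone" requirement.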

萌系小妹纸
#3 · 2019-02-19 13:24

This will do it:

preg_replace('/(([^\d])\2\2)\2+/', '$1', $str);

[^\d] matches a single character which isn't a digit.
\2 refers to that captured character (group 2), so \2\2 requires two more copies of it.
$1 refers to the first captured group, which holds the first three repeated characters, so the extra copies matched by \2+ are stripped off.
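Running this answer's one-liner on the question's input shows one difference from the desired output: "bbbb" becomes "bbb", because letters are not excluded by [^\d]. The second pattern below is an assumed variant, not from the thread: it also excludes ASCII letters with [^\da-zA-Z], which for this particular input reproduces the desired output exactly.

```php
<?php
// Input string and desired output, both copied from the question.
$str = 'This ----------is******** a bbbb 999-999-9999 ******** 8888888888 test 4444444444 ********##########Sammy!!!!!! ###### hello !!!!!!';
$desired = 'This ---is*** a bbbb 999-999-9999 *** 8888888888 test 4444444444 ***###Sammy!!! ### hello !!!';

// This answer's pattern: group 2 is one non-digit, group 1 is that character
// plus two more copies; \2+ matches the surplus copies, which are dropped.
$a = preg_replace('/(([^\d])\2\2)\2+/', '$1', $str);

// Assumed variant (not from the thread): also exclude ASCII letters, so runs
// like "bbbb" are kept, as the desired output suggests.
$c = preg_replace('/(([^\da-zA-Z])\2\2)\2+/', '$1', $str);

echo $a, "\n";             // "bbbb" is shortened to "bbb"
var_dump($c === $desired); // prints bool(true)
```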