PHP replace characters except the HTML tags

Posted 2019-06-05 20:56

Question:

I need to replace the characters 0,1,2,...,9 with \xD9\xA0,\xD9\xA1,\xD9\xA2,...,\xD9\xA9 in a string. This string comes from CKEditor, so it may contain HTML tags. Using the following code

$body = str_replace("1", "\xD9\xA1", $body);

it replaces every 1 with \xD9\xA1, so it also affects the <h1> tag and <table border="1">, while I only need to replace the numbers in the body, not in the tags.

The tags that contain numbers are <h0><h1><h2><h3><h4><h5><h6>, plus the cellspacing, cellpadding, and border attributes of the table tag.

How can I replace the numbers with the above symbols without affecting <h0><h1><h2><h3><h4><h5><h6> and cellspacing, cellpadding, and border?

Answer 1:

You shouldn't use a regex to process HTML. If you still want to use one, you can use the discard pattern with a regex like this:

<.*?>(*SKIP)(*FAIL)|1

The idea behind this regex is to skip whatever is inside <...> and match the rest, so it only matches occurrences of the digit 1 that are not inside HTML tags. Once again, I'd use an HTML parser instead.
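For reference, the parser route could look roughly like the sketch below. It is only an illustration, not code from the original answer: the function name `replaceDigitsInTextNodes` is hypothetical, it assumes the input is a UTF-8 HTML fragment, and it relies on PHP's bundled DOM extension. It rewrites digits only in text nodes, so tag names and attribute values such as border="1" are never touched.

```php
<?php
// Sketch only: replaceDigitsInTextNodes is a hypothetical helper name.
// Assumes $html is a UTF-8 encoded HTML fragment (e.g. CKEditor output).
function replaceDigitsInTextNodes(string $html): string
{
    $ascii  = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'];
    // UTF-8 bytes of the Arabic-Indic digits U+0660..U+0669.
    $arabic = ["\xD9\xA0", "\xD9\xA1", "\xD9\xA2", "\xD9\xA3", "\xD9\xA4",
               "\xD9\xA5", "\xD9\xA6", "\xD9\xA7", "\xD9\xA8", "\xD9\xA9"];

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration prefix is a common hint so libxml reads the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Text nodes never include tag names or attribute values.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//text()') as $textNode) {
        $textNode->nodeValue = str_replace($ascii, $arabic, $textNode->nodeValue);
    }

    // loadHTML wraps the fragment in <html><body>; serialize only the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo replaceDigitsInTextNodes('<h1>Item 1</h1><table border="1"><tr><td>2</td></tr></table>');
```

Note that the parser may normalize the markup when it serializes it, and that text inside script or style elements would be rewritten too, so those would need to be excluded if the editor output can contain them.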

PHP code for the regex approach:

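// Anything matched as <...> is discarded by (*SKIP)(*FAIL); only a bare "1" is replaced.
$re = "/<.*?>(*SKIP)(*FAIL)|1/";
$str = "<h0><h1><h2><h3>\n<table border=\"1\">\n1\n";
// UTF-8 bytes of the replacement character for "1".
$subst = "\xD9\xA1";

// Only the standalone 1 on the last line changes; <h1> and border="1" stay intact.
$result = preg_replace($re, $subst, $str);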
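
The snippet above handles only the digit 1. A minimal sketch of extending the same skip-tags idea to all ten digits follows. The function name `replaceDigitsOutsideTags` is hypothetical, the pattern uses `[^>]*` rather than `.*?` so that a tag spanning several lines is still skipped, and PHP 7.4 or later is assumed for the arrow function.

```php
<?php
// Sketch only: replaceDigitsOutsideTags is a hypothetical helper name.
function replaceDigitsOutsideTags(string $html): string
{
    $result = preg_replace_callback(
        // Skip anything that looks like a tag, then match a single ASCII digit.
        '/<[^>]*>(*SKIP)(*FAIL)|[0-9]/',
        // "\xD9" followed by byte 0xA0 + d is the UTF-8 encoding of U+0660 + d.
        static fn (array $m): string => "\xD9" . chr(0xA0 + (int) $m[0]),
        $html
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$str = "<h1>Item 10</h1>\n<table border=\"1\"><tr><td>2019</td></tr></table>\n";
echo replaceDigitsOutsideTags($str);
```

Digits inside numeric character references such as &#39; are not protected by this pattern, so entity-encoded text would need an extra skip branch or the parser approach.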