I'm parsing some messy HTML with PHP that contains redundant <br> tags, and I would like to clean them up a bit. For instance:
<br>
<br /><br />
<br>
How would I replace something like that with the following, using preg_replace()?
<br /><br />
Newlines, spaces, and the differences between <br>, <br/>, and <br /> would all have to be accounted for.
Edit: Basically, I'd like to replace every run of three or more successive breaks with just two.
Here is something you can use. The first line finds every run of two or more <br> tags (of any variant, with optional whitespace between them) and replaces it with a well-formatted <br /><br />.
I also included a second line that normalizes the remaining single <br> tags to <br />, if you want that too.
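function clean($txt)
{
    // Collapse any run of two or more <br>, <br/> or <br /> tags
    // (any case, optional whitespace in and between them) into exactly two.
    $txt = preg_replace('/(<br\s*\/?>\s*){2,}/i', '<br /><br />', $txt);
    // Normalize each remaining single break to <br />.
    $txt = preg_replace('/<br\s*\/?>\s*/i', '<br />', $txt);
    return $txt;
}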
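To check the function against the sample from the question, here is a small self-contained script. The function is repeated so the snippet runs on its own, and the sample strings are just illustrations.

```php
<?php
function clean($txt)
{
    // Collapse runs of two or more breaks into exactly two.
    $txt = preg_replace('/(<br\s*\/?>\s*){2,}/i', '<br /><br />', $txt);
    // Normalize any remaining single break to <br />.
    $txt = preg_replace('/<br\s*\/?>\s*/i', '<br />', $txt);
    return $txt;
}

// The messy sample from the question, including its newlines.
$messy = "<br>\n<br /><br />\n<br>";
echo clean($messy), "\n";          // prints <br /><br />

// A lone uppercase break is normalized, and the newline after it is consumed.
echo clean("one<BR>\ntwo"), "\n";  // prints one<br />two
```

Note that the trailing \s* in both patterns also swallows the whitespace after the last break, so text that followed on the next line ends up directly after the <br />.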
This should work, using the {3,} quantifier to require at least three breaks in a row:
$cleaned = preg_replace('/(<br\s*\/?>\s*){3,}/i', '<br /><br />', $multibreaks);
It should also match appalling constructions such as <br><br /><br/><br>.
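A minimal sketch of the same idea wrapped in a function (the name collapse_breaks is hypothetical). It uses \s* inside the tag and the /i modifier so that extra spaces and <BR> are matched too, and shows that a pair of breaks is left untouched because of the {3,} minimum.

```php
<?php
// Hypothetical helper: collapse three or more successive breaks into exactly two.
// \s* allows any amount of whitespace inside and between the tags; /i handles <BR>.
function collapse_breaks(string $html): string
{
    return preg_replace('/(<br\s*\/?>\s*){3,}/i', '<br /><br />', $html);
}

echo collapse_breaks('a<br><br /><br/><br>b'), "\n";   // a<br /><br />b
echo collapse_breaks('a<br><br>b'), "\n";               // a<br><br>b (only two, left alone)
echo collapse_breaks("a<BR>\n<Br  />\n<br>b"), "\n";   // a<br /><br />b
```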
This will strip out all breaks entirely (rather than collapsing them), even if they're in uppercase:
preg_replace('/<br[^>]*>/i', '', $string);
Try this, which removes <br>, <br/>, and <br /> (lowercase only):
preg_replace('/<br\s*\/?>/', '', $inputString);
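The two removal patterns above behave differently on uppercase tags and on tags with attributes. A small comparison sketch (the helper names and the sample string are made up for illustration):

```php
<?php
// Hypothetical helper using the pattern '/<br[^>]*>/i'.
// Case-insensitive; [^>]* also swallows attributes such as class="x",
// but it would match any tag whose name merely starts with "br".
function strip_any_br(string $s): string
{
    return preg_replace('/<br[^>]*>/i', '', $s);
}

// Hypothetical helper using the pattern '/<br\s*\/?>/'.
// Matches only <br>, <br/>, <br /> (with optional spaces); case-sensitive, no attributes.
function strip_plain_br(string $s): string
{
    return preg_replace('/<br\s*\/?>/', '', $s);
}

$sample = 'a<br>b<BR/>c<br class="x">d';
echo strip_any_br($sample), "\n";    // abcd
echo strip_plain_br($sample), "\n";  // ab<BR/>c<br class="x">d
```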
Use str_replace(); it's simpler and faster for plain literal replacements (though it only matches the exact strings you give it), and you can also pass an array of search values instead of a single one.
$newcode = str_replace("<br>", "", $messycode);
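A sketch of the array form, plus str_ireplace(), PHP's case-insensitive counterpart of str_replace(). The sample string is made up for illustration.

```php
<?php
$messycode = "a<br>b<br/>c<br />d<BR>e";

// str_replace() accepts an array of search strings; each one is replaced by ''.
$newcode = str_replace(array('<br>', '<br/>', '<br />'), '', $messycode);
echo $newcode, "\n";  // abcd<BR>e

// str_ireplace() ignores case, so it also catches <BR>.
$newcode = str_ireplace(array('<br>', '<br/>', '<br />'), '', $messycode);
echo $newcode, "\n";  // abcde
```

Because these are literal matches, variants such as <br  /> (two spaces) or <br class="x"> are not covered; for those a regular expression is still needed.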