Remove unnecessary close tags using regex

I'm looking for a regex that removes close tags, and everything else, up to the first open tag. For example:

</xy>..</zz>..<a>... -> <a>...

</b>..</cc>..<a href="#">...</a> -> <a href="#">...</a>

I tried this, but it doesn't work for some reason:

$html = preg_replace("/^.*<.*>/","<.*>",$html);

Tags: php regex preg-replace

2 Answers

聊天终结者

#2 · 2020-04-30 18:49

If I understand your responses to Avinash Raj's answer correctly, you need a pattern that matches any number of input lines up to the first open tag, but matches only once, so that all subsequent content is kept.

.*(\n.*?)*?(<\w.*(\n.*)*)

The first part

.*(\n.*?)*?

matches any number of lines, but not greedily (hence the ?s), so it stops at the first line that contains an open tag:

<\w

This is then followed once again by any number of lines of anything:

.*(\n.*)*

So to extract what you want you would replace

.*(\n.*?)*?(<\w.*(\n.*)*)

With

\2

Which is everything from and including the first open tag.
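As a minimal sketch of how that replacement could be applied in PHP: the function name strip_before_first_open_tag and the sample input are hypothetical, chosen only for illustration; the pattern is the one given above, and in preg_replace the backreference \2 can be written as $2.

```php
<?php
// Hypothetical helper: drops everything before the first open tag.
function strip_before_first_open_tag(string $html): string
{
    // Pattern from the answer above. Group 2 starts at the open tag
    // and runs to the end of the input, across newlines.
    $pattern = '~.*(\n.*?)*?(<\w.*(\n.*)*)~';

    // Replace the whole match with group 2 only.
    return preg_replace($pattern, '$2', $html);
}

// Hypothetical sample: stray close tags on the first two lines.
$html = "</xy>..\n</zz>..\n<a href=\"#\">link</a>\n<p>text</p>";

echo strip_before_first_open_tag($html);
// Prints:
// <a href="#">link</a>
// <p>text</p>
```

Since the match runs to the end of the input, preg_replace finds no second match, so the later content is left alone.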


对你真心纯属浪费

#3 · 2020-04-30 18:53

The regex below captures all the text before an opening tag into one group (group 1) and the remaining text into a second group, so group 2 contains everything from the opening tag onward.

(.*)(<\w.*)

Your PHP code would be:

<?php
$re = '~(.*)(<\w.*)~';
// Input only; the "-> ..." part of the question's example is the expected result.
$str = '</b>..</cc>..<a href="#">...</a>';
$replacement = '$2'; // keep only group 2
echo preg_replace($re, $replacement, $str); // => <a href="#">...</a>
?>

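<?php
// Same idea with a non-capturing prefix, so the kept text is group 1.
$re = '~(?:.*)(<\w.*)~';
// Note: inside single quotes, \n is a literal backslash and "n", not a newline.
$str = '</p>\n<p>&nbsp;</p>';
$replacement = '$1';
echo preg_replace($re, $replacement, $str);
?>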

Explanation:

(.*)(<\w.*) captures from the beginning of the string up to a < followed by a \w word character. Because .* is greedy, that is the last such < on the line, not the first. The text before <\w is stored in group 1, and the text from <\w onward (including <\w itself) is stored in group 2.
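One point that may be worth checking against your own input is how the greedy .* behaves when a line contains more than one open tag. The sketch below compares the pattern above with a lazy variant, (.*?); the lazy variant and the sample string are assumptions added for illustration, not part of the answer.

```php
<?php
// Hypothetical input: one stray close tag followed by two open tags.
$str = '</b>..<i>one</i>..<u>two</u>';

// Greedy (.*) consumes as much as it can, so group 2 starts at the
// LAST "<" that is followed by a word character.
$greedy = preg_replace('~(.*)(<\w.*)~', '$2', $str);

// Lazy (.*?) consumes as little as it can, so group 2 starts at the
// FIRST "<" that is followed by a word character.
$lazy = preg_replace('~(.*?)(<\w.*)~', '$2', $str);

echo $greedy, "\n"; // <u>two</u>
echo $lazy, "\n";   // <i>one</i>..<u>two</u>
```

If the goal is to keep everything from the first open tag, the lazy form is probably the closer fit. Neither form crosses real newlines unless the s modifier is added.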
