I'm looking for a regex that removes closing tags, and everything else, up to the first opening tag. For example:
</xy>..</zz>..<a>...
-> <a>...
</b>..</cc>..<a href="#">...</a>
-> <a href="#">...</a>
I tried this, but it doesn't work for some reason:
$html = preg_replace("/^.*<.*>/","<.*>",$html);
If I understand your responses to Avinash Raj's answer correctly, you need something that matches any number of lines of input up to the first open tag, but that matches only once, so all subsequent content is preserved.
The first part
Matches any number of lines but not greedily (hence the ?s), so it will stop at the first line which contains an open tag:
This is then followed once again by any number of lines of anything:
So to extract what you want you would replace
With
Which is everything from and including the first open tag.
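The approach described above can be sketched in PHP as follows. This is a minimal sketch, not necessarily the exact pattern the answer had in mind: the function name `stripUntilFirstOpenTag` is hypothetical, and the pattern assumes that an open tag is a `<` immediately followed by an ASCII letter. The `s` modifier lets `.` match newlines, the lazy `.*?` stops at the first open tag, and the anchors ensure the pattern matches only once.

```php
<?php
// Hypothetical helper; the pattern is an assumption, not a quoted original.
function stripUntilFirstOpenTag(string $html): string
{
    // \A    : anchor at the very start of the input
    // .*?   : lazily skip everything, across lines (s modifier)
    // (<[A-Za-z].*) : capture from the first open tag to the end (\z)
    $result = preg_replace('/\A.*?(<[A-Za-z].*)\z/s', '$1', $html);

    // preg_replace returns null on error; fall back to the input unchanged.
    return $result ?? $html;
}

echo stripUntilFirstOpenTag('</xy>..</zz>..<a>...'), "\n";
echo stripUntilFirstOpenTag('</b>..</cc>..<a href="#">...</a>'), "\n";
```

If the input contains no open tag at all, the pattern does not match and the string is returned as it was.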
The regex below captures all the text before an opening tag into one group (group 1) and captures the remaining text into a second group. So the second group contains the text from the opening tag onward.
Your PHP code would be:
OR
Explanation:
(.*)(<\w.*)
captures from the beginning of the string and stops capturing when it finds a < followed by a \w word character. The text before <\w is stored in group 1, and the text from <\w onward (including the <\w itself) is stored in group 2.
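As a rough illustration of the two-group idea, here is a sketch using `preg_match`. The function name `splitAtFirstOpenTag` is hypothetical. Note one deliberate difference from the pattern shown above: the sketch uses a lazy `(.*?)` for the first group, because a greedy `(.*)` backtracks from the end of the string and would split at the last `<\w` when the input contains several open tags. The `s` modifier and the `\A`/`\z` anchors are also assumptions, added so that multi-line input is handled.

```php
<?php
// Hypothetical helper: returns [text before the first open tag, text from it onward].
function splitAtFirstOpenTag(string $html): array
{
    // Group 1: lazily matched prefix (closing tags and other text).
    // Group 2: starts at the first "<" followed by a word character.
    if (preg_match('/\A(.*?)(<\w.*)\z/s', $html, $m) === 1) {
        return [$m[1], $m[2]];
    }

    // No open tag found: everything counts as "before", nothing after.
    return [$html, ''];
}

[$before, $rest] = splitAtFirstOpenTag('</b>..</cc>..<a href="#">...</a>');
echo $rest, "\n";
```

Keeping only the second element gives the cleaned string; the first element holds what was removed, which can be handy for debugging.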