Create a Javascript RegExp to find opening tags in

2020-04-16 19:01发布

问题:

I'm trying to write a Javascript HTML/php parser which would extract all opening tags from a HTML/php source and return the type of tag and attributes with their values while at the same time monitoring whether the values/attributes should be evaluated from static text or php variables. The problem is when I try to compose the Javascript RegExp pattern and more specifically certain rare cases. The RegExp I was able to come up with either involve negative lookbehind (to cope with the closing php tag - that is to match a closing bracket that is not preceded by a question mark) or fails in certain cases. The lookbehind version looks like:

<[a-zA-Z]+.*?(?<!\?)>

...and works perfect except for my case which must avoid using lookbehind. A more Javascript friendly version would be:

<[a-zA-Z]+((.(?!</)(?!<[a-zA-Z]+))*)?>

...which works except in this case:

<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>><?php echo $img; ?></option>

Am I approaching the problem completely messed up or is the lookbehind really necessary in my case? Any help is greatly appreciated.

回答1:

Just make sure the last letter before the '>' is not a ?, using [^?]. No lookaheads or -behinds needed.

<[a-zA-Z](.*?[^?])?>

the parentheses and the last ? is to also match tags like <b>.

EDIT The solution didn't work for single character tags without attributes. So here is one that does:

<[a-zA-Z]+(>|.*?[^?]>)


回答2:

much simpler answer would be <[^/^>]+>