I'm trying to write a JavaScript HTML/PHP parser that extracts all opening tags from HTML/PHP source and returns each tag's type and its attributes with their values, while also tracking whether each attribute or value comes from static text or from PHP variables. The problem is composing the JavaScript RegExp pattern, specifically for certain rare cases. The patterns I was able to come up with either use a negative lookbehind (to cope with the closing PHP tag, i.e. to match a closing bracket that is not preceded by a question mark) or fail in certain cases. The lookbehind version looks like:
<[a-zA-Z]+.*?(?<!\?)>
...and works perfectly, except that my case must avoid lookbehind. A more JavaScript-friendly version would be:
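For reference, lookbehind assertions such as `(?<!\?)` were added to JavaScript in ES2018, so the pattern above runs as-is on engines with that support; engines without it reject the regex with a SyntaxError. The sketch below runs the lookbehind pattern against the `<option>` line shown further down in this question. The helper name `openingTagsLookbehind` is made up for illustration.

```javascript
// The <option> line from the question, used as sample input. A template literal is
// used because the line contains both single and double quotes.
const source = `<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>><?php echo $img; ?></option>`;

// The lookbehind pattern from the question: a tag name, then the shortest run of
// characters up to a '>' that is NOT immediately preceded by '?'.
// Needs ES2018 lookbehind support; older engines reject this literal with a SyntaxError.
const withLookbehind = /<[a-zA-Z]+.*?(?<!\?)>/g;

// Hypothetical helper: return every match as a plain string.
function openingTagsLookbehind(html) {
  return Array.from(html.matchAll(withLookbehind), (m) => m[0]);
}

console.log(openingTagsLookbehind(source));
// one match: the full opening <option ...> tag, ending in "?>>"
```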
<[a-zA-Z]+((.(?!</)(?!<[a-zA-Z]+))*)?>
...which works except in this case:
<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>><?php echo $img; ?></option>
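To see why the lookbehind-free pattern fails on this line, the sketch below runs it in JavaScript (inside a regex literal, `</` has to be written `<\/`). The greedy `*` keeps consuming characters as long as the next one does not start `</` or `<` followed by a letter. The `?>` endings and the `<?php` blocks pass that test, because `?` is not a letter, so the match only stops at the `>` right before `</option>` and swallows the option's text as well.

```javascript
// Sample input: the <option> line from the question.
const source = `<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>><?php echo $img; ?></option>`;

// Lookbehind-free pattern from the question. Each character consumed by "." must
// not be followed by "</" or by "<" plus letters (the start of another tag).
const withoutLookbehind = /<[a-zA-Z]+((.(?!<\/)(?!<[a-zA-Z]+))*)?>/g;

const matches = Array.from(source.matchAll(withoutLookbehind), (m) => m[0]);
console.log(matches);
// one match that runs on through the option text and ends just before "</option>"
```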
Am I approaching the problem completely wrong, or is lookbehind really necessary in my case? Any help is greatly appreciated.
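The question also asks for each tag's attributes and whether their values come from static text or PHP. Once an opening tag has been isolated (for example with the lookbehind pattern above), a second, smaller regex can tokenize it. The sketch below is one possible approach, not a full parser: `parseOpeningTag` is a hypothetical name, and it assumes PHP code only appears as `<?php ... ?>` blocks (short `<?=` tags are not handled).

```javascript
// Split one already-isolated opening tag into its name, its attributes and any PHP
// blocks that sit between attributes. Assumes PHP only appears as "<?php ... ?>".
function parseOpeningTag(tag) {
  const nameMatch = /^<([a-zA-Z][a-zA-Z0-9]*)/.exec(tag);
  if (nameMatch === null) return null;

  // Everything between the tag name and the final '>'.
  const body = tag.slice(nameMatch[0].length, tag.length - 1);

  // Either a whole PHP block, or name[=value] where the value is double-quoted,
  // single-quoted or bare; quoted values may contain PHP blocks with quotes inside.
  const token =
    /<\?php[\s\S]*?\?>|([^\s=<>"'\/]+)(?:\s*=\s*("(?:<\?php[\s\S]*?\?>|[^"])*"|'(?:<\?php[\s\S]*?\?>|[^'])*'|[^\s>"']+))?/g;

  const attributes = [];
  const phpBlocks = [];
  let m;
  while ((m = token.exec(body)) !== null) {
    if (m[1] === undefined) {
      // A PHP block outside any attribute value, e.g. one that prints attributes.
      phpBlocks.push(m[0]);
      continue;
    }
    let value = m[2] === undefined ? null : m[2];
    if (value !== null && (value[0] === '"' || value[0] === "'")) {
      value = value.slice(1, -1); // drop the surrounding quotes
    }
    attributes.push({
      name: m[1],
      value,
      // true when the value is (at least partly) produced by PHP
      isPhp: value !== null && value.includes("<?php"),
    });
  }
  return { name: nameMatch[1], attributes, phpBlocks };
}

// Usage: the opening tag from the question's <option> line.
const optionTag = `<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>>`;
console.log(JSON.stringify(parseOpeningTag(optionTag), null, 2));
```

Keeping the PHP-block alternative first in `token` means a `<?php ... ?>` block is always consumed whole, so quotes inside it (like `' selected="selected"'`) are never mistaken for attribute delimiters.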
A much simpler answer would be <[^/^>]+>
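A quick JavaScript check of this pattern on the question's `<option>` line (sample input copied from the question). Because `[^/^>]` stops at the first `>`, the `?>` that closes the first PHP block also ends the first match, and the `<?php ... ?>` blocks come back as matches of their own. The second `^` inside the brackets is a literal caret, so `^` characters are excluded as well.

```javascript
// Sample input: the <option> line from the question.
const source = `<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>><?php echo $img; ?></option>`;

// '<', then one or more characters that are not '/', '^' or '>', then '>'.
const simple = /<[^\/^>]+>/g;

const matches = Array.from(source.matchAll(simple), (m) => m[0]);
console.log(matches);
// three matches: '<option value="<?php echo $img; ?>' and the two PHP blocks
```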
Just make sure the last character before the '>' is not a ?, using [^?]. No lookaheads or lookbehinds needed.
The parentheses and the last ? are there so that it also matches tags like
<b>
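A minimal JavaScript sketch of the `[^?]` idea. The pattern `<[a-zA-Z]+(?:>|.*?[^?]>)` is an assumption for illustration, not necessarily this answer's exact regex: a tag ends either at a `>` directly after its name, which covers attribute-less tags like `<b>`, or at the first `>` whose preceding character is not `?`. The helper name `openingTags` is hypothetical.

```javascript
// Sample input: the <option> line from the question.
const source = `<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>><?php echo $img; ?></option>`;

// Hypothetical pattern: tag name, then either '>' right away, or the shortest run of
// characters ending in a non-'?' character followed by '>'. The bare '>' branch comes
// first so that "<b>text</b>" yields "<b>" instead of running on to the '>' of "</b>".
const noQuestionMarkBeforeGt = /<[a-zA-Z]+(?:>|.*?[^?]>)/g;

function openingTags(html) {
  return Array.from(html.matchAll(noQuestionMarkBeforeGt), (m) => m[0]);
}

console.log(openingTags(source));
console.log(openingTags("<b>bold</b> <br />"));
```

Like any single regex, this still ends a tag early if an attribute value contains a `>` that is not preceded by `?`, such as `title="a>b"`.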
EDIT: The solution didn't work for single-character tags without attributes, so here is one that does: