Create a Javascript RegExp to find opening tags in

Posted 2020-04-16 18:29

I'm trying to write a JavaScript HTML/PHP parser that extracts all opening tags from an HTML/PHP source and returns each tag's type and its attributes with their values, while also tracking whether each attribute or value comes from static text or from a PHP variable. The problem is composing the JavaScript RegExp pattern, specifically for certain rare cases. The patterns I have come up with either rely on a negative lookbehind (to cope with the closing PHP tag, that is, to match a closing bracket that is not preceded by a question mark) or fail in certain cases. The lookbehind version looks like this:

<[a-zA-Z]+.*?(?<!\?)>

...and works perfectly, except that in my case I must avoid lookbehind. A more JavaScript-friendly version would be:

<[a-zA-Z]+((.(?!</)(?!<[a-zA-Z]+))*)?>

...which works except in this case:

<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>><?php echo $img; ?></option>

Am I approaching the problem the wrong way, or is the lookbehind really necessary in my case? Any help is greatly appreciated.
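The failure described above can be reproduced with a short sketch. It runs the lookahead-only pattern from the question against the sample line; the names `sample` and `lookaheadPattern` are made up for illustration, and the `/` is escaped only because the pattern is written as a regex literal.

```javascript
// The sample line from the question, kept verbatim in a template literal
// (no "${" occurs in it, so nothing is interpolated).
const sample = `<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>><?php echo $img; ?></option>`;

// The lookahead-only pattern from the question, with the g flag added
// so that match() returns every match rather than just the first.
const lookaheadPattern = /<[a-zA-Z]+((.(?!<\/)(?!<[a-zA-Z]+))*)?>/g;

const found = sample.match(lookaheadPattern);

// The greedy group keeps consuming until the next character sequence is "</",
// so the single match runs past the real end of the opening tag and swallows
// the trailing "<?php echo $img; ?>" as well.
console.log(found.length);   // 1
console.log(found[0]);
```

The match stops only in front of `</option>`, because `<?` is neither `</` nor `<` followed by a letter, so neither lookahead ever rejects the embedded PHP blocks.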

2 Answers
仙女界的扛把子
#2 · 2020-04-16 19:04

A much simpler answer would be <[^/^>]+>

成全新的幸福
#3 · 2020-04-16 19:28

Just make sure the last character before the '>' is not a ?, using [^?]. No lookaheads or lookbehinds are needed.

<[a-zA-Z](.*?[^?])?>

The parentheses and the final ? are there so that tags like <b> also match.

EDIT: The solution above didn't work for single-character tags without attributes. Here is one that does:

<[a-zA-Z]+(>|.*?[^?]>)
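As a rough check, both patterns from this answer can be run against the sample line from the question and against a single-character tag. This is only a sketch: `findOpeningTags`, `firstAttempt`, `openTag` and `sample` are names invented here, not part of the original answer.

```javascript
// Sample line from the question (no "${" in it, so the template literal is inert).
const sample = `<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>><?php echo $img; ?></option>`;

// First attempt from the answer: the optional group is tried before the bare ">".
const firstAttempt = /<[a-zA-Z](.*?[^?])?>/g;

// Edited pattern: either ">" right after the tag name, or the shortest run
// that ends in a ">" whose preceding character is not "?".
const openTag = /<[a-zA-Z]+(>|.*?[^?]>)/g;

// Hypothetical helper: returns every match as an array (empty if none).
function findOpeningTags(source, pattern) {
  return source.match(pattern) || [];
}

// The first attempt overshoots on a single-character tag without attributes.
console.log(findOpeningTags('<b>text</b>', firstAttempt)); // [ '<b>text</b>' ]

// The edited pattern stops at the first ">".
console.log(findOpeningTags('<b>text</b>', openTag));      // [ '<b>' ]

// On the sample, the match ends at the second ">" of "?>>",
// i.e. at the real end of the <option ...> tag.
console.log(findOpeningTags(sample, openTag));
```

One limitation to keep in mind, as far as this sketch goes: a literal `>` inside a quoted attribute value (for example `title="a>b"`) would still end the match early, since the pattern only rules out `?>`.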