RegEx find all XML tags

Posted 2019-05-26 12:03

Question:

How do I match all the opening tags in an XML document with a regex? I only need to collect the tag names that are used.

This is what I have:

(?<=<)(.*?)((?= \/>)|(?=>))

This matches both the opening and the closing tags.

Example:

<Habazutty>yaddayadda</Habazutty>
<Vogons />
<Targ>blahblah</Targ>

The pattern above matches:

Habazutty
/Habazutty
Vogons
Targ
/Targ
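The behavior described above can be reproduced with a short sketch. It assumes Python's standard `re` module (the post does not say which regex engine is used), and `SAMPLE` and `QUESTION_PATTERN` are illustrative names. The pattern itself is copied verbatim from the question.

```python
import re

# Sample document from the question (SAMPLE is an illustrative name).
SAMPLE = """<Habazutty>yaddayadda</Habazutty>
<Vogons />
<Targ>blahblah</Targ>"""

# The question's pattern, copied verbatim: the text right after a "<",
# up to the first " />" or ">".
QUESTION_PATTERN = r"(?<=<)(.*?)((?= \/>)|(?=>))"

# re.findall would return tuples here (the pattern has two groups),
# so take group 1 from each match object instead.
names = [m.group(1) for m in re.finditer(QUESTION_PATTERN, SAMPLE)]
print(names)  # ['Habazutty', '/Habazutty', 'Vogons', 'Targ', '/Targ']
```

The closing tags show up because nothing in `(.*?)` prevents the match from starting at the `/` that follows the `<` in `</Habazutty>`.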

I only need:

Habazutty
Vogons
Targ

I couldn't find a way to exclude the closing tags. I tried a negative lookahead, but it matched nothing, so I must have messed something up.

Answer 1:

You could change (?<=<)(.*?)((?= \/>)|(?=>)) to (?<=<)([^\/]*?)((?= \/>)|(?=>)), i.e. use ([^\/]*?) instead of (.*?) for the tag name. A / is not allowed in XML tag names anyway, and because a closing tag has a / right after the <, a closing tag can no longer match.
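A minimal check of this pattern against the question's sample, again assuming Python's `re` module; `SAMPLE` and `OPENING_TAG_PATTERN` are illustrative names, and the pattern is copied from the answer.

```python
import re

# Sample document from the question (SAMPLE is an illustrative name).
SAMPLE = """<Habazutty>yaddayadda</Habazutty>
<Vogons />
<Targ>blahblah</Targ>"""

# Answer 1's pattern: the tag-name group may not contain "/". In a closing
# tag the character right after "<" is "/", so no match can start there.
OPENING_TAG_PATTERN = r"(?<=<)([^\/]*?)((?= \/>)|(?=>))"

# Two groups again, so collect group 1 from each match object.
names = [m.group(1) for m in re.finditer(OPENING_TAG_PATTERN, SAMPLE)]
print(names)  # ['Habazutty', 'Vogons', 'Targ']
```

Like the question's pattern, this one also puts anything between the tag name and the closing > (attributes, for example) into the group.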



Answer 2:

You can achieve this simply using:

<([^\/>]+)[/]*>

The captured group holds the tag names.
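A quick sketch of this pattern on the question's sample, assuming Python's `re` module (`SAMPLE`, `ANSWER2_PATTERN`, `raw` and `names` are illustrative names). One detail worth seeing: for a self-closing tag written with a space, such as <Vogons />, the greedy group also captures the trailing space.

```python
import re

# Sample document from the question (SAMPLE is an illustrative name).
SAMPLE = """<Habazutty>yaddayadda</Habazutty>
<Vogons />
<Targ>blahblah</Targ>"""

# Answer 2's pattern: "<", then one or more characters that are neither
# "/" nor ">" (captured), then zero or more "/", then ">".
ANSWER2_PATTERN = r"<([^\/>]+)[/]*>"

# With a single group, re.findall returns the captured strings directly.
raw = re.findall(ANSWER2_PATTERN, SAMPLE)
print(raw)  # ['Habazutty', 'Vogons ', 'Targ']

# The group is greedy, so the space before "/>" in "<Vogons />" is kept;
# strip() removes it.
names = [name.strip() for name in raw]
print(names)  # ['Habazutty', 'Vogons', 'Targ']
```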



Answer 3:

I found another solution:

((?=<)(?!<\/)<)(.*?)((?= \/>)|(?=>))

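Basically, the ((?=<)(?!<\/)<) part uses lookaheads, not lookbehinds: (?=<) asserts that the next character is "<", (?!<\/) asserts that the next characters are not "</", and then the "<" itself is consumed.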

@Redneb's answer is cleaner, though: it has fewer capturing groups and is shorter and fancier.
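A sketch of this self-answer on the question's sample, assuming Python's `re` module. Here the tag name is in group 2. The sketch also tries a reduced form (`REDUCED_PATTERN`, an illustrative simplification, not from the post) without the (?=<), which adds nothing because the "<" is consumed right after the checks.

```python
import re

# Sample document from the question (SAMPLE is an illustrative name).
SAMPLE = """<Habazutty>yaddayadda</Habazutty>
<Vogons />
<Targ>blahblah</Targ>"""

# Answer 3's pattern, verbatim: (?=<) and (?!<\/) are lookaheads checked
# before the "<" is consumed; group 2 holds the tag name.
ANSWER3_PATTERN = r"((?=<)(?!<\/)<)(.*?)((?= \/>)|(?=>))"
names = [m.group(2) for m in re.finditer(ANSWER3_PATTERN, SAMPLE)]
print(names)  # ['Habazutty', 'Vogons', 'Targ']

# Illustrative reduced form: drop the redundant (?=<) and the extra groups.
REDUCED_PATTERN = r"(?!</)<(.*?)(?= />|>)"
reduced = [m.group(1) for m in re.finditer(REDUCED_PATTERN, SAMPLE)]
print(reduced == names)  # True
```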