I'm doing some HTML stripping using regular expressions (yes, I know, never parse HTML with regexes, but I'm just stripping it, and I also unfortunately cannot use any external libraries). I'm using a regex from the Regular Expressions Cookbook, and it has worked great, except I just ran into this problem:
In the string Bob Saget <bobs@aol.com>, my regex matches the email address as a tag.
So my question is: is the @ sign a valid character in an XML or HTML tag name? (I'm not asking whether it is valid inside an attribute value; I know that it is.) If it is not, I can safely exclude it in my regex.
I'm not sure where to look this up. I looked here, and I think it says that in XML the at sign is not allowed in a tag; however, I would appreciate some concrete proof.
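After another look at the XML Specification (XML 1.0, Fifth Edition):
A tag (here, a start-tag) consists of:
[40] STag ::= '<' Name (S Attribute)* S? '>'
A Name consists of:
[5] Name ::= NameStartChar (NameChar)*
A NameStartChar consists of:
[4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
A NameChar consists of:
[4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]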
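The @ sign is U+0040 (#x40 in the specification's notation), which falls outside every range allowed by NameStartChar and NameChar.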
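To check this mechanically, here is a minimal Python sketch that transcribes the NameStartChar [4] and NameChar [4a] productions from XML 1.0 (Fifth Edition) into code-point ranges. The function names are hypothetical, chosen only for this illustration.

```python
# Ranges transcribed from XML 1.0 (Fifth Edition), production [4] NameStartChar.
NAME_START_RANGES = [
    (ord(":"), ord(":")),
    (ord("A"), ord("Z")),
    (ord("_"), ord("_")),
    (ord("a"), ord("z")),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Production [4a] NameChar adds these ranges on top of NameStartChar.
NAME_EXTRA_RANGES = [
    (ord("-"), ord("-")),
    (ord("."), ord(".")),
    (ord("0"), ord("9")),
    (0xB7, 0xB7),
    (0x0300, 0x036F),
    (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    """True if the code point of ch lies in any inclusive (lo, hi) range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch):
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch):
    return is_name_start_char(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def is_xml_name(s):
    """Production [5]: Name ::= NameStartChar (NameChar)*"""
    return bool(s) and is_name_start_char(s[0]) and all(is_name_char(c) for c in s[1:])


print(hex(ord("@")))                # 0x40
print(is_name_start_char("@"))      # False
print(is_name_char("@"))            # False
print(is_xml_name("bobs@aol.com"))  # False
print(is_xml_name("div"))           # True
```

Note that this covers only XML names; HTML parsers follow their own tokenization rules, which this sketch does not model.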
So the @ sign is not valid as a NameChar or a NameStartChar, and therefore it cannot appear anywhere in a Name, including a tag name.
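Given that, a stripping regex can require the tag name to be a valid Name, which rules out @ there while still allowing it inside attribute values. The pattern below is a hypothetical sketch, not the Regular Expressions Cookbook regex from the question; for simplicity it assumes ASCII-only tag names (an ASCII subset of NameStartChar/NameChar), so tags with non-ASCII names would be left in place.

```python
import re

# Hypothetical tag-stripping pattern. The tag name is restricted to an ASCII
# subset of the XML Name production, so "@" can never appear in it. Anything
# after whitespace is treated as attributes, where "@" is allowed; quoted
# attribute values may also contain ">".
TAG_RE = re.compile(
    r"""
    </?                            # "<" or "</"
    [A-Za-z_:][-A-Za-z0-9_:.]*     # tag name: ASCII subset of Name
    (?:
        \s+                        # attributes must follow whitespace
        (?:"[^"]*"|'[^']*'|[^'">])*
    )?
    /?>                            # ">" or self-closing "/>"
    """,
    re.VERBOSE,
)


def strip_tags(text):
    """Remove anything that looks like a tag with a valid (ASCII) name."""
    return TAG_RE.sub("", text)


print(strip_tags("Bob Saget <bobs@aol.com>"))            # Bob Saget <bobs@aol.com>
print(strip_tags('<p class="x">Hi <b>there</b></p>'))    # Hi there
print(strip_tags('<a href="mailto:bobs@aol.com">Bob</a>'))  # Bob
```

Here <bobs@aol.com> survives because, after the name bobs, the next character must be whitespace, / or >, and @ is none of those.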