There are a number of email regexp questions popping up here, and I'm honestly baffled why people are using these insanely obtuse matching expressions rather than a very simple parser that splits the email up into the name and domain tokens, and then validates those against the valid characters allowed for name (there's no further check that can be done on this portion) and the valid characters for the domain (and I suppose you could add checking for all the world's TLDs, and then another level of second level domains for countries with such (ie, com.uk)).
The real problem is that the tlds and slds keep changing (contrary to popular belief), so you have to keep updating the regexp if you plan on doing all this high level checking whenever the root name servers send down a change.
Why not have a module that simply validates domains, which pulls from a database, or flat file, and optionally checks DNS for matching records?
I'm being serious here, why is everyone so keen on inventing the perfect regexp for this? It doesn't seem to be a suitable solution to the problem...
Convince me that it's not only possible to do in regexp (and satisfy everyone) but that it's a better solution than a custom parser/validator.
-Adam
The temptation of using RegExp, once you've mastered the basics, is very big. In fact, RegExp seems so powerful that people naturally want to start using it everywhere. I really suspect that there's a lot of psychology involved here, as demonstrated by Randall's XKCD comic (and yes, it is useful).
I've done an introductory presentation on RegExp once and the most important slide warned against its overuse. It was the only slide that used bold font. I believe this should be done more often.
On factor: the set of people who understand how to write a regular expression is very much larger than the set of people who understand the formal constraints on regular languages. Same goes for non-regular "regular expressions".
People use regexes for email addresses, HTML, XML, etc. because:
We're just looking for a fast way to see if the email address is valid so that we can warn the user they have made a mistake or prevent people from entering junk easily. Going off to the mail server and fingering it is slow and unreliable. The only real way to be sure is to get a confirmation email, but the problem is only to give a fast response to the user before the confirmation process takes place. That's why it's not so important to be strictly compliant. Anyway, it's a challenge and it's fun.
Regexps are much faster to use, of course, and they only validate what's specified in the RFC. Write a custom parser? What? It takes 10 seconds to use a regexp.
People write regular expressions because most developers like so solve a simple problem in the most "cool" en "efficient" way (which means that it should be as unreadable as possible).
In Java, there are libraries to check if a String represents an email address without you having to know anything about regular expressions. These libraries should be available for other languages aswel.
Like Jamie Zawinski said in 1997: "Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems."