Is regexp recognition of email addresses hard?

Posted 2019-01-01 01:07

I recently read somewhere that writing a regexp to match an email address, taking into account all the variations and possibilities of the standard, is extremely hard and significantly more complicated than one would initially assume.

Can anyone provide some insight as to why that is?

Are there any known and proven regexps that actually do this fully?

What are some good alternatives to using regexps for matching email addresses?

19 answers
零度萤火
#2 · 2019-01-01 01:20

Many have tried, and many come close. You may want to read the Wikipedia article, and some others.

Specifically, you'll want to remember that many websites and email servers have relaxed validation of email addresses, so essentially they don't implement the standard fully. It's good enough for email to work all the time though.

唯独是你
#3 · 2019-01-01 01:21

Whether or not to accept bizarre, uncommon email address formats depends, in my opinion, on what one wants to do with them.

If you're writing a mail server, you have to be very exact and excruciatingly correct in what you accept. The "insane" regex quoted above is therefore appropriate.

For the rest of us, though, we're mainly just interested in ensuring that something a user types in a web form looks reasonable and doesn't have some sort of SQL injection or buffer overflow in it.

Frankly, does anyone really care about letting someone enter a 200-character email address with comments, newlines, quotes, spaces, parentheses, or other gibberish when signing up for a mailing list, newsletter, or web site? The proper response to such clowns is "Come back later when you have an address that looks like username@domain.tld".

The validation I do consists of ensuring that there is exactly one '@'; that there are no spaces, nulls or newlines; that the part to the right of the '@' has at least one dot (but not two dots in a row); and that there are no quotes, parentheses, commas, colons, exclamations, semicolons, or backslashes, all of which are more likely to be attempts at hackery than parts of an actual email address.
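The checks listed above can be sketched in a few lines of Python. This is only an illustration of that description, not the answerer's actual code: the function name `looks_like_email` and the constant `FORBIDDEN` are made up here, and treating the apostrophe as a "quote" is an assumption.

```python
# Characters rejected outright: space, NUL, newlines, and the punctuation
# named in the answer. Counting the apostrophe as a "quote" is an assumption.
FORBIDDEN = set(" \0\r\n\"'(),:;!\\")


def looks_like_email(address: str) -> bool:
    """Loose plausibility check; deliberately rejects some valid addresses."""
    # Exactly one '@'.
    if address.count("@") != 1:
        return False
    # No whitespace, nulls, quotes, parentheses, or similar characters.
    if any(ch in FORBIDDEN for ch in address):
        return False
    # The part to the right of the '@' needs a dot, but never two in a row.
    domain = address.split("@")[1]
    if "." not in domain or ".." in domain:
        return False
    return True


if __name__ == "__main__":
    for sample in ["username@domain.tld", "a@b@c.com", "user@localhost"]:
        print(sample, looks_like_email(sample))
```

Like the description it follows, this sketch does not check the local part at all beyond the forbidden characters, so it should be read as a form-input sanity check rather than a validator.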

Yes, this means I'm rejecting valid addresses with which someone might try to register on my web sites - perhaps I "incorrectly" reject as many as 0.001% of real-world addresses! I can live with that.

柔情千种
#4 · 2019-01-01 01:21

Just to add a regex that is less crazy than the one listed by @mmaibaum:

^[a-zA-Z]([.]?([a-zA-Z0-9_-]+)*)?@([a-zA-Z0-9\-_]+\.)+[a-zA-Z]{2,4}$ 

It is not bulletproof, and certainly does not cover the entire email spec, but it does do a decent job of covering most basic requirements. Even better, it's somewhat comprehensible, and can be edited.
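To see where the gaps are, it helps to run the pattern against a few sample addresses. The sketch below uses Python's `re` module with the pattern copied verbatim; the sample addresses and the helper name `matches_simple_pattern` are made up for illustration.

```python
import re

# The pattern from this answer, copied unchanged.
SIMPLE_EMAIL = re.compile(
    r"^[a-zA-Z]([.]?([a-zA-Z0-9_-]+)*)?@([a-zA-Z0-9\-_]+\.)+[a-zA-Z]{2,4}$"
)


def matches_simple_pattern(address: str) -> bool:
    """Return True if the address matches the pattern above."""
    return SIMPLE_EMAIL.match(address) is not None


if __name__ == "__main__":
    samples = [
        "john@example.com",        # plain local part
        "j.smith@example.com",     # dot directly after the first letter
        "john.smith@example.com",  # dot later in the local part
        "user+tag@example.com",    # plus sign in the local part
        "user@example.museum",     # top-level domain longer than 4 letters
    ]
    for sample in samples:
        print(sample, matches_simple_pattern(sample))
```

The first two samples match; the last three do not. In particular, the pattern only allows a dot as the second character of the local part, so the common `first.last@domain` shape is rejected, and so is any top-level domain longer than four letters. Because the pattern is short, these are easy things to edit.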

Cribbed from a discussion at HouseOfFusion.com, a world-class ColdFusion resource.

情到深处是孤独
#5 · 2019-01-01 01:27

Can anyone provide some insight as to why that is?

Yes, it is an extremely complicated standard that allows lots of stuff that no one really uses today. :)

Are there any known and proven regexps that actually do this fully?

Here is one attempt to parse the whole standard fully...

http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

What are some good alternatives to using regexps for matching email addresses?

Using an existing framework for it in whatever language you are using I guess? Though those will probably use regexp internally. It is a complex string. Regexps are designed to parse complex strings, so that really is your best choice.
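As one illustration of the "existing framework" route, Python's standard library ships an address parser in `email.headerregistry`. The sketch below is a minimal example under that assumption; the wrapper name `parse_addr_spec` is made up here, and other languages will have their own equivalents.

```python
from email.errors import MessageError
from email.headerregistry import Address


def parse_addr_spec(text: str):
    """Return (local_part, domain) if the stdlib parser accepts text, else None."""
    try:
        # Address(addr_spec=...) parses "local@domain" and raises on input
        # it cannot parse cleanly.
        addr = Address(addr_spec=text)
    except (ValueError, MessageError):
        return None
    return addr.username, addr.domain


if __name__ == "__main__":
    print(parse_addr_spec("alice@example.com"))
    print(parse_addr_spec("asdf"))
```

Note that a successful parse only says the string is shaped like an address; it says nothing about whether the mailbox exists.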

Edit: I should add that the regexp I linked to was just for fun. I do not endorse using a complex regexp like that - some people say that "if your regexp is more than one line, it is guaranteed to have a bug in it somewhere". I linked to it to illustrate how complex the standard is.

余生无你
#6 · 2019-01-01 01:27

Adding to Wayne's answer, there is also a section on www.regular-expressions.info dedicated to email, with a few samples.

You can always question whether it's worth it or if in fact any less-than-100%-covering regexp only contributes to a false sense of security.

In the end, actually sending the email is what will provide the real final validation. (You'll also find out if your mail server has bugs ;-)

裙下三千臣
#7 · 2019-01-01 01:28

Validating e-mail addresses isn't really very helpful anyway. It will not catch common typos or made-up email addresses, since these tend to look syntactically like valid addresses.

If you want to be sure an address is valid, you have no choice but to send a confirmation mail.

If you just want to be sure that the user inputs something that looks like an email rather than just "asdf", then check for an @. More complex validation does not really provide any benefit.

(I know this doesn't answer your questions, but I think it's worth mentioning anyway)
