What characters are allowed in an email address?

2018-12-31 04:59发布

I'm not asking about full email validation.

I just want to know what are allowed characters in user-name and server parts of email address. This may be oversimplified, maybe email adresses can take other forms, but I don't care. I'm asking about only this simple form: user-name@server (e.g. wild.wezyr@best-server-ever.com) and allowed characters in both parts.

17条回答
回忆,回不去的记忆
2楼-- · 2018-12-31 05:36

Watch out! There is a bunch of knowledge rot in this thread (stuff that used to be true and now isn't).

To avoid false-positive rejections of actual email addresses in the current and future world, and from anywhere in the world, you need to know at least the high-level concept of RFC 3490, "Internationalizing Domain Names in Applications (IDNA)". I know folks in US and A often aren't up on this, but it's already in widespread and rapidly increasing use around the world (mainly the non-English dominated parts).

The gist is that you can now use addresses like mason@日本.com and wildwezyr@fahrvergnügen.net. No, this isn't yet compatible with everything out there (as many have lamented above, even simple qmail-style +ident addresses are often wrongly rejected). But there is an RFC, there's a spec, it's now backed by the IETF and ICANN, and--more importantly--there's a large and growing number of implementations supporting this improvement that are currently in service.

I didn't know much about this development myself until I moved back to Japan and started seeing email addresses like hei@やる.ca and Amazon URLs like this:

http://www.amazon.co.jp/エレクトロニクス-デジタルカメラ-ポータブルオーディオ/b/ref=topnav_storetab_e?ie=UTF8&node=3210981

I know you don't want links to specs, but if you rely solely on the outdated knowledge of hackers on Internet forums, your email validator will end up rejecting email addresses that non-Enlish users increasingly expect to work. For those users, such validation will be just as annoying as the commonplace brain-dead form that we all hate, the one that can't handle a + or a three-part domain name or whatever.

So I'm not saying it's not a hassle, but the full list of characters "allowed under some/any/none conditions" is (nearly) all characters in all languages. If you want to "accept all valid email addresses (and many invalid too)" then you have to take IDN into account, which basically makes a character-based approach useless (sorry), unless you first convert the internationalized email addresses to Punycode.

After doing that you can follow (most of) the advice above.

查看更多
ら面具成の殇う
3楼-- · 2018-12-31 05:36

Gmail will only allow + sign as special character and in some cases (.) but any other special characters are not allowed at Gmail. RFC's says that you can use special characters but you should avoid sending mail to Gmail with special characters.

查看更多
伤终究还是伤i
4楼-- · 2018-12-31 05:37

In my PHP I use this check

<?php
if (preg_match(
'/^(?:[\w\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~]+\.)*[\w\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~]+@(?:(?:(?:[a-zA-Z0-9_](?:[a-zA-Z0-9_\-](?!\.)){0,61}[a-zA-Z0-9_-]?\.)+[a-zA-Z0-9_](?:[a-zA-Z0-9_\-](?!$)){0,61}[a-zA-Z0-9_]?)|(?:\[(?:(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\]))$/',
"tim'qqq@gmail.com"        
)){
    echo "legit email";
} else {
    echo "NOT legit email";
}
?>

try it yourself http://phpfiddle.org/main/code/9av6-d10r

查看更多
只靠听说
5楼-- · 2018-12-31 05:39

You can start from wikipedia article:

  • Uppercase and lowercase English letters (a-z, A-Z)
  • Digits 0 to 9
  • Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
  • Character . (dot, period, full stop) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively.
查看更多
与风俱净
6楼-- · 2018-12-31 05:43

The accepted answer refers to a Wikipedia article when discussing the valid local-part of an email address, but Wikipedia is not an authority on this.

IETF RFC 3696 is an authority on this matter, and should be consulted at section 3. Restrictions on email addresses on page 5:

Contemporary email addresses consist of a "local part" separated from a "domain part" (a fully-qualified domain name) by an at-sign ("@"). The syntax of the domain part corresponds to that in the previous section. The concerns identified in that section about filtering and lists of names apply to the domain names used in an email context as well. The domain name can also be replaced by an IP address in square brackets, but that form is strongly discouraged except for testing and troubleshooting purposes.

The local part may appear using the quoting conventions described below. The quoted forms are rarely used in practice, but are required for some legitimate purposes. Hence, they should not be rejected in filtering routines but, should instead be passed to the email system for evaluation by the destination host.

The exact rule is that any ASCII character, including control characters, may appear quoted, or in a quoted string. When quoting is needed, the backslash character is used to quote the following character. For example

  Abc\@def@example.com

is a valid form of an email address. Blank spaces may also appear, as in

  Fred\ Bloggs@example.com

The backslash character may also be used to quote itself, e.g.,

  Joe.\\Blow@example.com

In addition to quoting using the backslash character, conventional double-quote characters may be used to surround strings. For example

  "Abc@def"@example.com

  "Fred Bloggs"@example.com

are alternate forms of the first two examples above. These quoted forms are rarely recommended, and are uncommon in practice, but, as discussed above, must be supported by applications that are processing email addresses. In particular, the quoted forms often appear in the context of addresses associated with transitions from other systems and contexts; those transitional requirements do still arise and, since a system that accepts a user-provided email address cannot "know" whether that address is associated with a legacy system, the address forms must be accepted and passed into the email environment.

Without quotes, local-parts may consist of any combination of
alphabetic characters, digits, or any of the special characters

  ! # $ % & ' * + - / = ?  ^ _ ` . { | } ~

period (".") may also appear, but may not be used to start or end the local part, nor may two or more consecutive periods appear. Stated differently, any ASCII graphic (printing) character other than the at-sign ("@"), backslash, double quote, comma, or square brackets may appear without quoting. If any of that list of excluded characters are to appear, they must be quoted. Forms such as

  user+mailbox@example.com

  customer/department=shipping@example.com

  $A12345@example.com

  !def!xyz%abc@example.com

  _somename@example.com

are valid and are seen fairly regularly, but any of the characters listed above are permitted.

As others have done, I submit a regex that works for both PHP and JavaScript to validate email addresses:

/^[a-z0-9!'#$%&*+\/=?^_`{|}~-]+(?:\.[a-z0-9!'#$%&*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-zA-Z]{2,}$/i
查看更多
登录 后发表回答