Is there a Java implementation of the HTML5 input

Posted 2019-04-06 11:53

Question:

I'd like to use the new <input type="email" /> element. I'd like to have Java code that implements the same validation on the server that happens in the browser.

The HTML5 spec defines email addresses in ABNF as:

1*( atext / "." ) "@" ldh-str *( "." ldh-str )

where:

<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

<let-dig-hyp> ::= <let-dig> | "-"

<let-dig> ::= <letter> | <digit>

<letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case

<digit> ::= any one of the ten digits 0 through 9

and:

atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                       "!" / "#" /        ;  characters not including
                       "$" / "%" /        ;  specials.  Used for atoms.
                       "&" / "'" /
                       "*" / "+" /
                       "-" / "/" /
                       "=" / "?" /
                       "^" / "_" /
                       "`" / "{" /
                       "|" / "}" /
                       "~"

These are not the same rules as in RFC 5322. How can I test that an address complies with these rules in Java?

Thanks!

Answer 1:

You can use a regex:

[A-Za-z0-9.!#$%&'*+/=?^_`{|}~-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*
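To use a pattern like this from Java, something along these lines should work. It is only a minimal sketch of the ABNF quoted in the question; the class name `Html5AbnfEmail` and the method `isValid` are made up for illustration. Two details matter in the character classes: the hyphen has to sit last (otherwise `+-/` is read as a range that also lets "," through), and the dot between domain labels has to be escaped.

```java
import java.util.regex.Pattern;

final class Html5AbnfEmail {
    // Local part: 1*( atext / "." ); domain: ldh-str *( "." ldh-str ).
    // The hyphen is placed last in each class so it is a literal, not a range.
    private static final Pattern ABNF = Pattern.compile(
        "[A-Za-z0-9.!#$%&'*+/=?^_`{|}~-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*");

    // Hypothetical helper: true when the whole string matches the ABNF.
    static boolean isValid(String address) {
        // matches() tests the entire input, so no ^ or $ anchors are needed.
        return address != null && ABNF.matcher(address).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("user@example.com")); // true
        System.out.println(isValid("a,b@example.com"));  // false: "," is not atext
    }
}
```

Because this follows the ABNF literally, it is as permissive as the ABNF: leading, trailing and repeated dots in the local part all pass.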



Answer 2:

Actually, the W3C Recommendation you cited offers a regex as the equivalent of the ABNF it uses to define a valid email address:

/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

But this regex matches invalid email addresses, such as ".any..address.@123" (tested with https://regex101.com/).

This regex accepts (all invalid in an email address, according to Wikipedia):

  • "." (dot) at the beginning of local part
  • "." (dot) at the end of local part
  • multiple sequential "." (dot) in the local part
  • only numbers in domain part

and rejects (valid according to Wikipedia):

  • Unicode characters
  • some special characters delimited with quotation marks (")
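The behaviour listed above can be reproduced in Java with a rough port of the W3C regex. This is a sketch only: the class name `W3cEmailRegex` and the method `matches` are hypothetical, and the only changes to the pattern are dropping the JavaScript `/.../` delimiters, writing `\/` as `/`, and doubling the backslashes for a Java string literal.

```java
import java.util.regex.Pattern;

final class W3cEmailRegex {
    // The regex from the W3C text, rewritten as a Java string literal.
    private static final Pattern W3C = Pattern.compile(
        "^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
        + "@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
        + "(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$");

    // Hypothetical helper: true when the whole string matches the W3C regex.
    static boolean matches(String address) {
        return address != null && W3C.matcher(address).matches();
    }

    public static void main(String[] args) {
        // Accepted despite the leading, trailing and doubled dots and the numeric domain.
        System.out.println(matches(".any..address.@123"));        // true
        // Rejected: quoted local parts are outside the W3C definition.
        System.out.println(matches("\"john doe\"@example.com"));  // false
    }
}
```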

Note that the W3C states that the specification it presents is a willful violation of RFC 5322, so it has an "excuse" for leaving out the valid cases, but IMHO that is no reason to accept invalid addresses.

If you don't care about those edge cases, you can use the regex that the W3C suggests. Otherwise, you should adapt the regex to cover the cases you want to handle.
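As one possible way of adapting it, here is a sketch that tightens the local part to dot-separated runs of atext (so no leading, trailing or doubled dots) and additionally rejects a domain made up only of digits and dots. The names `StricterEmail` and `isValid` are hypothetical, and the digits-only rule is my own assumption about what "only numbers in domain part" should mean. Quoted local parts and Unicode are still rejected.

```java
import java.util.regex.Pattern;

final class StricterEmail {
    // One atext character (no "."), as listed in the question.
    private static final String ATEXT = "[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]";
    // One domain label, taken unchanged from the W3C regex.
    private static final String LABEL =
        "[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?";

    // Local part: runs of atext separated by single dots; domain: W3C labels.
    private static final Pattern STRICT = Pattern.compile(
        ATEXT + "+(?:\\." + ATEXT + "+)*@" + LABEL + "(?:\\." + LABEL + ")*");

    // Assumption: a domain consisting only of digits and dots is unwanted.
    private static final Pattern DIGITS_AND_DOTS = Pattern.compile("[0-9.]+");

    static boolean isValid(String address) {
        if (address == null || !STRICT.matcher(address).matches()) {
            return false;
        }
        // "@" cannot occur in the local part here, so this is the only one.
        String domain = address.substring(address.lastIndexOf('@') + 1);
        return !DIGITS_AND_DOTS.matcher(domain).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid(".any..address.@123"));      // false
        System.out.println(isValid("any.address@example.com")); // true
    }
}
```

This is stricter than what browsers apply to `<input type="email">`, so the server could reject an address that passed validation in the browser.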