Write a function or program to validate an e-mail address against RFC 5321 (some grammar rules found in 5322) with the relaxation that you can ignore comments and folding whitespace (CFWS) and generalised address literals. This gives the grammar
Mailbox = Local-part "@" ( Domain / address-literal )
Local-part = Dot-string / Quoted-string
Dot-string = Atom *("." Atom)
Atom = 1*atext
atext = ALPHA / DIGIT / ; Printable US-ASCII
"!" / "#" / ; characters not including
"$" / "%" / ; specials. Used for atoms.
"&" / "'" /
"*" / "+" /
"-" / "/" /
"=" / "?" /
"^" / "_" /
"`" / "{" /
"|" / "}" /
"~"
Quoted-string = DQUOTE *QcontentSMTP DQUOTE
QcontentSMTP = qtextSMTP / quoted-pairSMTP
qtextSMTP = %d32-33 / %d35-91 / %d93-126
quoted-pairSMTP = %d92 %d32-126
Domain = sub-domain *("." sub-domain)
sub-domain = Let-dig [Ldh-str]
Let-dig = ALPHA / DIGIT
Ldh-str = *( ALPHA / DIGIT / "-" ) Let-dig
address-literal = "[" ( IPv4-address-literal / IPv6-address-literal ) "]"
IPv4-address-literal = Snum 3("." Snum)
IPv6-address-literal = "IPv6:" IPv6-addr
Snum = 1*3DIGIT
; representing a decimal integer value in the range 0 through 255
Note: I've skipped the definition of IPv6-addr because this particular RFC gets it wrong and disallows e.g. ::1. The correct spec is in RFC 2373.
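For readers who want something concrete to compare against, here is a rough, ungolfed Python sketch of how the grammar above might be checked. It is not a reference solution. The names `is_valid_mailbox`, `_is_ipv4_literal` and `_is_ipv6_literal` are made up for illustration. It assumes that handing the IPv6 part to the standard `ipaddress` module is an acceptable stand-in for the RFC 2373 syntax, that the `IPv6:` tag is matched case-insensitively (ABNF string literals are case-insensitive), and that zone identifiers containing `%` should be rejected.

```python
import ipaddress
import re

# Character classes transcribed from the grammar above (ASCII only).
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
DOT_STRING = ATEXT + r"+(?:\." + ATEXT + r"+)*"
# qtextSMTP = %d32-33 / %d35-91 / %d93-126; quoted-pairSMTP = "\" %d32-126
QUOTED_STRING = r'"(?:[\x20\x21\x23-\x5b\x5d-\x7e]|\\[\x20-\x7e])*"'
# sub-domain = Let-dig [Ldh-str]: starts and ends with a letter or digit.
SUB_DOMAIN = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
DOMAIN = SUB_DOMAIN + r"(?:\." + SUB_DOMAIN + r")*"

# Mailbox = Local-part "@" ( Domain / address-literal ).
# The contents of an address literal are captured and checked separately.
MAILBOX = re.compile(
    r"(?:" + DOT_STRING + r"|" + QUOTED_STRING + r")@"
    r"(?:" + DOMAIN + r"|\[(?P<literal>[^\[\]]*)\])"
)


def _is_ipv4_literal(text):
    """Snum 3("." Snum), each Snum being 1-3 digits with value 0..255."""
    parts = text.split(".")
    return len(parts) == 4 and all(
        re.fullmatch(r"[0-9]{1,3}", p) and int(p) <= 255 for p in parts
    )


def _is_ipv6_literal(text):
    """'IPv6:' followed by an address (assumption: ipaddress decides)."""
    if text[:5].lower() != "ipv6:" or "%" in text:
        return False
    try:
        ipaddress.IPv6Address(text[5:])
    except ValueError:
        return False
    return True


def is_valid_mailbox(address):
    """Return True if `address` matches the relaxed grammar above."""
    match = MAILBOX.fullmatch(address)
    if match is None:
        return False
    literal = match.group("literal")
    if literal is None:
        return True  # an ordinary Domain matched
    return _is_ipv4_literal(literal) or _is_ipv6_literal(literal)
```

Using `fullmatch` rather than splitting on `@` matters here, because a Quoted-string local part may itself contain `@`.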
Restrictions
You may not use any existing e-mail validation library calls. However, you may use existing network libraries to check IP addresses.
If you write a function/method/operator/equivalent it should take a string and return a boolean or truthy/falsy value, as appropriate for your language. If you write a program it should take a single line from stdin and indicate valid or invalid via the exit code.
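As a small illustration of the program form, the sketch below reads a single line and turns a validator's result into an exit code. The helper name `run` and its `validate` parameter are hypothetical, and the choice of exit code 0 for valid and 1 for invalid is an assumption: the rule above only says that the exit code must distinguish the two.

```python
import sys


def run(validate, stream=sys.stdin):
    """Read one line from `stream`; return 0 if it validates, else 1.

    `validate` is any callable taking a string and returning a
    truthy/falsy value (a hypothetical validator supplied by the caller).
    """
    # Strip only the line terminator, so spaces inside the address survive.
    line = stream.readline().rstrip("\r\n")
    return 0 if validate(line) else 1


# Typical use at the bottom of a script (my_validator is a placeholder):
#     sys.exit(run(my_validator))
```

Stripping only `\r\n` rather than all whitespace means an address with a trailing space is still passed through to the validator and can be rejected there.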
Test cases
The following test cases are listed in blocks for compactness. The first block contains cases which should pass:
email@domain.com
e@domain.com
firstname.lastname@domain.com
email@subdomain.domain.com
firstname+lastname@domain.com
email@123.123.123.123
email@[123.123.123.123]
"email"@domain.com
1234567890@domain.com
email@domain-one.com
_______@domain.com
email@domain.name
email@domain.co.jp
firstname-lastname@domain.com
""@domain.com
"e"@domain.com
"\@"@domain.com
email@domain
"Abc\@def"@example.com
"Fred Bloggs"@example.com
"Joe\\Blow"@example.com
"Abc@def"@example.com
customer/department=shipping@example.com
$A12345@example.com
!def!xyz%abc@example.com
_somename@example.com
_somename@[IPv6:::1]
fred+bloggs@abc.museum
email@d.com
?????@domain.com
The following test cases should not pass:
plainaddress
#@%^%#$@#$@#.com
@domain.com
Joe Smith <email@domain.com>
email.domain.com
email@domain@domain.com
.email@domain.com
email.@domain.com
email.email.@domain.com
email..email@domain.com
email@domain.com (Joe Smith)
email@-domain.com
email@domain..com
email@[IPv6:127.0.0.1]
email@[127.0.0]
email@[.127.0.0.1]
email@[127.0.0.1.]
email@IPv6:::1]
_somename@domain.com]
email@[256.123.123.123]
Since IPv6-addr has been left undefined, and there are test cases that have ipv6 addresses, is there a correct way to validate them? – ardnew – 2013-02-23T04:28:54.883
Why should email@d.com and ?????@domain.com fail? – grc – 2013-02-23T08:04:58.807
@ardnew, I've added a link to the relevant RFC. I don't want to inline it because the question is already quite long. – Peter Taylor – 2013-02-23T09:16:01.967
@grc, good question. I've checked them, because no-one raised this during the several months that the question was in the sandbox, but I can't see why they should fail so I've moved them to the "Pass" side. – Peter Taylor – 2013-02-23T09:19:17.860
Are length limits required as well? 254 for entire email address/64 for local-part/63 for each domain label? – MichaelRushton – 2013-03-02T22:30:45.170