Are those email addresses valid?
Yes, they are. See for example here or with a bit more explanation here.
For a readable explanation of what email addresses may look like, see the informational RFC3696. The more technical RFCs are linked there as well.
Attacks possible in the local part of an Email Address
Without quotes, local-parts may consist of any combination of
alphabetic characters, digits, or any of the special characters
! # $ % & ' * + - / = ? ^ _ ` . { | } ~
period (".") may also appear, but may not be used to start or end
the local part, nor may two or more consecutive periods appear.
Stated differently, any ASCII graphic (printing) character other
than the at-sign ("@"), backslash, double quote, comma, or square
brackets may appear without quoting. If any of that list of
excluded characters are to appear, they must be quoted.
So the rule is more or less: most characters can be part of the local part, except for @\",[] which must appear inside a quoted string, that is, between " characters (except of course " itself, which has to be escaped when it appears in a quoted string).
There are also rules on where and when to quote and how to handle comments, but that's less relevant to your question.
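As a rough illustration of the unquoted rule quoted above, here is a minimal Python sketch. The names `ATEXT`, `DOT_ATOM` and `is_unquoted_local_part` are made up for this example, and it deliberately ignores quoted strings, comments and length limits.

```python
import re

# Characters allowed without quoting: letters, digits and the specials
# listed in the RFC3696 excerpt. The period is handled separately below.
ATEXT = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"

# One or more runs of allowed characters separated by single periods,
# so a period can never start, end, or be doubled in the local part.
DOT_ATOM = re.compile(rf"[{ATEXT}]+(?:\.[{ATEXT}]+)*")


def is_unquoted_local_part(local: str) -> bool:
    """Return True if `local` is acceptable as an unquoted local part."""
    return DOT_ATOM.fullmatch(local) is not None
```

With this sketch, `john.doe+tag` is accepted and `john..doe` is rejected, but so is accepted a string like `'/**/OR/**/1=1/**/--/**/`, because every one of its characters is allowed without quoting.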
The point here is that many attacks can be part of the local part of an email address, for example:
'/**/OR/**/1=1/**/--/**/@a.a
"<script>alert(1)</script>"@example.com
" onmouseover=alert(1) foo="@example.com
"../../../../../test%00"@example.com
- ...
Attacks possible in the domain part of an Email Address
The exact structure of the domain part can be seen in RFC2822 or RFC5322:
addr-spec      = local-part "@" domain
local-part     = dot-atom / quoted-string / obs-local-part
domain         = dot-atom / domain-literal / obs-domain
domain-literal = [CFWS] "[" *([FWS] dcontent) [FWS] "]" [CFWS]
dcontent       = dtext / quoted-pair
dtext          = NO-WS-CTL /     ; Non white space controls
                 %d33-90 /       ; The rest of the US-ASCII
                 %d94-126        ; characters not including "[",
                                 ; "]", or "\"
The grammar above is the RFC2822 version. RFC5322 defines dtext slightly differently:
dtext          = %d33-90 /       ; Printable US-ASCII
                 %d94-126 /      ; characters not including
                 obs-dtext       ; "[", "]", or "\"
You can see that again, most characters are allowed (even non-printable control characters, via NO-WS-CTL). Possible attacks would be:
a@a.a&a=////etc/passwd
foo@bar(<script>alert(1)</script>).com
foo@'/**/OR/**/1=1/**/--/**/
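To show how permissive the dtext rule is, the following Python sketch checks a bracketed domain literal against the two printable ranges from the grammar (%d33-90 and %d94-126). The function names are hypothetical, and the sketch is a simplification: it ignores CFWS, FWS, quoted pairs and the obsolete forms.

```python
def is_dtext(ch: str) -> bool:
    """True if ch falls in the printable dtext ranges %d33-90 or %d94-126."""
    code = ord(ch)
    return 33 <= code <= 90 or 94 <= code <= 126


def is_simple_domain_literal(domain: str) -> bool:
    """Check a "[...]" domain literal, ignoring whitespace and comment rules."""
    if len(domain) < 2 or domain[0] != "[" or domain[-1] != "]":
        return False
    # Everything between the brackets must be dtext, which excludes only
    # "[", "]" and "\" among the printable ASCII characters.
    return all(is_dtext(ch) for ch in domain[1:-1])
```

Under these assumptions `[192.0.2.1]` passes, but so does `[<script>alert(1)</script>]`, since angle brackets, parentheses and slashes are all ordinary dtext.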
Conclusion
You can't make email addresses safe by validation alone: a syntactically valid address can still carry an attack payload.
Instead, you need to make sure to have proper defenses in place (HTML encoding for XSS, prepared statements for SQL injection, etc).
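A minimal sketch of those two defenses, using only the Python standard library (`html.escape` for output encoding, `sqlite3` placeholders as a stand-in for prepared statements). The table name and the in-memory database are assumptions made for the example; your stack will differ.

```python
import html
import sqlite3

xss_email = '"<script>alert(1)</script>"@example.com'
sqli_email = "'/**/OR/**/1=1/**/--/**/@a.a"

# XSS defense: encode the address at the point where it is written into HTML.
safe_html = "<p>Contact: " + html.escape(xss_email) + "</p>"

# SQL injection defense: pass the address as a bound parameter instead of
# concatenating it into the SQL text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", (sqli_email,))
rows = conn.execute(
    "SELECT email FROM users WHERE email = ?", (sqli_email,)
).fetchall()
```

In both cases the hostile address is stored and displayed as plain data; no attempt is made to decide whether it "looks dangerous".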
As defense in depth, you could forbid quoted strings and comments to gain some amount of protection, as these two things allow the most unusual characters and strings. But some attacks are still possible, and you will exclude a small number of users.
If you do need additional input filtering that is stricter than the email format itself, because you do not trust the rest of your application, you should carefully consider what you allow and what you do not. For example, + is used by Gmail to allow filtering incoming emails, so not allowing it may lead users to not sign up. Other characters may be used by other providers for similar functionality. A first approach might be to only allow alphanumeric characters plus ! # % * + - = ? ^ _ . | ~ which would disallow < > ' " ` / $ { } & (characters used in common attacks). Depending on your application, you may want to disallow further characters.
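The allow-list suggested above could be sketched in Python like this. `ALLOWED_LOCAL` and `passes_allow_list` are names invented for the example, and applying the filter to the local part only is an assumption; it is meant as an extra layer, not a replacement for output encoding and prepared statements.

```python
import re

# Alphanumerics plus ! # % * + - = ? ^ _ . | ~ and nothing else.
ALLOWED_LOCAL = re.compile(r"[A-Za-z0-9!#%*+\-=?^_.|~]+")


def passes_allow_list(local: str) -> bool:
    """Return True if the local part uses only allow-listed characters."""
    return ALLOWED_LOCAL.fullmatch(local) is not None
```

This keeps addresses such as `john.doe+news` working while rejecting anything containing `<`, `>`, quotes, backticks, `/`, `$`, braces or `&`.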
And as you mentioned RFC822: it is a bit outdated (it's from 1982), but even it allows quoted strings and comments, so restricting input to RFC822-compliant addresses would be neither practical nor effective as a defense.
Also, are you validating the email addresses client-side? The JS code gives that impression. An attacker can simply bypass client-side checks, so any validation you rely on has to happen on the server as well.