I would like to verify that user input complies with the format of an email address (in a Java application). The following page gives a verbose regex that should properly validate an email address: http://emailregex.com/ (RFC 5322 Official Standard).

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

If I check this regex with the online tool http://redos-checker.surge.sh/, it reports that the regex is vulnerable to ReDoS.

Is this true? Could someone craft an input that would halt the evaluation of this regex pattern, or at least make it take more than 3 seconds?

Is there a tool available that, based on the regex itself, could craft an input that could potentially break evaluation?

Glorfindel
  • I believe this is better suited on StackOverflow... – ThoriumBR Feb 24 '21 at 18:38
  • In the end I used a different solution to verify the format of the email input and catch obvious user mistakes: partly the programming language's own implementation and partly a regex. It doesn't verify that the email is correct or that it is owned by the user. But I would still like to know exactly which input can cause a ReDoS attack against the regex pattern above. How can one prove that this pattern is vulnerable? – RenatoIvancic Mar 06 '21 at 15:50

1 Answer

An email is valid if it can be mailed

You should focus on sending an email to the user rather than on a regex.

There are working email addresses that are technically invalid, and even a syntactically valid email address still requires emailing the user to validate it.

When this was brought into the HTML standard, Aryeh Gregor checked the addresses of Wikipedia users who had validated their accounts (i.e. the provided email worked) and found that people did use all kinds of weird characters, so the standard basically stated that there would need to be an @.

They have since expanded it a bit. I would recommend using their regex if you must, but a regex step should only serve to filter out basic mistakes. Don't try to do a full validation.

The best validation of an email address is that the user is able to receive mail there. A few checks that there is a domain, or that the domain exists, can help detect errors early, but little more. I would recommend that you also check https://isemail.info/ for ideas. If someone signed up with a gmial.com email address, it is almost certainly a typo, yet it is a valid address and you can't really be sure they didn't provide it correctly.
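As a minimal sketch of what "filter out basic mistakes" could look like in Java, the check below only requires an @ with something on both sides. The class and method names (`EmailSanityCheck`, `looksLikeEmail`) are hypothetical, and the rule itself is an assumption for illustration, not the HTML standard's regex. Because it uses `lastIndexOf` rather than a regex, there is no backtracking to worry about.

```java
public class EmailSanityCheck {

    /**
     * Hypothetical helper (an assumption for illustration, not the HTML
     * standard's rule): accepts any input that has at least one character
     * before the last '@' and at least one character after it.
     */
    static boolean looksLikeEmail(String input) {
        if (input == null) {
            return false;
        }
        String s = input.trim();
        int at = s.lastIndexOf('@');
        // at <= 0 means no '@' at all, or an empty local part
        // at == s.length() - 1 means an empty domain
        return at > 0 && at < s.length() - 1;
    }

    public static void main(String[] args) {
        String[] samples = { "user@example.com", "no-at-sign", "@example.com", "user@" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + looksLikeEmail(sample));
        }
    }
}
```

A check like this only catches obvious slips; whether the address works is still decided by sending a confirmation email.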

If someone keen on Morse code wanted to use the (technically invalid) ..._-_.-_-.-._-.-_._-..-_-.-._...._.-_-._--._.@invalid-email.com email address, why should you insist that they use the correct "..._-_.-_-.-._-.-_._-..-_-.-._...._.-_-._--._."@invalid-email.com instead?

Ángel
  • Thank you very much @Angel for the answer. Your explanation of email validation in general was insightful. Here I am more focused on how to break this regex pattern. Could someone craft an input that would halt the evaluation of this regex pattern? – RenatoIvancic Mar 06 '21 at 15:52
  • @RenatoIvancic, strictly speaking such inputs don't halt the evaluation; they make it so slow that what should have been a quick evaluation can take a really long time. It's explained at https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS. I'm not sure what your problem is. – Ángel Mar 07 '21 at 02:08
  • My problem is that I cannot reproduce this behavior. What input would make the regex execution take 5 seconds or more? As I wrote in the original question, I want to see a real example instead of a theoretical explanation. – RenatoIvancic Mar 07 '21 at 13:07
  • It actually seems to be a bug in the ReDoS Checker. It marks a regex such as `^(0[ab]+)*$` as vulnerable, while I don't think it should (the static prefix should stop the problematic behavior). – Ángel Mar 21 '21 at 22:58
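
One way to test a claim like this empirically is to time a failing match for growing inputs. The sketch below does that for `^(0[ab]+)*$`. The comparison pattern `^([ab]+)*$`, the class name `RedosProbe`, and the candidate input (a run of `a` followed by `!`) are assumptions added for illustration, not taken from the thread. Timings depend on the JDK version and the machine, so the program prints measurements rather than promising a particular result.

```java
import java.util.regex.Pattern;

public class RedosProbe {

    // Pattern from the comment above: each repetition must start with a literal '0'.
    static final Pattern PREFIXED = Pattern.compile("^(0[ab]+)*$");

    // Hypothetical comparison pattern (not from the thread): same shape, no static prefix.
    static final Pattern UNPREFIXED = Pattern.compile("^([ab]+)*$");

    /** Builds prefix + n copies of 'a' + a trailing '!' so that the full match must fail. */
    static String candidate(String prefix, int n) {
        StringBuilder sb = new StringBuilder(prefix);
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.append('!').toString();
    }

    /** Returns the elapsed nanoseconds for a single full-match attempt. */
    static long timeNanos(Pattern pattern, String input) {
        long start = System.nanoTime();
        pattern.matcher(input).matches();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Lengths are kept small so the probe finishes quickly even if a
        // pattern does backtrack heavily on the JDK in use.
        for (int n = 10; n <= 20; n += 2) {
            long prefixed = timeNanos(PREFIXED, candidate("0", n));
            long unprefixed = timeNanos(UNPREFIXED, candidate("", n));
            System.out.printf("n=%d prefixed=%d ns unprefixed=%d ns%n", n, prefixed, unprefixed);
        }
    }
}
```

If the time for one pattern roughly doubles with each extra character while the other grows slowly, that is the ReDoS signature described on the OWASP page. The same harness can be pointed at the email regex with candidate inputs of your choosing.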