Tricky one! I can't seem to trigger the exponential complexity that this (\.[CHARS]+)* pattern should introduce. I can make it a little slow with a huge string (a few megabytes), but I guess that is to be expected with any regular expression. In Python:
import time, re

pattern = r'^[\w!#$%&\'*+/=?`{|}~^-]+(\.[\w!#$%&\'*+/=?`{|}~^-]+)*@(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,6}$'

# Test 1: the repetition sits inside one long [CHARS]+ run
t = time.time()
print(re.match(pattern, 'a.' + 'xb' * 2500000 + 'test@example.com'))
print(time.time() - t)

# Test 2: the repetition drives the (\.[CHARS]+)* loop
t = time.time()
print(re.match(pattern, 'ax' + '.b' * 2500000 + 'test@example.com'))
print(time.time() - t)
In the first test, we supply a long string where the 'xb' pattern is repeated. This matches just the first part (namely, [CHARS]+) and not the second (namely, (\.[CHARS]+)*). The second test does the opposite: it repeats the '.b' pattern, which matches the second part (where the backtracking should be) many times and not the first.
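To make that decomposition concrete, here is a tiny sketch using miniature versions of the two inputs against just the local-part subpattern (the names CHARS and local_part are mine, not from the original):

```python
import re

# Same character class as in the full pattern above
CHARS = r"[\w!#$%&'*+/=?`{|}~^-]"
local_part = re.compile('^' + CHARS + r'+(\.' + CHARS + r'+)*$')

# 'a.' + 'xb'*3: only one dot, so the starred group iterates
# exactly once, over one long [CHARS]+ run.
print(local_part.match('a.xbxbxb').group(1))   # '.xbxbxb'

# 'ax' + '.b'*3: one iteration of the starred group per '.b';
# group(1) reports only the last iteration.
print(local_part.match('ax.b.b.b').group(1))   # '.b'
```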
As output, you can see the difference in time it takes:
0.06872129440307617
0.3483853340148926
Both strings are equally long, so the second input really does make the pattern take somewhat more time.
Usually, one would take the second kind of input and then make the end not match, e.g. by removing the .com part. The regex engine would then start to backtrack and take exponentially longer with each added repetition. I'm not sure why that does not happen here.
So far as I can see, you might be safe by just limiting the input length to a few kilobytes. But I am probably just too stupid to trigger it; using a pattern that is known to be vulnerable is just asking for someone to come along and figure it out ;-)
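The length cap suggested above could be as simple as a check before the regex ever runs (MAX_EMAIL_LEN and is_valid_email are made-up names; pick a bound that fits your data):

```python
import re

EMAIL_RE = re.compile(r'^[\w!#$%&\'*+/=?`{|}~^-]+(\.[\w!#$%&\'*+/=?`{|}~^-]+)*'
                      r'@(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,6}$')

# Hypothetical cap: real addresses are tiny compared to any ReDoS payload.
MAX_EMAIL_LEN = 320

def is_valid_email(s):
    # Reject oversized input before matching, so even a pathological
    # string cannot cost more than a bounded amount of regex work.
    return len(s) <= MAX_EMAIL_LEN and EMAIL_RE.match(s) is not None
```

That way, even if someone does find an input that triggers the backtracking, it can only be a few hundred characters long.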