Problem
Initial note: I'm biased towards web servers, yet much of what is said here will apply to other kinds of services.
The problem is Denial of Service. It can happen in two ways: 1) An attacker runs a brute-force attack in such a way that it ends up saturating the server, and now nobody can access the service. 2) Users (malicious or not) try too many times, leading to their access being locked.
Other considerations
Telling the user whether the error is in the user name or the password allows an attacker to brute-force/dictionary user names. Although hiding this goes against usability, you should err on the side of security if the accounts are meant to be private or anonymous by default※.
Telling the user that the account is locked makes it easier for attackers to cause problems by locking many accounts (which is a form of denial of service, and it will probably lead to many support tickets). Also, accounts that don't exist can't be locked, which lets attackers discover accounts by this method. You may consider simulating a lock on nonexistent accounts.
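As a minimal sketch of the two points above, the login check below returns the same message for a wrong password, a locked account, and an unknown user name, and it still does the hashing work for unknown names. The `USERS` store, the `_DUMMY` record, the message text, and the PBKDF2 parameters are all assumptions made for illustration. Note that a user who is really locked out also gets no hint from this response, so the notification would have to arrive through a third channel.

```python
import hashlib
import hmac
from typing import Tuple

GENERIC_ERROR = "Invalid user name or password."  # assumed wording


def _hash(password: str, salt: bytes) -> bytes:
    # PBKDF2 from the standard library; the iteration count is an assumption.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Hypothetical in-memory store: user name -> (salt, password hash, locked flag).
USERS = {
    "alice": (b"salt-a", _hash("correct horse", b"salt-a"), False),
    "bob": (b"salt-b", _hash("hunter2", b"salt-b"), True),  # locked account
}

# Dummy record so that unknown user names cost the same hashing work.
_DUMMY = (b"dummy-salt", _hash("dummy", b"dummy-salt"), True)


def login(username: str, password: str) -> Tuple[bool, str]:
    known = username in USERS
    salt, stored, locked = USERS.get(username, _DUMMY)
    matches = hmac.compare_digest(_hash(password, salt), stored)
    if known and matches and not locked:
        return True, "Welcome."
    # Same answer for wrong password, locked account, and unknown user.
    return False, GENERIC_ERROR
```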
Discovering accounts is not only half the brute-force/dictionary attack, but it can also be useful in future social engineering.
A variant of brute-force/dictionary attack is to try the same password (usually a statistically common one) against a large number of user names.
※: Will search engines be able to index user names? If they will, opt for usability (attackers have a list of valid user names in the search engine cache anyway). Avoid allowing search engines to access such information on sites where knowing who has an account can be considered sensitive information.
Possible solutions
There are a few common things to try to solve the problem:
- Add a CAPTCHA
- Add a retry time
- Lock the account
- Lock the origin
- Two-factor authentication
It should be noted that only locking the IP at the firewall level or the web server configuration level will have a real impact on server load. Yet, if you are only locking the origin when paired with a given account, the logic will be in server-side code. The rest of the solutions also require server-side code.
Because these solutions rely on server-side code, they will not really protect the server from a flood attack. This means that the main application of these methods is as a deterrent.
Vocabulary:
For the context of this post, these words have the meaning mentioned here:
"Lock": "prevent access until further authentication is provided", to provide further authentication means to follow similar - if not the same - steps as those provided to users who forgot their password.
"Origin": the IP, user agent, or other techniques the server may use to identify the source of a connection. If used, it should be mentioned in the privacy policy that the server will log such information.
"Third channel": Email, SMS, dedicated app, or other medium of private communication outside of the control of the server.
It should also be noted that under this definition the retry time is not a lock because it doesn't require additional authentication from the user but waiting instead.
And, because it can't be said enough times, hash and salt your passwords.
CAPTCHA
It should be noted that not all CAPTCHA solutions are visual. Some are auditory, and even others are textual (for example: "How many colors in the list purple, penguin, blue, white and red?").
Pros
CAPTCHA is easy to implement using third-party solutions. Using a third-party solution also externalizes the problem of making the CAPTCHA strong enough.
Cons
Using a CAPTCHA may become an inconvenience for legitimate users who may be having problems typing the password. Current reCAPTCHA mitigates this problem by using behavior analytics to identify human users.
A bot may solve the CAPTCHA with clever AI, or simply by handing it to the attacker to solve.
Retry time
Pros
Retry time has an advantage in that it buys time. So, it can be combined with a notification on a third channel to alert the owner of the account.
What action can the user take? You can suggest using a stronger password, but that won't really solve the problem.
As an alternative, consider giving the user the option to deny access from the attacker's machine (that is, to lock the combination of origin and account)※. See "Lock the origin".
※: It should require authentication, and only affect the current account. Care should be taken to avoid any defect that may let one account lock another.
Cons
Using a retry time reduces the usability of the service, as it becomes an inconvenience for legitimate users who may be having problems typing the password. This is worse than CAPTCHA as it is cognitive downtime.
Brute-force/dictionary attacks are still viable if the attacker makes one attempt every hour or so. Alternatives to deal with this problem include security policies to change the password frequently (which the user may render ineffective by choosing similar passwords) and IDS or other analytics to detect attackers (which could be circumvented by distributing the attack across multiple sources - hopefully that is expensive enough to be a deterrent itself).
Lock the account
Pros
It is resilient against spreading an attack over time or multiple origins.
Cons
Locking the account may lead to a legitimate user being locked out of the account because of the number of failed attempts.
Also, failed attempts by an attacker in a third location will lock out the legitimate user. Combining origin lock with account lock would allow more granular control. In this case, the account would be locked only for the origin from where access is being attempted.
Attacks may still affect the system by causing locked out legitimate users to contact support or to find an alternative service.
Lock the origin
Pros
Locking an origin independently of the account has the advantage of stopping attackers instead of punishing accounts.
Cons
It would require the server to track the origin of requests and distinguish failed from successful attempts.
The origin of an attack may be shared between many users (for example, in Internet cafés), so locking an origin may lock out legitimate users.
Combining origin lock with account lock would allow more granular control. In this case, at first the origin would be locked only for the account it is trying to access, yet an origin that is locked for many accounts can be locked globally.
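The escalation described here could be sketched roughly as follows. The `LockTable` class, the in-memory sets, and the `GLOBAL_LOCK_THRESHOLD` value are assumptions for illustration. A real system would persist this state and would also need a way to lift locks.

```python
from collections import defaultdict

# Assumed value: number of distinct accounts an origin may be locked for
# before the origin is locked globally.
GLOBAL_LOCK_THRESHOLD = 3


class LockTable:
    """Hypothetical in-memory record of (origin, account) locks."""

    def __init__(self) -> None:
        self._locked_pairs = set()                   # (origin, account)
        self._accounts_by_origin = defaultdict(set)  # origin -> accounts
        self._global_origins = set()                 # globally locked origins

    def lock(self, origin: str, account: str) -> None:
        # First, lock the origin only for the account it is trying to access.
        self._locked_pairs.add((origin, account))
        self._accounts_by_origin[origin].add(account)
        # An origin locked for many accounts is then locked globally.
        if len(self._accounts_by_origin[origin]) >= GLOBAL_LOCK_THRESHOLD:
            self._global_origins.add(origin)

    def is_locked(self, origin: str, account: str) -> bool:
        return (origin in self._global_origins
                or (origin, account) in self._locked_pairs)
```

With this arrangement, the legitimate owner connecting from a different origin is not affected by a lock that an attacker triggered.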
Two-factor authentication
All variants of two-factor authentication are strong brute-force/dictionary deterrents. There are two main variants:
Send a code via a third channel to allow authentication. It shouldn't require additional measures to prevent brute-force of that code, because it is meant to be single use and short lived.
Require a code from dedicated hardware/software key for authentication. The key must provide a single use code that authorizes the authentication.
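To illustrate what "single use and short lived" means for the first variant, here is a minimal sketch. The `OneTimeCodes` class, the six-digit format, and the five-minute lifetime are assumptions. Delivery over the third channel is left out. In this sketch a code is discarded on the first redemption attempt, right or wrong, which is one way to make guessing pointless. The trade-off is that anyone who can submit a guess can also invalidate a pending code.

```python
import hmac
import secrets

CODE_TTL_SECONDS = 300  # assumed lifetime of a code


class OneTimeCodes:
    """Hypothetical store of pending codes: account -> (code, expiry time)."""

    def __init__(self) -> None:
        self._pending = {}

    def issue(self, account: str, now: float) -> str:
        # secrets gives cryptographically strong randomness.
        code = f"{secrets.randbelow(10**6):06d}"
        self._pending[account] = (code, now + CODE_TTL_SECONDS)
        return code  # to be sent over the third channel

    def redeem(self, account: str, code: str, now: float) -> bool:
        # Single use: the pending code is removed on any attempt.
        entry = self._pending.pop(account, None)
        if entry is None:
            return False
        expected, expires_at = entry
        same = hmac.compare_digest(expected.encode("utf-8"), code.encode("utf-8"))
        return same and now <= expires_at
```

Passing `now` explicitly (for example from `time.time()`) keeps the logic easy to test.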
Pros
Two-factor authentication is the only solution that can actually make brute-force/dictionary attacks ineffective. That is accomplished by requiring a single-use code, which, being single use, can't be guessed by attempting multiple times.
Cons
Two-factor authentication is often more expensive to implement.
What to use?
It makes sense to add additional protection to deter brute-force/dictionary attacks. The need for these measures is increased in systems where the password space is too small※, or if the minimal strength of the passwords is too low (for example, the four-digit PINs common in banking).
※: It is good to put an upper cap on the size of the password. This way the server will not be choked while computing an expensive hash of the password. And you should use an expensive hash because it will deter brute-force attacks against stolen hashes.
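A minimal sketch of that footnote, using only the Python standard library: the password length is checked before any expensive work, and the hash is salted and deliberately slow. The cap of 1024 bytes and the iteration count are assumptions to tune for your own hardware. PBKDF2 is used here only because it ships with Python. Other deliberately slow password hashes would serve the same purpose.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

MAX_PASSWORD_BYTES = 1024  # assumed upper cap on password size
ITERATIONS = 200_000       # assumed work factor; tune for your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    encoded = password.encode("utf-8")
    # Reject oversized input before doing any expensive hashing.
    if len(encoded) > MAX_PASSWORD_BYTES:
        raise ValueError("password too long")
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", encoded, salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    try:
        _, digest = hash_password(password, salt)
    except ValueError:
        return False  # too long to have ever been accepted
    # Constant-time comparison of the two digests.
    return hmac.compare_digest(digest, expected)
```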
CAPTCHA should be the first option, as it is very easy and cheap to implement (using established solutions such as reCAPTCHA).
Between retry time and locks, consider that the minimal viable implementation is similar: to lock an account you add a field to the account object/record marking it as locked, and then check that on authentication. To add a retry time, you do the same thing, except that what you store is the time at which authentication is allowed again.
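The similarity between the two can be sketched with a single record type. The `Account` class and its field names are hypothetical. The point is only that a lock is a flag and a retry time is a timestamp, both checked before the password is even looked at.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    name: str
    locked: bool = False                 # lock: needs further authentication
    retry_after: Optional[float] = None  # retry time: just wait until then


def may_attempt(account: Account, now: float) -> bool:
    """Return True if an authentication attempt should be processed at all."""
    if account.locked:
        return False
    if account.retry_after is not None and now < account.retry_after:
        return False
    return True
```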
It makes sense to mitigate the inconvenience by adding these measures once a few attempts have failed. If so, apply CAPTCHA first as it doesn’t create cognitive downtime for the user.
Between the lock options, we have seen that combining origin and account is a better alternative (but also more complex) than either one alone. The implementation will require logs and analytics.
Finally, two-factor authentication has benefits that surpass the above solutions. Yet it is the most expensive to implement as it requires a connection to a third-party service (email server, SMS service, dedicated app, dedicated hardware, etc.).
I would suggest implementing logging and analytics and, based on them, deciding whether to implement locking or two-factor authentication.
How many attempts?
There will be:
- n1 attempts until CAPTCHA appears.
- n2 attempts until retry time appears.
- n3 attempts until lock is applied.
Note: if you use two-factor authentication, you use it from the first attempt.
The values for these variables can be tweaked in the future based on your analytics. Yet, for reasonable defaults, consider:
n1 should be an estimate of the number of attempts a person may make if they have problems typing the password. 2 attempts would be the minimum n1 because that accounts for the basic Caps Lock error. Note: Gmail allows me 20 attempts before using CAPTCHA.
n2 should be an estimate of the number of attempts a person would make before going to the access recovery mechanism. There is no hard minimum; in fact, it can be applied as soon as you apply CAPTCHA, with increasing time intervals to wait. In my opinion, n2 = 3 * n1 is a good starting point.
n3 should be an estimate of the number of attempts at which it is more probable that an attack is being made. Consider that CAPTCHA and retry time should deter any manual attack, so n3 need not be much higher. In my opinion, n3 = 2 * n2 is a good starting point.
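Put together, the three thresholds could be wired up as in the sketch below. It picks n1 = 2 (the minimum mentioned above) purely as an assumed starting value, derives n2 and n3 from the suggested ratios, and reports only the strongest measure that applies. The function name and the returned labels are hypothetical.

```python
N1 = 2       # assumed: failed attempts until CAPTCHA appears
N2 = 3 * N1  # failed attempts until retry time appears
N3 = 2 * N2  # failed attempts until the lock is applied


def measure_for(failed_attempts: int) -> str:
    """Return the strongest measure that applies after this many failures."""
    if failed_attempts >= N3:
        return "lock"
    if failed_attempts >= N2:
        return "retry_time"  # the CAPTCHA would normally stay in place too
    if failed_attempts >= N1:
        return "captcha"
    return "none"
```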
Note about retry time: The interval the user must wait can be increased on each attempt. This allows you to use a very small initial interval (for example 1 second) and build up from there until a hard cap (for example 1 day).
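One way to build up the interval is to double it on every attempt. Doubling is an assumption here, since any increasing schedule fits the description. The 1 second start and the 1 day cap are the example values from the note.

```python
INITIAL_DELAY_SECONDS = 1         # example initial interval from the note
MAX_DELAY_SECONDS = 24 * 60 * 60  # example hard cap of 1 day


def retry_delay(attempts_with_retry_time: int) -> int:
    """Seconds to wait after the given number of attempts under retry time."""
    if attempts_with_retry_time <= 0:
        return 0
    # Cap the exponent as well, so the intermediate value stays small.
    exponent = min(attempts_with_retry_time - 1, 32)
    return min(INITIAL_DELAY_SECONDS * 2 ** exponent, MAX_DELAY_SECONDS)
```

With these values the delay reaches the one-day cap on the 18th attempt.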
Note about counting attempts: You should avoid an overflow in the attempts count. If you are storing the number of attempts in the account object/record, handle the overflow. If you are doing a query on logs to get the number of failed attempts since the last successful one, consider adding a time interval (that will also cap the query time).
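Both options can be sketched as follows. Python integers do not overflow, so the cap in `bump` stands in for whatever limit the stored field has (a database column, for instance). The log format, the `MAX_COUNT` value, and the one-day window are assumptions.

```python
from typing import List, Tuple

MAX_COUNT = 1000               # assumed cap, well below the field's real limit
WINDOW_SECONDS = 24 * 60 * 60  # assumed time interval for the log query

# Hypothetical log entry: (timestamp, account, success), oldest first.
LogEntry = Tuple[float, str, bool]


def bump(count: int) -> int:
    """Increment a stored attempt counter without ever passing the cap."""
    return min(count + 1, MAX_COUNT)


def recent_failures(log: List[LogEntry], account: str, now: float) -> int:
    """Count failures since the last success, looking back one window at most."""
    count = 0
    for timestamp, name, success in reversed(log):
        if timestamp < now - WINDOW_SECONDS:
            break  # older than the window: stop scanning
        if name != account:
            continue
        if success:
            break  # reached the last successful attempt
        count += 1
    return count
```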