RFC 6238 recommends that the server implement some form of resynchronization algorithm to account for time drift in the device used to generate the OTP. However, the RFC provides very little information on how to actually implement such resynchronization, or on the consequences it might have for the overall security of the algorithm. Time drift is typically a problem with devices that cannot sync with NTP, such as most programmable TOTP hardware tokens. I would like to share our proposal here, in the hope of arriving at an algorithm that is secure, adequately accounts for time drift in devices and requires minimal effort from the end user. I've also added some potential issues I see with the proposal.
Synchronization algorithm
We work with a 30-second window, which is typical for TOTP, but the principle should be the same for other window sizes. In this proposal, n is the current validation window, taking into account the drift correction recorded for that device. n-1 is the previous window, n+1 is the next, etc. The delay window (the time between generation of the OTP and its verification) is set at 1 step, as recommended by RFC 6238. This is why the search reaches one step further backward than forward.
- The OTPs for windows n-2, n-1, n and n+1 are accepted as valid.
- At n-2 and n+1 the drift for the device is automatically adjusted by -1 and +1 time step respectively.
- For the other windows between n-6 and n+5, the drift is detected but not automatically adjusted. A manual drift correction is started (described below). The drift detection window of 11 validation windows should be sufficient to allow the user to resynchronize the TOTP when not logging in for a long time.
- The OTP for a drift correction of 0 (the current server time) is also checked. If this OTP matches, the device's clock has apparently been corrected, and a manual drift correction back to offset 0 is started.
- All other OTPs are rejected.
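To make the rules above concrete, here is a minimal Python sketch of how I picture the check. It assumes 6-digit OTPs with HMAC-SHA-1 and a drift stored per device as a whole number of time steps; `classify`, `matches` and the verdict strings are names made up for this illustration, and replay protection and rate limiting are left out.

```python
import hashlib
import hmac
import struct

STEP = 30  # seconds per time step, as in the proposal


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP value (RFC 4226) using HMAC-SHA-1, the TOTP default."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    pos = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[pos:pos + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def matches(secret: bytes, counter: int, otp: str) -> bool:
    """Constant-time comparison of the submitted OTP with one window."""
    return hmac.compare_digest(hotp(secret, counter).encode(), otp.encode())


def classify(secret: bytes, otp: str, now: float, drift: int):
    """Return (verdict, new_drift); drift is measured in time steps."""
    server_step = int(now // STEP)
    n = server_step + drift  # current window after the recorded correction
    # Accepted windows, mapped to the automatic drift adjustment they cause.
    accepted = {0: 0, -1: 0, -2: -1, 1: 1}
    for k, delta in accepted.items():
        if matches(secret, n + k, otp):
            return "accept", drift + delta
    # Remaining windows from n-6 to n+5: drift detected, not auto-adjusted.
    for k in range(-6, 6):
        if k not in accepted and matches(secret, n + k, otp):
            return "manual_resync", drift
    # Window at drift correction 0, i.e. the uncorrected server time.
    if matches(secret, server_step, otp):
        return "manual_resync_to_zero", drift
    return "reject", drift
```

For example, `classify(secret, otp, time.time(), stored_drift)` would return the verdict together with the drift value to store for the device afterwards.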
The manual drift correction is a process in which the user is asked to enter 2 consecutive OTPs. These OTPs are used to calculate the drift, up to a maximum offset of 1 hour from the system time. Failing to provide 2 consecutive OTPs counts as a failed login attempt.
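The manual correction could look roughly like the sketch below. Again this is only an illustration under assumptions: 6-digit HMAC-SHA-1 OTPs, a hypothetical function name `find_drift`, and a search of 120 steps (1 hour) in each direction around the server time. It returns the drift in time steps only when exactly one offset explains both OTPs.

```python
import hashlib
import hmac
import struct

STEP = 30  # seconds per time step


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP value (RFC 4226) using HMAC-SHA-1, the TOTP default."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    pos = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[pos:pos + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def same(a: str, b: str) -> bool:
    """Constant-time string comparison."""
    return hmac.compare_digest(a.encode(), b.encode())


def find_drift(secret: bytes, otp1: str, otp2: str, now: float,
               max_steps: int = 120):
    """Return the drift (in steps) for which otp1 and otp2 are consecutive
    OTPs, or None if there is no unique match within +/- max_steps."""
    server_step = int(now // STEP)
    candidates = [
        d for d in range(-max_steps, max_steps + 1)
        if same(hotp(secret, server_step + d), otp1)
        and same(hotp(secret, server_step + d + 1), otp2)
    ]
    return candidates[0] if len(candidates) == 1 else None
```

A `None` result would then be handled as a failed login attempt, as described above.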
Possible issues
The RFC recommends against widening the window beyond two time steps (n and n-1). This proposal adds 2 more time steps (n-2 and n+1) for the automatic adjustment of the offset. I see no way to perform such automatic adjustment without widening the window to at least 4 steps. Of course the automatic adjustment could be left out entirely, but manual resynchronization is not very pleasant for the end user. Is the widening of the window small enough to maintain a good balance between security and usability?
An OTP will be checked against a total of 13 windows. This could potentially lead to leaking information about the secret. In my opinion this is not a problem, because the recommended length of the secret in combination with the hashing will make it (virtually) impossible to guess the secret, especially when the attempts are rate limited.
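To put rough numbers on the wider window, here is a back-of-the-envelope sketch. It assumes 6-digit OTPs, treats the OTPs of the checked windows as distinct (so it is an upper bound) and treats guesses as independent; `hit_probability` is a name made up for this example.

```python
from fractions import Fraction

DIGITS = 6  # assumption: 6-digit OTPs
SPACE = 10 ** DIGITS


def hit_probability(windows: int, attempts: int = 1) -> float:
    """Chance that at least one of `attempts` random guesses equals the
    OTP of one of `windows` checked windows."""
    miss = 1 - Fraction(windows, SPACE)
    return float(1 - miss ** attempts)


print(hit_probability(2))   # RFC-style window: n and n-1
print(hit_probability(4))   # accepted windows in this proposal
print(hit_probability(13))  # every window the OTP is compared against
```

Only a hit in one of the 4 accepted windows logs the user in directly; a hit in one of the other 9 windows starts the manual correction, which still requires 2 consecutive OTPs.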
Is using 2 consecutive OTPs enough to reliably determine the offset in a window of 240 time steps (1 hour back and forward)? And is a maximum offset of 1 hour good enough to account for time drift of the devices used?
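As a rough estimate for the first question, assuming 6-digit OTPs that behave like independent uniform random values, the expected number of wrong offsets that happen to match can be computed as below (`expected_false_matches` is a hypothetical helper, not part of any library).

```python
def expected_false_matches(offsets: int, otps: int, digits: int = 6) -> float:
    """Expected number of wrong offsets, out of `offsets` candidates, whose
    `otps` consecutive OTPs all happen to equal the submitted ones."""
    return offsets / 10 ** (digits * otps)


single = expected_false_matches(240, 1)  # one OTP over 240 time steps
double = expected_false_matches(240, 2)  # two consecutive OTPs
print(single, double)
```

Under these assumptions a single OTP gives 240 chances in a million of an ambiguous or wrong offset, while two consecutive OTPs reduce that to 240 chances in 10^12.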