Assume a 2FA system with user-supplied passwords and 6-digit TOTP tokens. A TOTP token cannot be tested without first authenticating with the password, so whoever submits a token is presumed to know the password. Each token is generated for a 30-second window and is also accepted during the preceding and the following 30-second window, so at any moment three tokens are valid: the previous one, the current one and the next one.
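To make the acceptance rule concrete, here is a minimal sketch of how I picture the check, using only the Python standard library. The function names `hotp` and `verify_totp` and the parameters `step` and `drift` are my own illustrative choices, and the defaults (HMAC-SHA-1, 6 digits, 30-second step, one step of drift either way) are assumptions matching the setup described above, not any particular product's implementation.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP value for one counter (RFC 4226 dynamic truncation, HMAC-SHA-1)."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low nibble of the last byte selects 4 bytes
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, token: str, now: float,
                step: int = 30, drift: int = 1) -> bool:
    """Accept the token of the current time step and of `drift` steps either side."""
    current = int(now // step)
    candidate = token.encode()
    return any(
        hmac.compare_digest(hotp(secret, current + d).encode(), candidate)
        for d in range(-drift, drift + 1)
    )


# RFC 4226 test secret; at t = 59 s the current time step is 1.
secret = b"12345678901234567890"
print(verify_totp(secret, "287082", now=59))   # token of step 1 -> True
print(verify_totp(secret, "755224", now=59))   # token of step 0 -> True
print(verify_totp(secret, "969429", now=59))   # token of step 3 -> False
```

With these assumptions, a single blind guess matches one of three valid tokens out of 10^6 possibilities, which is why the number of allowed wrong attempts matters.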
It's quite common for a legitimate user to be rejected simply because the token took a little too long to enter, so tokens can't be made much longer and still be practical.
Should a certain number of consecutive wrong tokens force the password to be invalidated, on the assumption that the password may have leaked? (It might not have leaked; there might just be some problem with the second factor that will eventually go away.) If widely used 2FA systems do this, what are the typical thresholds?
I am aware that RFC 4226 suggests a throttling parameter of 5, but I am not sure what exactly is supposed to happen once the threshold is reached. Account lockout? Password invalidation? Wait a little and then allow retrying?
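To pin down the decision point I am asking about, here is a rough sketch of a per-account counter of consecutive wrong tokens. Everything in it is hypothetical: the names `Action` and `ConsecutiveFailures` are mine, the threshold of 5 is just my reading of the RFC, and which action belongs in `on_threshold` is exactly the open question.

```python
from enum import Enum


class Action(Enum):
    ALLOW_RETRY = "allow retry"
    DELAY_THEN_RETRY = "wait a little, then allow retry"
    LOCK_ACCOUNT = "lock account"
    INVALIDATE_PASSWORD = "invalidate password"


class ConsecutiveFailures:
    """Counts wrong tokens in a row for one account (hypothetical policy holder)."""

    def __init__(self, threshold: int = 5,
                 on_threshold: Action = Action.DELAY_THEN_RETRY) -> None:
        self.threshold = threshold
        self.on_threshold = on_threshold  # the policy choice in question
        self.count = 0

    def success(self) -> None:
        # A correct token ends the run of failures.
        self.count = 0

    def failure(self) -> Action:
        # Each wrong token extends the run; at the threshold the policy kicks in.
        self.count += 1
        if self.count >= self.threshold:
            return self.on_threshold
        return Action.ALLOW_RETRY


tracker = ConsecutiveFailures(threshold=5, on_threshold=Action.INVALIDATE_PASSWORD)
for _ in range(4):
    assert tracker.failure() is Action.ALLOW_RETRY
print(tracker.failure())  # Action.INVALIDATE_PASSWORD on the 5th wrong token in a row
```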