Levenshtein distance as a proxy for password strength is extremely limited, for the reasons that schroeder has outlined. And the user experience will probably be poor.
But the question is still academically interesting, and may be useful as a component for some use cases - so it still deserves a thorough answer. :D
Generally, the minimum Levenshtein distance between a previous password and a new one should be at least as large as the minimum length of a randomly-generated password that would resist brute force (adjusted to the threat model). But it's important to note that even this minimum is often inadequate.
My answer is based on a few factors:
- the speed (attack difficulty) of the target hash type
- how much information about the previous password may be available to the attacker
- the methodology used to generate the initial password
- the usefulness of Levenshtein distance as a proxy for password change strength
First, about speed.
Worst case - a "fast" hash like MD5 - many different kinds of attacks become feasible. This lets us put an upper bound on how large the minimum Levenshtein distance needs to be.
Best case - a "slow" hash like bcrypt - attack speed drops significantly, but attacks are not as slow as they used to be (more about that below).
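To make that speed gap concrete, here's a minimal back-of-the-envelope sketch in Python. The guess rates are my own ballpark assumptions (not benchmarks), just to illustrate the orders of magnitude:

```python
# Back-of-the-envelope: time to exhaust the keyspace of an 8-character
# random password at assumed guess rates. The rates below are rough
# assumptions for a single modern GPU, not measured benchmarks.
MD5_GUESSES_PER_SEC = 100e9     # assumption: ~100 GH/s for MD5
BCRYPT_GUESSES_PER_SEC = 100e3  # assumption: ~100 KH/s for bcrypt at a moderate cost factor

SECONDS_PER_YEAR = 3600 * 24 * 365

def years_to_exhaust(keyspace: float, guesses_per_sec: float) -> float:
    """Years needed to try every candidate at a given guess rate."""
    return keyspace / guesses_per_sec / SECONDS_PER_YEAR

keyspace = 95 ** 8  # 8 random characters from the 95 printable ASCII symbols
print(f"MD5:    {years_to_exhaust(keyspace, MD5_GUESSES_PER_SEC):.4f} years")
print(f"bcrypt: {years_to_exhaust(keyspace, BCRYPT_GUESSES_PER_SEC):,.0f} years")
```

With these assumptions, the same keyspace falls in under a day against MD5 but takes on the order of two thousand years against bcrypt.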
Second, about the user's previous password.
We must assume that the attacker has the user's previous password. Not only is password reuse chronic, but even if the attack is against a slow hash, modern attack stacks include a highly efficient per-target attack mode called a "correlation attack" (pioneered by John the Ripper's "single" mode and later added as hashcat's -a 9 mode) that leverages knowledge of previous passwords, even in bulk across a large leak.
In a correlation attack, we assume that the attacker has the user's previous password (from a previous leak, from a different site due to password reuse, etc.). The attack takes that old password as a "base" and applies a set of "mangling rules" (append $, capitalize the first letter, etc.) to it - but tries the resulting candidates against that user's hash only. For large batches of users and slower hashes, this dramatically increases attack speed, because instead of trying the same candidate against every hash in a long list (as most traditional attacks do), the attacker does only a small amount of targeted work per hash.
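Here's a minimal sketch of the idea - not how JtR or hashcat implement it (their rule engines are far richer); the handful of hand-written "rules" and the demo values are purely illustrative:

```python
import hashlib

# A minimal sketch of a correlation attack against ONE user's hash, assuming
# the attacker already has that user's old password from a prior leak.
mangling_rules = [
    lambda p: p,                      # unchanged (straight reuse)
    lambda p: p + "$",                # append a special character
    lambda p: p.capitalize(),         # capitalize the first letter
    lambda p: p + "1",                # append a digit
    lambda p: p.capitalize() + "1",   # combine rules
]

def correlation_attack(old_password: str, target_hash: str) -> str | None:
    """Try mangled variants of this user's old password against this
    user's hash only - no full-wordlist pass over the whole leak."""
    for rule in mangling_rules:
        candidate = rule(old_password)
        if hashlib.md5(candidate.encode()).hexdigest() == target_hash:
            return candidate
    return None

# Demo: the user "changed" their password from 'hunter2' to 'Hunter21'.
target = hashlib.md5(b"Hunter21").hexdigest()
print(correlation_attack("hunter2", target))  # -> 'Hunter21'
```

Five targeted guesses per user is cheap even against a slow hash, which is exactly why this mode scales so well across large leaks.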
This is not just academic - real-world cracking now heavily depends on correlating user:password or email:password leaks and cracks against new targets. So we must assume that the attacker has the previous password.
Third, about the password generation methodology.
Worst case - the passwords are single human-generated words that are easily found in a password frequency list (like "123456"). If a human selects the next password, it is very likely to be similar to the old one, so even very low Levenshtein distances are relevant.
Best case - the passwords are entirely randomly generated, both initially and later. In this case, Levenshtein distance isn't as relevant (or rather, measuring the Levenshtein distance between two randomly generated passwords is just a proxy for measuring the quality of their randomness).
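For reference, "entirely randomly generated" here means something like this minimal sketch, using Python's standard secrets module:

```python
import secrets
import string

def random_password(length: int = 14) -> str:
    """Draw each character uniformly at random from printable ASCII
    (letters, digits, punctuation) using a CSPRNG."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

# Two independent random passwords will almost always be at (or near)
# the maximum possible Levenshtein distance from each other - so the
# distance measures their randomness, not the "strength of the change".
print(random_password(), random_password())
```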
Fourth, about Levenshtein distance as a proxy for password strength.
It is useful to think about this in terms of practical password entropy - how hard the password is to attack in the real world, with full knowledge of how humans psychologically "chunk" password components. (This is also why Shannon entropy is notoriously bad at assessing the strength of non-randomly-generated passwords: it assumes brute force - that each letter is an independent "chunk" that has to be attacked separately - instead of modeling how a human actually remembers password components ("just add my kid's name").)
Worst case - at the lowest entropy levels, as discussed above - if a user just increments a number by 1, or appends a different special character, etc., the Levenshtein distance will be extremely low.
Best case - at the most complex - inserting a randomly selected character in a random position in the string is probably the highest practical entropy change (the hardest to predict) that can be made. For example, if we randomly inject 8 characters into a password, the delta in practical entropy is basically the same as if we had generated an 8-character random password.
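A minimal sketch of that "best case" change (the 94-symbol alphabet and the entropy arithmetic are my own illustrative choices):

```python
import math
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation  # 94 symbols

def inject_random_chars(password: str, k: int) -> str:
    """Insert k uniformly random characters at uniformly random positions."""
    chars = list(password)
    for _ in range(k):
        pos = secrets.randbelow(len(chars) + 1)  # any position, including the ends
        chars.insert(pos, secrets.choice(ALPHABET))
    return "".join(chars)

print(inject_random_chars("password", 8))
# Each injected symbol contributes ~log2(94) = ~6.55 bits; the random
# positions add a little more on top of that.
print(f"~{8 * math.log2(len(ALPHABET)):.1f} bits from the symbol choices alone")
```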
Given all of the above ...
If we assume all worst cases (human-generated password, poor password-selection strategy, etc.), the answer should be clear: the amount of change in the password should itself be enough to withstand attack on its own - totally independent of the previous password (as if the first password were empty), and as if it were stored using a very fast hash like MD5. This reduces to how quickly we can reasonably expect a randomly generated password to be exhausted for a given threat model, which is covered well elsewhere (but often hovers around 13-15 random characters).
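As a sketch of that reduction (the guess rate and time horizon are assumptions - plug in your own threat model):

```python
import math

def min_random_length(guesses_per_sec: float, years: float,
                      alphabet_size: int = 95) -> int:
    """Smallest random-password length whose keyspace cannot be exhausted
    within `years` at `guesses_per_sec`."""
    total_guesses = guesses_per_sec * years * 3600 * 24 * 365
    return math.ceil(math.log(total_guesses, alphabet_size))

# Assumption: a large rig doing ~1e14 MD5 guesses/sec, sustained for 100 years.
print(min_random_length(1e14, 100))  # -> 12
# Published guidance usually pads this (hardware keeps getting faster),
# which is how you land in the 13-15 character range.
```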
Unfortunately, as with many attempts to measure password strength, it's not that easy. This is because a password change with high Levenshtein distance can still have very low practical differential entropy.
For example, if the user's first password is 'password', and their new password is 'passwordrumpelstiltskin', the Levenshtein distance of 15 sounds terrific. But the practical Levenshtein distance - of real-world changes to the initial string as humans apply them - is not 15, but rather one ("add a word").
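You can verify that distance with a few lines of plain Python (classic dynamic-programming edit distance, no external libraries):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]

print(levenshtein("password", "passwordrumpelstiltskin"))  # -> 15
print(levenshtein("password1", "password2"))               # -> 1
```

Note that the "increment a digit" change from earlier scores a 1 - which the metric correctly flags as weak - while the "add a word" change scores a misleadingly high 15.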
In other words ... just like Shannon entropy, Levenshtein distance as a measure of password strength is really only useful in cases where the passwords are already randomly generated. And if you're doing that, you already have the information necessary to keep them strong, without trying to measure the difference between two of them.
In still other words ... very low Levenshtein distances are an OK proxy for password change weakness, but high Levenshtein distances are a very poor proxy for password change strength. So my answer tries to find a balance between the two.