
I noticed there are tons of questions and answers about password entropy on this forum, some even suggesting formulas for calculating it. None answered my exact question.

What are possible or commonly used methods for calculating password entropy?

For example, KeePass uses a method to calculate some kind of password entropy. How do they calculate it?

Are things like repeating patterns or predictable/easy combinations somehow included in those methods?

Bob Ortiz

4 Answers


The proper way to calculate password entropy is to look at the password generation method, evaluate how much entropy that method involves, and then evaluate how much of that input entropy is preserved by the encoding method. As an example, throwing a fair 6-sided die once generates approximately 2.58 bits of entropy, i.e. log2(6) (note that it's an open question whether real-life dice are truly fair).
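The entropy contributed by independent fair rolls is straightforward to compute, as a quick sketch shows (the function name here is illustrative, not from any particular library):

```python
import math

# Each roll of a fair d-sided die, chosen independently,
# contributes log2(d) bits of entropy.
def roll_entropy_bits(num_rolls: int, sides: int = 6) -> float:
    return num_rolls * math.log2(sides)

print(round(roll_entropy_bits(1), 2))   # one d6 roll: 2.58 bits
print(round(roll_entropy_bits(5), 2))   # five rolls: 12.92 bits
```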

Password strength meters are not entropy calculators; they estimate the maximum amount of entropy a string could contain by doing statistical analysis on the password, based on commonly used password generation and cracking methods. In all cases, an entropy estimator can be off by quite a large amount. While entropy estimators are good for detecting obviously weak passwords, they are not a good way to determine whether a password is strong.
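To illustrate why such estimates are only upper bounds, here is a naive sketch of the kind of calculation many meters start from (this is an illustrative toy, not KeePass's actual algorithm): assume every character was drawn uniformly from the union of the character classes present.

```python
import math
import string

# Naive upper-bound estimate: pool size is the union of character
# classes seen in the password. "P@ssw0rd!" scores highly here
# despite being trivially guessable from a wordlist.
def naive_entropy_upper_bound(password: str) -> float:
    pool = 0
    if any(c in string.ascii_lowercase for c in password):
        pool += 26
    if any(c in string.ascii_uppercase for c in password):
        pool += 26
    if any(c in string.digits for c in password):
        pool += 10
    if any(c in string.punctuation for c in password):
        pool += len(string.punctuation)
    return len(password) * math.log2(pool) if pool else 0.0

print(round(naive_entropy_upper_bound("P@ssw0rd!"), 1))
```

Serious meters layer pattern detection (dictionary words, repeats, keyboard walks) on top of this kind of baseline to push the estimate back down.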

A random-looking string can contain very little entropy: for instance, the SHA-1 of a single ASCII character contains only about 7 bits of entropy, but most password meters would rate it as a solid password.
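A quick demonstration of that point: the hex digest looks like a strong 40-character password, but the search space behind it is tiny.

```python
import hashlib

# The SHA-1 hex digest of one ASCII character looks like a strong
# 40-character password, but the space behind it is at most 128
# inputs (~7 bits), so it is recoverable instantly by brute force.
target = hashlib.sha1(b"q").hexdigest()

for code in range(128):
    if hashlib.sha1(bytes([code])).hexdigest() == target:
        print("cracked:", chr(code))   # prints: cracked: q
        break
```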

Lie Ryan

Data entropy depends on the observer; there is no absolute measurement of entropy. It's even questionable whether anything in the universe is random at all, and "randomness" (or, more precisely where entropy is concerned, unpredictability) is the source of entropy.

Unpredictability being the operative term: hard for somebody to predict.

If you use the Mersenne Twister, for example, knowing the seed of the random sequence perfectly predicts the entire sequence, so your "random" password contains at most 64 bits of entropy (if you use the 64-bit version, that is).
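This is easy to see in practice: Python's `random` module uses the Mersenne Twister, and anyone who knows (or guesses) the seed reproduces the "random" password exactly.

```python
import random
import string

# Generate a password from a seeded Mersenne Twister. The output
# looks random, but the seed fully determines it.
def mt_password(seed: int, length: int = 16) -> str:
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))

# Same seed, same "random" password, every time:
print(mt_password(42) == mt_password(42))   # True
```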

If you use Diceware, then the entropy stems from the number of times you rolled the dice, and that's it.

Unfortunately, by the time it becomes a "password", the source of entropy is obscured.

For example: a three-value safe code where each value is in the range [0,99] has 3*log2(100), roughly 20, bits of entropy. Until you learn that the owner selected a 6-letter word and used a phone keypad to turn it into numbers, and now the entropy is log2(numberOfSixLetterWords).
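The numbers in that example work out as follows (the 15,000 figure for six-letter words is an illustrative assumption, not a dictionary count):

```python
import math

# Apparent entropy of a 3-value safe code, each value in [0, 99]:
apparent = 3 * math.log2(100)    # ~19.93 bits

# Actual entropy once you know it was derived from a 6-letter word
# via a phone keypad, assuming roughly 15,000 six-letter words:
actual = math.log2(15_000)       # ~13.87 bits

print(round(apparent, 2), round(actual, 2))   # 19.93 13.87
```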

In short, the assumptions used to make a password are so fundamental to its entropy, and so obscured by the immediate appearance of the password, that you really cannot estimate it; you can only ever get an upper bound on the password's entropy.

iAdjunct

KeePass describes some of what they consider here, and it is described with some more detail on page 18 of this excellent paper:

Carnavalet, Xavier De Carné De, and Mohammad Mannan. "A large-scale evaluation of high-impact password strength meters." ACM Transactions on Information and System Security (TISSEC) 18.1 (2015): 1.

It would be too long to paraphrase, but yes, repeating patterns and predictable/easy combinations are included in the calculations of most serious password strength meters, including KeePass's. KeePass relies on something it calls a static entropy encoder, which could be something as simple as a Huffman code used to compress data.

Jedi

The entropy of a password is a quantitative statement about the probability distribution of all the possible passwords. To simplify this, think of a probability distribution as a rule that, given a password, outputs the probability that that password is the one that was chosen.

So you really can't put a number to the entropy of a password unless you start from some model that tells you the relative likelihood of any two possible passwords. And the problem with layperson accounts about password entropy—which no doubt you've encountered—is that they routinely fail to clearly spell out what assumptions they're making about the distribution.

There's another set of complications here that relates to iAdjunct's point that "data entropy depends on the observer": we can make a distinction between the "true" distribution of passwords (which follows from how people actually choose passwords) and the "assumed" distribution (the distribution that the attacker believes people follow). A lot of (bad) password advice out there is based on the idea of using "unusual" password generation rules so that your password is an outlier relative to the real distribution, or to the distributions that attackers assume.

But the easiest solution is to sidestep all this by choosing passwords according to a set of rules that gives enough entropy even if the attacker knows the rules (a version of Kerckhoffs's principle). Many of the (highly recommended!) responses to the following two questions stress this point:

So if you follow that advice, then the entropy of your passwords can be calculated straightforwardly by spelling out a randomized password generation scheme and assuming that the attacker will discover it.
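A minimal sketch of such a scheme, using a Diceware-style word list (the `WORDS` list here is a stand-in; a real Diceware list has 7776 entries): because each word is drawn uniformly and independently with a CSPRNG, the entropy is exactly calculable even if the attacker knows the whole procedure.

```python
import math
import secrets

# Stand-in word list; a real Diceware list has 7776 real words.
WORDS = [f"word{i}" for i in range(7776)]

# Pick num_words uniformly at random; entropy is num_words * log2(list size),
# regardless of whether the attacker knows the scheme (Kerckhoffs's principle).
def passphrase(num_words: int = 6) -> tuple[str, float]:
    words = [secrets.choice(WORDS) for _ in range(num_words)]
    entropy_bits = num_words * math.log2(len(WORDS))
    return " ".join(words), entropy_bits

phrase, bits = passphrase()
print(round(bits, 1))   # 6 words from a 7776-word list: 77.5 bits
```

Note the use of `secrets` rather than `random`: the former draws from the OS CSPRNG, so there is no recoverable seed.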

Luis Casillas