Brute-Force/Dictionary attack against an encrypted file using PBKDF2 key derivation

Question

I have been following this very useful post by Thomas. My use case is slightly different. I am developing a mobile application that requires some sensitive data to be stored on the device in a SQLite file. I am using SQLCipher to encrypt the file. The encryption key is based on a 10-character pass-phrase (with at least one upper-case letter, one lower-case letter, one symbol and one number) entered by the user at first authentication on that device. SQLCipher internally uses PBKDF2 to generate the encryption key. As per the NIST publication, the entropy of this pass-phrase is around 28 bits.

So, according to the formula in the post referred to above:

$$v \cdot 2^{n-1} \gg f \cdot p$$

v: Time required to compute the derived key (DK) from one candidate pass-phrase using the PBKDF2 function.

n: Entropy of the pass-phrase in number of bits.

$2^n$: Total number of possible combinations using n bits.

$2^{n-1}$: Roughly the average of 1 and $2^n$, i.e. the average number of candidate pass-phrases that must be tried before the actual pass-phrase is cracked.

f: Factor representing the computational power available to attacker relative to the system that ‘v’ was calculated on.

p: Patience of the attacker in terms of time.

Treating the inequality as an equality, p comes out to about 7.77 days, using f = 200 (the attacker has 200 times the computational power), v = 1 second for one key derivation and, as mentioned above, n = 28.
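The figure above can be reproduced by solving v * 2^(n-1) = f * p for p. The following is only a small sketch of that arithmetic; the function name `attacker_patience_seconds` is made up for illustration and is not taken from the referenced post.

```python
SECONDS_PER_DAY = 86_400


def attacker_patience_seconds(v_seconds, entropy_bits, f):
    """Break-even patience p from v * 2^(n-1) = f * p.

    v_seconds    -- time for one key derivation on the reference system
    entropy_bits -- n, the estimated entropy of the pass-phrase
    f            -- attacker's computing power relative to that system
    """
    average_guesses = 2 ** (entropy_bits - 1)  # half of the 2^n candidates
    return v_seconds * average_guesses / f


# Values used in the question: v = 1 s, n = 28, f = 200.
p_days = attacker_patience_seconds(1.0, 28, 200) / SECONDS_PER_DAY
print(f"{p_days:.2f} days")  # 7.77 days
```

Changing `f` or `entropy_bits` in the call shows how quickly p shrinks or grows: each extra bit of entropy doubles p, while doubling f halves it.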

Now coming to my questions:

1. What is a realistic estimate for f? I know no one can come up with an exact value, but estimates for the worst case, average case, etc. would be useful.
2. The NIST publication referred to above also suggests using a dictionary along with composition rules to ensure stronger passwords. I have downloaded password dictionaries here. What does the NIST publication mean by transformations of dictionary words? If the dictionary contains the word "rock", should I disallow every pass-phrase that has "rock" in it? That would reject rock@Ston3e even though it seems to be a strong password.
3. I want to use a pepper, and here is how I am planning to use it. The pass-phrase is the input to PBKDF2, and [random salt (256 bits) + pepper (256 bits)] is the salt for the operation, with several thousand iterations. The resulting hash is then fed to SQLCipher, which itself performs 64000 iterations of PBKDF2 with a 128-bit salt to reach the encryption key. Is there any problem with this use of pepper and PBKDF2?
4. One of my colleagues suggested not storing the hashed pass-phrase anywhere on disk. Instead, run a SQLite query to fetch some user info from the database (this info is needed anyway, so there is no overhead). SQLCipher will give an error if the pass-phrase entered is wrong, since the generated encryption key will also be wrong and the decrypted SQLite data won't make any sense (it won't be formatted as SQLCipher expects). The advantage I see in this approach is that the hashed pass-phrase is not stored as such, yet its information is still distributed throughout the SQLCipher data. The attacker can no longer copy a hash value from a file, paste it into his or her code and run an offline dictionary attack. He or she can still copy the SQLite file to another computer, but to verify a candidate pass-phrase the attacker will have to carry out database queries or at least validate the database in some way. My assumption is that while GPUs are fast at calculating hashes, they can't perform these database operations with the same efficiency. The idea is to slow the attacker down by introducing operations that his or her hardware is not optimized for. Is my assumption correct?
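To make the pepper scheme from the third question concrete, here is a minimal sketch of the first PBKDF2 stage only. It makes several assumptions that are not stated above: HMAC-SHA256 as the PRF, 10,000 as a stand-in for "several thousand iterations", and a randomly generated value standing in for the pepper. The function name `pre_derive` is hypothetical, and the hand-off of the result to SQLCipher is not shown.

```python
import hashlib
import os


def pre_derive(passphrase: str, random_salt: bytes, pepper: bytes,
               iterations: int = 10_000) -> bytes:
    """First PBKDF2 stage: salt = random_salt || pepper (both 256 bits).

    The hash (SHA-256) and iteration count are assumptions for illustration.
    """
    combined_salt = random_salt + pepper
    return hashlib.pbkdf2_hmac(
        "sha256",                     # assumed PRF: HMAC-SHA256
        passphrase.encode("utf-8"),
        combined_salt,
        iterations,
    )


random_salt = os.urandom(32)  # 256-bit random salt, stored with the database
pepper = os.urandom(32)       # stand-in for the 256-bit pepper
key_material = pre_derive("rock@Ston3e", random_salt, pepper)
print(len(key_material))      # 32 bytes, later handed to SQLCipher as its pass-phrase
```

In this sketch the pepper only adds protection if it is kept somewhere the attacker cannot read along with the SQLite file; otherwise it behaves like a longer salt.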

Too many questions. Please split for clarity. The last point needs much editing. — Deer Hunter, Apr 15 '14 at 11:00
Regarding your last question: assuming that current hardware isn't optimized to circumvent your strategy doesn't sound like a very good idea. Who knows what GPUs (or any other hardware, for that matter) will be capable of doing in the near future? — Steven Volckaert, Apr 15 '14 at 11:25
28 bits of entropy in a password isn't enough, even with several million iterations. — CodesInChaos, Apr 15 '14 at 11:37
A realistic estimate for "f" depends on who you expect your attacker to be. The CIA, for example, would have a much higher value than a bored teenager. — Mark, Apr 15 '14 at 20:00
@CodeInChaos Yes, I know. I was also thinking of not allowing dictionary-based words or their iterations. Hence I asked the second question. That should take the entropy to 32 bits. I think anything beyond 32 bits would be asking too much from users. — Taha, Apr 16 '14 at 07:35
@Steven GPUs in the future may be able to do that, but GPUs today are already good at calculating hashes, and if the hashed pass-phrase is stored as textual or even binary data, it is far easier to crack (the attacker can copy the value into his or her code and perform an if == check). As far as I know nothing is unbreakable; it's just bits and pieces coming together to make the system harder to break. And I think this method needs to be judged relative to the usual approach. What do you think? — Taha, Apr 16 '14 at 07:41
@Mark So can you suggest f for both? That would give some idea about the two extremes. Also, a possible f for some very organized hacker group? — Taha, Apr 16 '14 at 07:44
Unfortunately, I feel my knowledge about the usual approach is too limited to give a valuable answer. Given that, I favor using existing libraries / algorithms over implementing something as critical as this by myself. Also, as CodesInChaos already said, 28 bits of password entropy is, given your security concerns, probably insufficient. — Steven Volckaert, Apr 16 '14 at 07:46
Take a look at the [https://hashcat.net/oclhashcat/](https://hashcat.net/oclhashcat/) benchmarks to get an idea of what a single modern computer can do with 8 GPUs attached. I suspect your 10 digit limit is going to be a killer; it's just too short. — Anti-weakpasswords, Dec 29 '14 at 05:54
