2

I'm trying to design a security scheme that involves a shared secret but isn't a traditional account password situation. The server would store a set of "keys", each of which has a blob of data associated with it. In order for anyone to access the data for a given key, all they need to know is the plaintext name of the key. So if Alice creates data using "pineapple" as the key, Bob can ask the server for the data for the key "pineapple" and the server will return the data.

It's completely intentional that Bob could share the secret word to other people, or for people to randomly guess "pineapple" and accidentally get the data. I only want to avoid someone being able to brute force very large numbers of keys for common dictionary words easily. I would like none of the plaintext data to ever be sent to the server, so that the people running the server could not spy on the user data or even know what the plaintext keys are. And ideally if the server was compromised, it would take a long time to brute force each of the plaintext keys and/or decrypt the corresponding data.

My idea for how this could work, is that if Alice wants to create new data, their client takes the key "pineapple" and runs a very slow hash algorithm on it, eventually creating the corresponding hash code for pineapple. Then their client encrypts the data package with "pineapple" as well using some sort of encryption method that is difficult to brute force. Alice would then send both to the server, which would check to see that the hash doesn't already exist, and then store the hash/data pair. Later, Bob could repeat the same process of creating a hash code for pineapple, then ask the server for the data for that hash, and finally decrypt the returned data using pineapple as the key. The process of creating the initial hash would be slow, but both Alice and Bob could store it locally in their client so it would only have to be done once per key.

Is there any better way of doing this? Are there slow hash algorithms that don't involve using a salt, which would prevent Alice and Bob from figuring out the same secure hash code without communicating with each other? Is there some way of using a salt but still using this general method where the server never sees any plaintext? Are there security concerns with this sort of scheme that I'm not considering?

1 Answers1

3

Key Derivation Functions

What you are describing is really no different than the key derivation functions (KDF) already used in many encryption/password hashing contexts. Plenty of KDFs have a configurable cost parameter that can allow you to increase or decrease the total "CPU Time" (I'm simplifying) needed to generate the actual key from the password. In this case what you call the key (i.e. "pineapple") would effectively be a password that is run through a KDF to generate a cyrptographic key, which is then used for encryption/decryption of the data. You would never have to send the password along to the server - just the generated key, which is always built using the KDF on the client side.

Cost factors

That last bit (generating the key client side) would be important, because for what you are talking about the cost factor would have to be very high. That might be a detail you are missing. The only way to both let people share a password and even guess passwords, but not be able to effectively brute force random passwords, is if the KDF is very slow. Normally the cost parameter is tuned so that the time to generate the key on "good" hardware is, for instance, 0.2 seconds (I'm over simplifying and skipping lots of details for the sake of example). The reason you pick a timescale like ~0.2 seconds for key generation is to balance the desire to make brute force expensive, but also make it so that your system can actually perform the necessary operations itself in the timescale that users are willing to wait (i.e. no one wants to wait 30 seconds each time they login).

The trouble

However, even if an attacker can only check one "password" a second, there are still 86400 seconds in a day, which means that you can get through the entire oxford english dictionary (~170,000 words) in just a few days. As a result, to stop someone from feasibly brute-forcing common dictionary words, you're talking about a key generation process that would have to take at least a few minutes, and I doubt that would stop a dedicated attacker (especially if they have top end hardware that can run the KDF in less time). If you do that though, that means that your legitimate users will also have to wait those same minutes before they can get results out of your system. After all, there is no way to differentiate between a legitimate user of your system and an attacker, so the only way to slow down attackers is by slowing down everyone. So to do what you want to do you will have to slow down your KDF so that it takes some "long" amount of time to run, but it is quite possible that the level of "slowness" you need to secure to the level you desire (i.e. stopping brute force of common dictionary words) will be so slow that your users are unwilling to wait for your system to do its thing. Ultimately, finding that balance between usability and security will be up to you and your users.

A possible solution

You can fix some of the trouble by using server-side throttling instead of making an arbitrarily large cost factor. Imagine if the API calls where the user says "give me data for this key" does not immediately return results but instead generates a queued request that only responds after X period of time. This would allow you to block brute-forcing regardless of what level of hardware the attacker has access to, would not slow down your server with a lengthy KDF process, and also would not penalize users with slow hardware. The disadvantage of course is that it is a server-side control, so if your database leaked a hacker could go to town on brute-forcing at the speed of their hardware.

Conor Mancone
  • 29,899
  • 13
  • 91
  • 96
  • Thanks for the information and pointer to the correct terminology. I didn't explicitly mention in the post but my assumption would be that the hash generation process would be on the order of magnitude of minutes for end users. Like you say it would be tricky to tune that to really prevent someone from putting in the time to brute force common words, but I want to at least keep it from being trivial to do. – Chris Graham Sep 18 '18 at 17:25
  • 1
    Even worse: with the advent of cloud computing, running a single computer for 170,000 minutes cost exactly the same amount as running 1,000 computers for 170 minutes. As such, this form of obfuscation is nearly useless. – Motoma Sep 18 '18 at 17:37
  • It gets tricky because an attacker may have access to much better hardware than your users, and with hash generation processes that long you certainly don't want to run the KDF server-side. As an example of the kind of problem you might have you might tune the KDF so that it runs in minutes on your own system, find out that it then takes ~30 minutes to run on a low-end android phone, and then find out that an attacker has a GPU-based cracking rig that runs the same KDF+cost factor in 5 seconds. I'm making up numbers but you get the idea. – Conor Mancone Sep 18 '18 at 17:37
  • @Motoma indeed! If you have something valuable enough that someone is willing to throw that much computing resources at it, then this sort of solution is not going to get you far. – Conor Mancone Sep 18 '18 at 17:43
  • @ChrisGraham I added an edit with a suggestion. – Conor Mancone Sep 18 '18 at 17:43
  • @ConorMancone It doesn't even have to be that high-value. Using the numbers above (170,000 dictionary words at 1/second) it would take $131.46 to crack this. It could be done in 3 hours, using 1000 t2.medium instances. – Motoma Sep 20 '18 at 16:49