You are doing it all wrong! But that's okay, amateur cryptography is where we all started and learned. Just be aware that your encryption software will not be secure unless you have had it audited by an independent third party, and even then they make no guarantees. Do not use your own hobby crypto for anything important! And do not let others use it without making sure that they know it's just a toy!
Imagine you have a file that says "attack at 10:00". Encrypted, it is something like aWJiaWtzIGliIDEzOjAw. Your enemy might know that you are sending the attack message, but does not know the password. What they can do, though, is change a byte. If the message is changed to aWJiaWtzIGliIDazOjAw (the "E" is replaced by an "a"), it suddenly says: "attack at 03:00".
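Why does blindly changing a byte work? Stream-style modes such as AES-CTR (used later in this answer) encrypt by XORing the plaintext with a keystream, so flipping a bit in the ciphertext flips exactly the same bit in the decrypted plaintext. The sketch below shows this with a toy cipher. Python's standard library has no AES, so it uses a SHAKE-256 keystream as a stand-in (an assumption for illustration only, not a secure cipher); the key and the function name are made up.

```python
import hashlib

def toy_keystream_xor(key: bytes, data: bytes) -> bytes:
    # Hypothetical stand-in for AES-CTR: XOR the data with a key-dependent
    # keystream. Encryption and decryption are the same operation.
    # NOT secure: no nonce, so the keystream repeats for every message.
    keystream = hashlib.shake_256(key).digest(len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))

key = b"a key the attacker never sees"
plaintext = b"attack at 10:00"
ciphertext = toy_keystream_xor(key, plaintext)

# The attacker knows the message format but not the key.
known = b"10:00"
wanted = b"03:00"
offset = 10  # "attack at " is 10 bytes long
tampered = bytearray(ciphertext)
for i, (old, new) in enumerate(zip(known, wanted)):
    # XORing with (old ^ new) turns `old` into `new` after decryption.
    tampered[offset + i] ^= old ^ new

print(toy_keystream_xor(key, bytes(tampered)))  # b'attack at 03:00'
```

Nothing in the decryption step notices the change, which is exactly the gap that authenticated encryption closes.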
As you see, modifying the ciphertext, even blindly, can be an advantage to the attacker (the enemy). You want to have authenticated encryption, where a modification is detected. This also solves your password problem.
You do not need to store the password in the file: if the user enters the wrong password, the authentication check will fail, so the software knows that something is wrong (either the password is incorrect or the file was tampered with).
There are also some more details on why you should use authenticated encryption:
There are a lot of resources online on how to do authenticated encryption. Some good methods to look for are AES+HMAC, or AES in an authenticated mode such as GCM or OCB. This Wikipedia article is a good introduction and contains pointers to algorithms that you might be able to use: https://en.wikipedia.org/wiki/Authenticated_encryption
As an example, let's consider the previous ciphertext: aWJiaWtzIGliIDEzOjAw. Now we HMAC it with SHA-256, which gives us 3be635... I used a random online site to compute the HMAC and used the same password as I used for the encryption. (As a challenge to the reader, try to break it :). It's just one character.) We can store the two together: aWJiaWtzIGliIDEzOjAw|3be635. When a user wants to decrypt the contents, they enter the password and you compute the HMAC again. If it does not match, you know that either the message was tampered with or their password was incorrect.
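The check described above can be sketched with Python's standard hmac and hashlib modules. As in the example, it HMACs the base64 text and uses the password directly as the HMAC key (the key-derivation step discussed further down is left out). The password "correct horse" and the helper names make_tag and check are hypothetical, not the one-character password from the challenge.

```python
import hashlib
import hmac

def make_tag(password: str, ciphertext: str) -> str:
    # HMAC-SHA-256 over the base64 ciphertext, hex-encoded.
    return hmac.new(password.encode(), ciphertext.encode(), hashlib.sha256).hexdigest()

def check(password: str, stored: str) -> bool:
    ciphertext, auth_code = stored.split("|")
    # compare_digest does not leak how many leading characters matched.
    return hmac.compare_digest(make_tag(password, ciphertext), auth_code)

password = "correct horse"  # hypothetical password
stored = "aWJiaWtzIGliIDEzOjAw|" + make_tag(password, "aWJiaWtzIGliIDEzOjAw")

print(check(password, stored))        # True
print(check("wrong guess", stored))   # False: wrong password
tampered = stored.replace("IDEz", "IDaz", 1)
print(check(password, tampered))      # False: ciphertext was modified
```

Note that a wrong password and a modified file look the same to the program; both simply fail the check.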
The code for this is basically:
```
# Encrypt
password = user_input()
plaintext = "attack at 10:00"
ciphertext = encrypt(method = "AES-256-CTR", key = password, data = plaintext)
auth_code = hmac(method = "SHA-256", key = password, data = ciphertext)
write_file(name = "encrypted", data = base64(ciphertext) + "|" + auth_code)

# Decrypt
password = user_input()
encoded_ciphertext, auth_code = read_file(name = "encrypted").split("|")
ciphertext = base64_decode(encoded_ciphertext)
computed_auth_code = hmac(method = "SHA-256", key = password, data = ciphertext)
if computed_auth_code == auth_code:  # real code should use a constant-time comparison
    plaintext = decrypt(method = "AES-256-CTR", key = password, data = ciphertext)
    print(plaintext)
else:
    print("Wrong password, or the file was tampered with")
```
So far so good, but...
This method allows an attacker to guess a password very fast. The HMAC operation takes a few microseconds to compute, so an attacker can make many guesses per second on a standard computer. The code is very simple:
```
encoded_ciphertext, auth_code = read_file(name = "encrypted").split("|")
ciphertext = base64_decode(encoded_ciphertext)
for password in load_file("password_database.txt"):
    if auth_code == hmac(method = "SHA-256", key = password, data = ciphertext):
        print("The password is " + password)
```
Because the only operation in the for loop is hmac(...) and a quick comparison, it is very fast.
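To get a feel for how cheap each guess is, here is a runnable version of the attacker's loop using Python's standard library. The ciphertext bytes, the weak password "sunshine" and the generated wordlist are all hypothetical placeholders. The timing depends on your machine, and plain Python is far slower than the optimized GPU tools real attackers use, so treat the number as a lower bound on attack speed.

```python
import hashlib
import hmac
import time

def tag(password: str, ciphertext: bytes) -> str:
    return hmac.new(password.encode(), ciphertext, hashlib.sha256).hexdigest()

ciphertext = bytes(range(15))       # placeholder ciphertext bytes
auth_code = tag("sunshine", ciphertext)  # hypothetical weak password

# Hypothetical wordlist: 100,000 wrong guesses, then the right one.
wordlist = [f"guess{i}" for i in range(100_000)] + ["sunshine"]

start = time.perf_counter()
found = None
for password in wordlist:
    if auth_code == tag(password, ciphertext):
        found = password
        break
elapsed = time.perf_counter() - start
print(f"found {found!r} after {len(wordlist):,} guesses in {elapsed:.2f} s")
```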
Before using the user's password for anything, you should apply a slow key derivation function (KDF). How to do this best is answered in the question "How to securely hash passwords?".
Your code should now look something like this:
```
password = user_input()
password = argon2(iterations = 1_000_000, memory = 100_000_000, data = password)
# Below here, the normal encryption/decryption code
```
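Argon2 is not in Python's standard library (it needs a third-party package), so the sketch below swaps in hashlib.scrypt, another memory-hard KDF. It also adds a random salt, which password-hashing functions take so that the same password does not always produce the same key; the salt is stored next to the ciphertext. The cost parameters (n, r, p), the password and the placeholder ciphertext are illustrative assumptions; tune the cost so one derivation takes as long as you can tolerate.

```python
import hashlib
import hmac
import os

def derive_key(password: str, salt: bytes) -> bytes:
    # scrypt used as a stand-in for Argon2; n=2**14, r=8 needs about 16 MiB.
    return hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1, dklen=32)

# Encrypt side: pick a random salt once and store it with the ciphertext.
salt = os.urandom(16)
key = derive_key("correct horse", salt)  # hypothetical password
ciphertext = b"placeholder ciphertext"    # stands in for encrypt(..., key=key, ...)
auth_code = hmac.new(key, ciphertext, hashlib.sha256).hexdigest()

# Decrypt side: re-derive the key from the stored salt and the typed password.
key_again = derive_key("correct horse", salt)
computed = hmac.new(key_again, ciphertext, hashlib.sha256).hexdigest()
ok = hmac.compare_digest(computed, auth_code)
print(ok)  # True
```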
Now, an attacker's code has to look like this:
```
encoded_ciphertext, auth_code = read_file(name = "encrypted").split("|")
ciphertext = base64_decode(encoded_ciphertext)
for password in load_file("password_database.txt"):
    key = argon2(iterations = 1_000_000, memory = 100_000_000, data = password)
    if auth_code == hmac(method = "SHA-256", key = key, data = ciphertext):
        print("The password is " + password)
```
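The effect of the KDF on the attacker can be measured directly. This sketch times one guess with only HMAC against one guess with scrypt followed by HMAC (scrypt again standing in for Argon2, with the same illustrative cost parameters as a hypothetical choice). The fixed salt and placeholder ciphertext exist only to make the comparison repeatable; the exact figures depend on your hardware.

```python
import hashlib
import hmac
import time

ciphertext = b"placeholder ciphertext"
salt = b"sixteen byte slt"  # fixed only so the comparison is repeatable

def per_guess(fn, repeats: int) -> float:
    # Average wall-clock seconds per call of fn.
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

def fast_guess():
    return hmac.new(b"guess", ciphertext, hashlib.sha256).digest()

def slow_guess():
    key = hashlib.scrypt(b"guess", salt=salt, n=2**14, r=8, p=1, dklen=32)
    return hmac.new(key, ciphertext, hashlib.sha256).digest()

fast = per_guess(fast_guess, 10_000)
slow = per_guess(slow_guess, 5)

print(f"HMAC only:     {1 / fast:,.0f} guesses/s")
print(f"scrypt + HMAC: {1 / slow:,.1f} guesses/s")
print(f"slowdown:      about {slow / fast:,.0f}x")
```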
Now the attacker has to do this argon2 thing, which is super slow. If it takes 1 second on your computer, then it will also take roughly 1 second on their computer (maybe a little faster, maybe a little slower). This means an attacker can only do one password guess per second. That's much safer! Of course, an attacker can use 100 computers, but that's a serious investment, and they would still have only 100 guesses per second instead of millions or even billions per second.
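The arithmetic behind "one guess per second is much safer" can be made concrete. Assuming a hypothetical password of 8 random lowercase letters (26^8 possibilities), this computes how long it takes to try every one at the rates mentioned above; the billion-per-second figure is the upper end from the text, not a measured number.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
keyspace = 26 ** 8  # hypothetical password: 8 random lowercase letters

def describe(seconds: float) -> str:
    # Short durations in minutes, long ones in years.
    if seconds < 3600:
        return f"{seconds / 60:.1f} minutes"
    return f"{seconds / SECONDS_PER_YEAR:,.1f} years"

for label, guesses_per_second in [
    ("1 computer, slow KDF", 1),
    ("100 computers, slow KDF", 100),
    ("fast HMAC, 1e9 guesses/s", 1e9),
]:
    print(f"{label:26} {describe(keyspace / guesses_per_second)}")
```

That works out to roughly 6,600 years, 66 years and 3.5 minutes respectively (worst case; on average an attacker finds the password after searching half the space).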