File format of encrypted file

3

1

I am interested in understanding the format of the encrypted files produced by the application Data Guardian. From what I can tell some sort of transformation of the password is stored in the field named 'hexIdentifier.' What I find odd is that if two files are saved with the same password, then they both have the same 'hexIdentifier' string. I am not advertising this app and I am not affiliated with the company. I am evaluating the app.

bpqaoozhoohjfpn

Posted 2012-02-08T01:19:26.303

Reputation: 45

It wouldn't be much of an advertisement anyway if the password is stored in the file itself. – Andrew Lambert – 2012-02-08T01:21:25.213

This whole thread is an awesome read. I know nothing about the implementation of file encryption. All I know is to stick to what the experts use. – surfasb – 2012-02-08T09:48:36.193

Answers

6

I would be very surprised if it did actually store the password (encoded or not) in the encrypted file because doing so would demonstrate a fundamental lack of understanding of cryptography, in which case, it would be highly recommended to pass over the software if you need the encrypted files to be secure. There is no practical reason to store the password.

Even the low price is not a reason to use it if it does indeed store the password, since there are plenty of other programs that are much more secure and even free (in fact, even standard archivers provide secure encryption without storing the password).


As for Data Guardian, I’ve got some bad news. I did some tests, and you and Amazed are correct. The hexIdentifier field is not merely related to the password; it is not even a hash. It is the actual password (albeit encoded, though not even with a large alphabet).

If you save the same file over and over with a password of increasing length (e.g., one character, two, three…), the field changes but its size stays constant at 64 bits for up to eight characters; from nine to 16 characters it grows to 128 bits, and so on. In other words, it chunks (pads?) and encodes the password in 8-character blocks. If it were a hash, the size of the field would remain constant no matter the length of the password. Therefore, it actually encodes and stores the password itself.
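The size argument can be illustrated with a small sketch. It does not reproduce Data Guardian's actual encoding, which is unknown; `padded_length` is a hypothetical helper that only models "round the password up to whole 8-byte blocks", and SHA-256 stands in for "some hash". The point is just that block-padded output grows with the input while a digest does not.

```python
import hashlib

BLOCK_BYTES = 8  # assumed block size: 64 bits, as with Blowfish


def padded_length(password: str, block: int = BLOCK_BYTES) -> int:
    """Bytes occupied once the password is padded up to whole blocks."""
    n = len(password.encode("utf-8"))
    blocks = max(1, -(-n // block))  # ceiling division, at least one block
    return blocks * block


def hash_length(password: str) -> int:
    """Digest size in bytes; independent of the password length."""
    return len(hashlib.sha256(password.encode("utf-8")).digest())


for pw in ("a", "abcdefgh", "abcdefghi", "a" * 16, "a" * 17):
    print(f"{len(pw):2d} chars: block-encoded {padded_length(pw) * 8:3d} bits, "
          f"hash {hash_length(pw) * 8} bits")
```

A field that steps from 64 to 128 bits exactly at the ninth character matches the first column, not the second.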

There is a DLL in the program folder that indicates that it uses the Blowfish block-cipher (which uses 64-bit blocks—remember the 64-bit chunks above?), so the password is likely encrypted with that as well as the data (though separately from the data as opposed to as part of the same stream, which makes it even more vulnerable).

I’ve already ferreted out several aspects of the algorithm in just a few minutes of merely running in-program tests (while at the same time watching TV) without opening it in a disassembler or looking at a single line of code. I don’t imagine it would be too difficult for someone with the proper motivation to reverse it altogether.


In summary, Data Guardian is not reliable enough if you need encryption (sort of defeats the purpose of the name). If you don’t need the encryption or the data isn’t sensitive, then you can get by with it (it is a specialized record-keeping program as opposed to a generic encryption program). Otherwise if security is necessary, then you would be better off looking for another record management program with stronger encryption or else just use a normal program (or even Data Guardian) and encrypt the saved files with a generic encryption program (or NTFS encryption).

You could also contact the dev and ask if they can implement stronger encryption (even the standard Microsoft crypto API [1][2][3][4] would be good; also Crypto++ is common since Boost could not add one).

Synetech

Posted 2012-02-08T01:19:26.303

Reputation: 63 242

It does look as though the HexIdentifier tag is directly linked to the password. It may not be that it is the password, but there is a correlation. – Andrew Lambert – 2012-02-08T01:44:03.937

Probably a hash. Even that is inadvisable. It’s not hard to encrypt a small file with different passwords to derive the hashing function (especially if it is a standard one like MD5, SHA-1, etc.), then reverse the password via brute-force. – Synetech – 2012-02-08T01:50:31.347

@Synetech: If the password is stored in the form of a salted hash and the hashing algorithm is sufficiently strong, deriving the password from the hash can be as hard as breaking the ciphering algorithm. Sure, if the password is weak, you can brute-force it, but that's also true for decrypting the file. – Dennis – 2012-02-08T02:13:17.537
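For readers unfamiliar with salting, here is a minimal sketch of why it matters for the symptom in the question (same password, same stored string). The function names are made up for illustration, and SHA-256 and PBKDF2 are my choices, not anything Data Guardian is known to use.

```python
import hashlib
import secrets
from typing import Optional, Tuple


def unsalted_tag(password: str) -> str:
    """Deterministic: the same password always yields the same string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted_tag(password: str, salt: Optional[bytes] = None) -> Tuple[str, str]:
    """Returns (salt, digest) as hex; a fresh random salt is used unless one is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt.hex(), digest.hex()


# Two "files" saved with the same password:
print(unsalted_tag("hunter2") == unsalted_tag("hunter2"))  # identical tags
print(salted_tag("hunter2") == salted_tag("hunter2"))      # differ, since each call draws a new salt
```

With a per-file salt, two files protected by the same password would not share a stored value, so identical strings across files suggest a deterministic, unsalted transformation.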

@Dennis, not true. Given the salted hash and the algorithm, brute forcing the password is rather simple given that it is plain text and relatively short length. It certainly is harder than with an unsalted hash, but not nearly as hard as brute-forcing a 64+bit symmetric cipher. – psusi – 2012-02-08T02:23:52.627

@psusi: If you have a 64-bit password, there are 18,446,744,073,709,551,616 possible combinations. It doesn't matter if you're hashing or ciphering or if you have a salt or not, it's the same number of combinations. So brute-forcing it is anything but simple. – Dennis – 2012-02-08T02:30:51.557

It also has a mere 16-character alphabet (only uppercase hex digits), so it’s not 64-bit. Either way, it’s not a hash, it’s the password itself. I didn’t test, but I would imagine it also limits the characters that can be used in the password. – Synetech – 2012-02-08T02:39:22.720

@Synetech: I have no idea how Data Guardian stores the password and, therefore, I wouldn't trust it with my data. I just objected to "Probably a hash. Even that is inadvisable. It’s not hard to [...] reverse the password via brute-force." Most encryption programs have a way of saying "wrong password", which means they store the hash of either the password or the plaintext. If you can obtain the password from that, the problem is in the algorithm. Proper hashing and encryption algorithms are immune to known-plaintext attacks. – Dennis – 2012-02-08T02:40:17.707

@Dennis, a password isn't 64 bits. It is a small number of printable ASCII characters. That makes it about 64 possibilities for each of typically 8 or fewer characters, giving only 64^8 or 2.8e14 possibilities, far fewer (about 65,000x) than 2^64. Given that passwords usually only have lowercase letters and numbers, that makes it even easier. 8 lowercase letters and numbers have only 36^8 or 2.8e12 possibilities, roughly 6.5 million times fewer than the keyspace of a 64-bit cipher. – psusi – 2012-02-08T02:49:15.230
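The counts being argued over in this exchange are easy to check. This is only arithmetic; the alphabet sizes and lengths are the ones mentioned in the comments, and `keyspace` is a throwaway helper name.

```python
import math


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` symbols."""
    return alphabet_size ** length


FULL_64_BIT = 2 ** 64

cases = [
    ("64-symbol alphabet, 8 chars", 64, 8),
    ("lowercase + digits, 8 chars", 36, 8),
    ("alphanumeric (62), 11 chars", 62, 11),
    ("3-symbol alphabet, 4 chars", 3, 4),
]

for label, size, length in cases:
    n = keyspace(size, length)
    print(f"{label}: {n:.2e} candidates, {math.log2(n):.1f} bits, "
          f"2^64 is {FULL_64_BIT / n:,.1f}x larger")
```

Only the 11-character alphanumeric case exceeds 64 bits; the 8-character cases fall short of it by factors in the tens of thousands to millions.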

Considering that collisions with MD5 (true 128-bit) can be found in seconds on an average contemporary system, I would imagine a 16-char hashing algorithm could be fully cracked too without much effort. – Synetech – 2012-02-08T02:50:40.737

Interesting analysis. Hopefully the developer can fix this. – bpqaoozhoohjfpn – 2012-02-08T02:51:16.930

@Dennis, proper encryption programs do NOT store the password in any form. They encrypt the data using a 64 or 128 bit randomly generated key. They encrypt that key using the hash of the password and store the encrypted key. The only way of knowing whether you entered the right password when you try to decrypt is by using its hash to decrypt the key, then use the key to decrypt the data, then see if the results look sane. – psusi – 2012-02-08T02:52:29.283
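A toy sketch of the scheme described in the comment above: a random data key, wrapped with a key derived from the password, with only the wrapped key and salt stored. Everything here is an assumption for illustration. PBKDF2 stands in for "the hash of the password", the XOR wrap is a placeholder for a real key-wrapping cipher (the Python standard library has none), and the function names are invented. It is not production cryptography.

```python
import hashlib
import secrets
from typing import Tuple

KEY_BYTES = 16  # 128-bit data key, as mentioned in the comment


def derive_kek(password: str, salt: bytes) -> bytes:
    """Key-encryption key derived from the password (PBKDF2 is my choice here)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt,
                               100_000, dklen=KEY_BYTES)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def wrap_key(password: str) -> Tuple[bytes, bytes, bytes]:
    """Returns (data_key, salt, wrapped_key); only salt and wrapped_key would be stored."""
    data_key = secrets.token_bytes(KEY_BYTES)
    salt = secrets.token_bytes(16)
    # Toy wrap: XOR with the derived key stands in for a real cipher.
    return data_key, salt, xor(data_key, derive_kek(password, salt))


def unwrap_key(password: str, salt: bytes, wrapped: bytes) -> bytes:
    """Any password yields *some* key; nothing here says whether it is the right one."""
    return xor(wrapped, derive_kek(password, salt))


data_key, salt, wrapped = wrap_key("correct horse")
print(unwrap_key("correct horse", salt, wrapped) == data_key)  # True
print(unwrap_key("wrong guess", salt, wrapped) == data_key)    # almost certainly False
```

Note that `unwrap_key` never reports a wrong password. An attacker holding only the stored fields has to go on and decrypt data to test each guess, which is the extra cost the comment is pointing at.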

> The only way of knowing whether you entered the right password when you try to decrypt is by using its hash to decrypt the key, then use the key to decrypt the data, then see if the results look sane. Which is why when you decompress an encrypted archive, it doesn’t tell you if the password is correct or not until it finishes extracting one of the files and compares the hash to the one stored for that file. – Synetech – 2012-02-08T02:55:22.443

@psusi: "a password isn't 64 bits." A password is whatever I choose it to be. 11 randomly selected alphanumeric characters (A-Z, a-z, 0-9) already achieve over 64 bits. In return, a weak password is a weak password, no matter what algorithm you use to mask its weakness. And brute-force always takes the same number of tries. By definition. – Dennis – 2012-02-08T02:56:10.900

Thought: are the results of this analysis in conflict with 'responsible disclosure?' – bpqaoozhoohjfpn – 2012-02-08T02:57:05.470

@Synetech: MD5 is a bad algorithm, and even with a good one, a 128-bit hash is about as strong as 64-bit symmetric encryption. There are much stronger ones (like SHA-512). "The only way of knowing [...] Which is why when you decompress an encrypted archive,..." By that reasoning, encrypting a small file is a security issue. As I said, this is not an issue with a strong password and a good algorithm. – Dennis – 2012-02-08T03:00:31.020

"A password is whatever I choose it to be. 11 randomly selected alphanumeric characters (A-Z, a-z, 0-9) already achieve over 64 bit" @Dennis, a password is whatever the algorithm limits it to. The size of the password space depends on the alphabet. If I only allow the characters {abc}, then a four-character password does not mean 4,294,967,296 permutations (32-bit), it is only 81 permutations. Much easier (and faster) to crack. – Synetech – 2012-02-08T03:00:36.663

@Synetech: That's missing the point. If you use one of 81 possible passwords, you won't protect the data from anyone. You could even try those passwords by hand. No need for an automated brute-force attack. Again, I'm talking about a program that handles this properly. The size of the hash, by the way, doesn't say anything. For password comparison, storing the first 16 hexadecimal characters of a strong hash would suffice. – Dennis – 2012-02-08T03:02:50.720

@bpqaoozhoohjfpn, I don’t know. You’d have to check their site to see what claims they make. – Synetech – 2012-02-08T03:03:07.533

@Dennis, you’re the one missing the point. Nobody is using an 81-permutation algorithm; I am just trying to simplify it so that you can understand the speciousness of your argument. You keep saying that there is a large pool of passwords, thus making it time-consuming to reverse the password from a hash, but in this case (the question is about this software, not a generic encryption question), the pool is much smaller, making cracking quite feasible. – Synetech – 2012-02-08T03:05:40.273

@Synetech, not sure what you mean. Should the analysis be pulled until the dev has a chance to make the password storage stronger? – bpqaoozhoohjfpn – 2012-02-08T03:10:34.177

@Dennis, whatever the strength of the password, it is several orders of magnitude easier to brute-force it when you have the hash than when you have to use the hash to decrypt the session key, then try to decrypt and verify some actual data. – psusi – 2012-02-08T03:15:13.263

@bpqaoozhoohjfpn, what analysis? I mean that they may or may not be legally responsible, depending on what they claim. If they say it uses strong encryption (especially if they specifically state algorithms and such), and a breach occurs, they may be taken to court for false claims, but if they admit weak encryption (or maybe even don’t say anything unless specifically asked), then they may argue that they never claimed it was secure. For example, if I say I am a carpenter then build a crappy chair, you could sue, but if I admit to being awful, then you can’t really complain, now can you? – Synetech – 2012-02-08T03:18:47.757

@Synetech, The analysis that you published in your answer. Maybe we should pull it until the dev has a chance to make improvements. – bpqaoozhoohjfpn – 2012-02-08T03:23:45.087

@bpqaoozhoohjfpn, I don’t see why. Did Russinovich withdraw his analysis of the Sony XCP rootkit? Nope. You are always free to implement one of the crypto algos I added to my answer. – Synetech – 2012-02-08T03:29:59.313

@Synetech, "You are always free to implement one of the crypto algos I added to my answer." Who? The app isn't open source. – bpqaoozhoohjfpn – 2012-02-08T03:38:36.137

@bpqaoozhoohjfpn, I meant the devs. ;-) (I included a link to the contact page.) – Synetech – 2012-02-08T03:46:13.220

@Synetech, So the problem in short is that the data may be encrypted with Blowfish, but the password is stored in such a way that potentially makes it possible to discover (although no method for discovering the password has actually been proven here). I guess I will point the devs to this thread. – bpqaoozhoohjfpn – 2012-02-08T03:57:54.733

@bpqaoozhoohjfpn, sure, if you want. Also try using Twofish instead, and/or using a larger alphabet, and/or encrypting the password in the same stream as the data instead of separately. – Synetech – 2012-02-08T04:09:30.873

@Synetech, I'm confused again. The latest edit to your answer indicates that the password IS likely encrypted with Blowfish (rather than being a home grown encoding as originally thought). Do you still feel that it would be realistic to recover the password from the encrypted string? – bpqaoozhoohjfpn – 2012-02-08T05:47:52.740

@bpqaoozhoohjfpn, I had forgotten that there was a Blowfish DLL in the program folder. Since Blowfish is a block cipher that uses 64-bit blocks, it fits with my observation that the hexIdentifier field expands in 64-bit chunks. I think it's safe to assume that the same encryption is used throughout the program for everything that needs to be encrypted. As for cracking it, there are no known shortcuts or tricks to make it easier, but like any other algorithm, it can still be at least brute-forced given enough motivation and resources. – Synetech – 2012-02-08T14:48:43.697

@Synetech, any thoughts on the hexIdentifier being the same for different files that use the same password? P.S. I just talked to the dev; they say the hexIdentifier is not part of the encryption and that they will remove it from future versions. – bpqaoozhoohjfpn – 2012-02-08T19:17:11.680

@bpqaoozhoohjfpn, no idea then; you’d have to ask them. – Synetech – 2012-02-08T19:26:36.333