42

Is it possible to tell if a hard drive is encrypted, regardless of what software was used, i.e., Truecrypt / VeraCrypt / Bitlocker for AES-256?

Just the other day, I thought it could be possible to tell if I scan the drive with "Sector View" to read the data. If the data is filled with randomness then that means that it is encrypted. Is it that easy to tell?

techraf
  • 9,141
  • 11
  • 44
  • 62
cpx
  • 587
  • 1
  • 4
  • 8
  • 3
    If you can retrieve readable data without a decryption key then it's not encrypted. – user253751 Sep 04 '16 at 12:15
  • 27
    "If the data is filled with randomness then that means that it is encrypted" Of course... it could just be random data. – NPSF3000 Sep 04 '16 at 15:33
  • If you can record and compare hard drive activity between a drive you know is not encrypted and one you suspect is, you would notice that one of them is much slower on reads and performs a lot more read operations. While some hardware is optimized for encryption, most of it suffers a performance loss. There are also tell-tale signs such as a volume header, or a volume that appears totally empty, totally full, or of unknown type. – happy Sep 05 '16 at 01:23
  • You can trivially prove that it is not possible to tell reliably. Consider any possible method of determining that a hard drive is not encrypted, and some input X for which that method says it is not encrypted. (Some X must exist, or it's the trivial method "everything is encrypted", which is useless.) Now, imagine some encryption scheme where X is the ciphertext corresponding to plaintext Y. Clearly, such a scheme can exist for any X and any Y. Thus for this encryption scheme, the method is at least wrong for input X. – David Schwartz Sep 05 '16 at 02:15
  • 7
    In the vast majority of encryption algorithms, it is very easy to identify the data, because it is in the header. Encryption is not the process of HIDING data; that is called steganography. So in practice, you just search for key signatures... – Aron Sep 05 '16 at 06:52
  • 1
    @NPSF3000 if the NSA ever investigates me, I'm going to hide a bunch of hard drives with random data on them around my house, just to mess with their heads. – PyRulez Sep 06 '16 at 00:53
  • Don't laugh, steganographic schemes work, as do CHS based filesystems on raw disks, no partitions, no formal filesystem. Good luck getting squat out of that, especially with raid 2 or 3. – mckenzm Sep 06 '16 at 05:12
  • @mckenzm: Easier than you think... *good* steganography is hard. *Bad* steganography is like any other bad security scheme -- you *feel* secure, but are not. – DevSolar Sep 06 '16 at 08:23

4 Answers

74

We have two types of encryption here, "file based encryption" and "full disk encryption". There are documented forensics methods and software (e.g. EnCase) that help us detect the schemes and programs used to encrypt the disk.

I'm going to take a list of popular tools and standards and see if they leave any traces with which we can determine that they've been used:

  • Bitlocker
    BitLocker is Microsoft's full disk encryption feature, available on Windows since Windows Vista; it encrypts the disk with AES (128-bit or 256-bit keys). A BitLocker-encrypted volume differs from a normal NTFS volume: the signature "-FVE-FS-" can be found at the beginning of BitLocker-encrypted volumes.
    These volumes can also be identified by a GUID:
    • for BitLocker: 4967d63b-2e29-4ad8-8399-f6a339e3d00
    • for BitLocker ToGo: 4967d63b-2e29-4ad8-8399-f6a339e3d01
  • DiskCryptor/TrueCrypt/VeraCrypt
    DiskCryptor is based on TrueCrypt. For both DiskCryptor and TrueCrypt we can detect their presence with the following criteria:
    • size of the file or collection of clusters is a multiple of 512,
    • minimum size of the object is 19 KB, although the default minimum is 5 MB,
    • contains no specific file signature anywhere in the object, and
    • has a high Shannon entropy or passes a chi-squared distribution test.
    Note that since no specific signature or header is left behind, we can't tell for sure that TrueCrypt (or one of its siblings) was used; by combining several methods we can only make a better guess about its presence.
  • FileVault
    FileVault is BitLocker's equivalent on Mac and offers full disk encryption. The signature "encrcdsa" (hex value "65 6E 63 72 63 64 73 61") can be found at the beginning of FileVault-encrypted volumes.
  • cryptsetup with LUKS
    Linux Unified Key Setup (LUKS) is a disk encryption specification that can be used with cryptsetup, a common Linux tool for encrypting storage volumes. LUKS is optional and users can choose not to use this format, but when it is used we can detect its presence by the "LUKS\xba\xbe" signature at the beginning of the volume.
  • Check Point Full Disk Encryption
    At sector offset 90 of the VBR, the product identifier "Protect" can be found. Hex value "50 72 6F 74 65 63 74".
  • GuardianEdge Encryption Plus/Anywhere/Hard Disk Encryption and Symantec Endpoint Encryption
    At sector offset 6 of the MBR, the product identifier "PCGM" can be found. Hex value "50 43 47 4D".
  • McAfee Safeboot/Endpoint Encryption
    At sector offset 3 of the MBR, the product identifier "SafeBoot" can be found. Hex value "53 61 66 65 42 6F 6F 74".
  • Sophos Safeguard Enterprise and Safeguard Easy
    For Safeguard Enterprise, at sector offset 119 of the MBR, the product identifier "SGM400:" can be found. Hex value "53 47 4D 34 30 30 3A".
  • Symantec PGP Whole disk Encryption
    At sector offset 3 of the MBR, the product identifier "ëH|PGPGUARD" can be found. Hex value "EB 48 90 50 47 50 47 55 41 52 44" (the 0x90 byte is not printable and is shown here as "|").
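
As a rough illustration of the signature checks listed above, here is a minimal Python sketch that inspects the first 512 bytes of a volume or disk image for those byte strings. The names `SIGNATURES`, `identify` and `identify_path` are made up for this example. Two things are assumptions rather than facts from the list: where the text only says "at the beginning", the sketch searches the whole first sector, and for PGP it searches for the printable part "PGPGUARD" only. The BitLocker GUIDs are left out.

```python
SECTOR_SIZE = 512

# (product, signature bytes, offset). An offset of None means "anywhere in the
# first sector": an assumption for entries described only as "at the beginning".
SIGNATURES = [
    ("BitLocker", b"-FVE-FS-", None),
    ("FileVault", bytes.fromhex("656E637263647361"), None),  # hex value from the list
    ("LUKS", b"LUKS\xba\xbe", None),
    ("Check Point Full Disk Encryption", b"Protect", 90),
    ("GuardianEdge / Symantec Endpoint Encryption", b"PCGM", 6),
    ("McAfee SafeBoot / Endpoint Encryption", b"SafeBoot", 3),
    ("Sophos Safeguard Enterprise", b"SGM400:", 119),
    # Assumption: match only the printable part of the PGP identifier.
    ("Symantec PGP Whole Disk Encryption", b"PGPGUARD", None),
]


def identify(sector: bytes) -> list:
    """Return the products whose signature is found in the given first sector."""
    head = sector[:SECTOR_SIZE]
    hits = []
    for name, magic, offset in SIGNATURES:
        if offset is None:
            found = magic in head
        else:
            found = head[offset:offset + len(magic)] == magic
        if found:
            hits.append(name)
    return hits


def identify_path(path: str) -> list:
    """Read the first sector of a device or image file and check it."""
    with open(path, "rb") as f:
        return identify(f.read(SECTOR_SIZE))
```

An empty result only means none of these particular signatures was found; as noted for TrueCrypt, some schemes leave no signature at all.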

Measuring File Randomness To Detect Encryption

The methods discussed earlier may not work for every disk/file encryption scheme, since not all of them have specific properties that we can exploit to detect them. Another method is to measure the randomness of files: the closer their byte distribution is to random, the more likely it is that encryption is used.
To do this we can use a Python script named file_entropy.py. It reports a value between 0 and 8 bits per byte; the closer the value is to 8.0, the closer the byte distribution is to uniform.
We can extend this further and draw plots to visualize the distribution of bytes. (calculate-file-entropy)
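
The measurement itself is short. The following is a minimal sketch of a byte-histogram entropy calculation, not the file_entropy.py script mentioned above; the function names and the 1 MiB read limit are assumptions made for illustration.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy, in bits per byte, of the byte-value histogram of data."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def file_entropy(path: str, limit: int = 1 << 20) -> float:
    """Entropy of the first `limit` bytes of a file (the limit is arbitrary)."""
    with open(path, "rb") as f:
        return byte_entropy(f.read(limit))
```

Keep in mind that this only looks at how often each byte value occurs and ignores any pattern spanning several bytes: the sequence 0 to 255 repeated over and over also scores 8.0, so a high value is a hint, not proof.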

One other pointer to detect encryption is that no known file signature will be spotted in the volume. (No jpg, no office documents, no known file types) And since compression methods (e.g. gzip, rar and zip) have known signatures we can differentiate them from encryption for the most part.

Sum up

  1. Use known signatures to detect encryption (if possible)
  2. Use special characteristics (minimum file size, high entropy, absence of any special file signature, etc.) to detect encryption
  3. Rule out compressed files using their signature
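
Steps 2 and 3 can be combined into a single check. The sketch below is one possible way to do it under stated assumptions: the 7.9 entropy threshold is an arbitrary cut-off chosen for illustration, the list of compression signatures is deliberately short, and `looks_like_container` is a hypothetical helper name. A positive result marks a candidate for closer inspection, nothing more.

```python
import math
from collections import Counter

# Leading magic bytes of a few common compressed formats (not exhaustive).
COMPRESSED_MAGIC = {
    "gzip": b"\x1f\x8b",
    "zip": b"PK\x03\x04",
    "rar": b"Rar!\x1a\x07",
}
MIN_SIZE = 19 * 1024       # minimum container size from the criteria above
BLOCK = 512                # size must be a multiple of 512
ENTROPY_THRESHOLD = 7.9    # assumption: arbitrary cut-off, in bits per byte


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value histogram, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())


def looks_like_container(data: bytes) -> bool:
    """True if data meets the size, signature and entropy criteria."""
    if len(data) < MIN_SIZE or len(data) % BLOCK != 0:
        return False
    if any(data.startswith(magic) for magic in COMPRESSED_MAGIC.values()):
        return False
    return byte_entropy(data) >= ENTROPY_THRESHOLD
```

A file filled from /dev/urandom passes this check just as well as a real container, which is exactly the limitation discussed next.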

So, going back to the main question, "Is it that easy to tell?": this falls under forensic methods, and we may also be dealing with steganography techniques. In a normal case, where the user isn't trying to fool us, it is fairly easy to tell that encryption is in place. But in real-world scenarios, where users may try to hide things and deceive us, they may just pipe /dev/urandom to a file! Then it's not going to be easy.

Silverfox
  • 3,369
  • 2
  • 19
  • 39
  • 4
    Is it for certain that we can detect TrueCrypt encryption by following the above criteria or it is just a better guess? – cpx Sep 04 '16 at 08:49
  • @user12132 It's just a guess; as you can see, this is based on entropy and file size, which can overlap with other schemes. – Silverfox Sep 04 '16 at 08:53
  • 8
    You don't calculate the entropy of the file. Shannon entropy is defined for probability distributions and not for fixed pieces of data. You assume that the bytes in the file are independently drawn from a probability distribution. You then use the frequency of each possible byte value to approximate this probability distribution. Finally you compute the entropy of this probability distribution. – CodesInChaos Sep 04 '16 at 09:58
  • @CodesInChaos I'm using the term "entropy of the file" as how random the file is or how random it looks like, seems like I'm misusing the term, right? – Silverfox Sep 04 '16 at 10:03
  • 9
    @Silverfox That misuse is common, even among cryptographers. The bigger problem is that you only look at biases in the bytewise histogram and ignore all patterns that extend across multiple bytes. For example you'd give the sequence 0 to 255 repeated a perfect score, despite being a super obvious pattern. In theory there is [Kolmogorov complexity](https://en.wikipedia.org/wiki/Kolmogorov_complexity) which is defined for individual files, but it's *completely* unusable in practice. In practice the best we have are test suites like [die harder](http://www.phy.duke.edu/~rgb/General/dieharder.php) – CodesInChaos Sep 04 '16 at 10:05
  • @CodesInChaos I understand the point you're making, I'm gonna need further study in this field but I tried to rephrase my answer to try to address it for now. But apart from the academical terms, does it make a difference here? (measuring simple byte distribution vs true randomness) Because here we are only trying to differentiate between normal and encrypted files. – Silverfox Sep 04 '16 at 10:32
  • 4
    You can pass `-c` and `-h` to `cryptsetup` and use it without LUKS. Therefore cryptsetup is not exactly the same as LUKS. [here is an example of a question on U&L by someone who uses cryptsetup without LUKS](http://unix.stackexchange.com/q/304000/172635). And yes, cryptsetup without LUKS has no signature (that was OPs problem there) – grochmal Sep 04 '16 at 12:44
  • 2
    So basically, you "detect encryption" by using statistical methods to detect randomness. So if I filled a hard-drive with random garbage, it would be detected as "encrypted" using this method. I think that gives the user a fair repudiation :) – BlueRaja - Danny Pflughoeft Sep 04 '16 at 16:57
  • Wouldn't compressed data also look random? – PyRulez Sep 06 '16 at 00:54
  • @CodesInChaos also, encryption doesn't increase Kolmogorov complexity much anyway. – PyRulez Sep 06 '16 at 01:00
  • Re: TrueCrypt/VeraCrypt criteria, are all of those items required to positively identify the presence of TrueCrypt, or just one of them? My brain just got this problem now whenever I see a list I'm not sure if they are "ANDs" or "ORs" between items. – Celeritas Sep 06 '16 at 07:41
  • 1
    @Celeritas There's an "And" between them, but let me point out again that since there is no specific header or signature left behind by TrueCrypt, we can't tell for sure that it's being used we can just make a guess, we can rule out normal files and compressed file by their signatures, and then use the criteria above to find TrueCrypt encrypted items. You can read further about trial to detect TrueCrypt [here](http://www.brimorlabsblog.com/2014/01/identifying-truecrypt-volumes-for-fun.html) – Silverfox Sep 06 '16 at 08:05
15

While you can't tell for certain, you can tell within a certain range of confidence.

Encrypted data looks like white noise: each bit has exactly a 50% probability of being set, regardless of the rest of the bits; there is no correlation between any given bit and any of the others. It's purely random.

It turns out that this high quality of randomness isn't particularly common in a hard drive's normal lifecycle. In general there's some pattern or another. Either a residual pattern from the manufacturing process, or a pattern from the filesystem setup, or a pattern from current or previously-deleted files. So if a disk contains pure white noise, then the most likely explanation is that either someone did a "secure erase" on the drive, or it contains encrypted information.

As an exception, one common unencrypted form of data that often looks a lot like noise is compressed data. The higher the compression ratio, the more it will resemble encrypted data. Still, compressed data usually has tell-tale markers so a more careful examination can generally rule that out.

tylerl
  • 82,225
  • 25
  • 148
  • 226
  • FWIW, white noise is ideal, but there are block modes that still carry information and are therefore vulnerable to preimage attacks (ECB) – Yorick de Wid Sep 04 '16 at 08:23
  • @YorickdeWid TrueCrypt uses XTS, not ECB. If you look at a single snapshot of the disk, it should be indistinguishable from random, up to the birthday bound. I'm not aware of any disk encryption software where the actual mode of operation is distinguishable, though many have obvious headers. – CodesInChaos Sep 04 '16 at 09:50
  • @CodesInChaos So does BitLocker, but the OP doesn't specify a particular disk encryption tool. And ECB **is** still used – Yorick de Wid Sep 04 '16 at 09:56
  • 3
    I sort of disagree with you. The purpose of encryption isn't to HIDE information. Most encryption algorithms actively advertise its existence to the world (helps stop people accidentally deleting it). Conversely the process of hiding the data is called Steganography. – Aron Sep 05 '16 at 06:50
  • 2
    @Aron If the encryption container identifies itself then the question is moot. Yes, you can identify truecrypt files by the ".tc" file extension, but certain products offer "plausible deniability" features, allowing you to create an encryption container that has no identifying markers. This question only makes sense in that context. – tylerl Sep 05 '16 at 07:38
10

If the data is filled with randomness then that means that it is encrypted. Is it that easy to tell?

No. If I am going to throw out, or give away, a hard disk I would remove personal data from it by running a program like shred which replaces the contents with random data. Thus the presence of random data proves nothing.

You might then reformat that disk with some file system or other, which would make the first part of the disk look normal, and the rest random. That still doesn't prove if the random part is encrypted, or just left-over from the cleanup operation.

I'm a little surprised that most encryption products leave signatures lying around as described by Silverfox. That seems to leave the user open to brute-force decryption as described here:

Security

Nick Gammon
  • 1,197
  • 7
  • 15
  • 3
    I would think that, if you really wanted to do the work, something like `dd if=/dev/urandom of=/dev/disk...` would be more appropriate. Blow away the entire filesystem as well as all its metadata/journals/logs. – smitelli Sep 05 '16 at 00:01
  • 1
    I was using shred on the entire drive (eg. /dev/sda) not a partition. Shred lets you do multiple passes. You can finish up with a pass of zeroes if you want. I understand the issues with an SSD, but the question was about a hard drive. – Nick Gammon Sep 05 '16 at 05:04
  • 3
    *I'm a little surprised that most encryption products leave signatures lying around* - Cleartext "headers" require extra engineering effort to avoid, and in any situation where it matters, you're probably already screwed. Customs officials, for instance, may be just as suspicious of a hard drive completely full of white noise as they would be of a hard drive with obvious disk-encryption headers. – zwol Sep 05 '16 at 14:57
  • You could be right, but I would have thought that signatures could be detected in seconds using the right tool. The lack of any obvious signature can be explained away by saying you reformatted the drive after securely wiping it (eg. because your boss told you to before travelling to another country). – Nick Gammon Sep 05 '16 at 20:46
  • @zwol "officer, I'm just random like that. It's not encrypted, I swear!" – PyRulez Sep 06 '16 at 00:58
  • @grochmal: The "overwritten data" thing is a myth - see [Why is writing zeros (or random data) over a hard drive multiple times better than just doing it once?](http://security.stackexchange.com/questions/10464/why-is-writing-zeros-or-random-data-over-a-hard-drive-multiple-times-better-th). – sleske Sep 06 '16 at 06:49
  • A myth in what sense? It's a myth that it is useful, or it's a myth that it isn't? – Nick Gammon Sep 06 '16 at 07:59
  • @sleske - Wow, that's a great answer there. Then again, we cannot be sure even with `hdparm --security-erase` about relocated sectors. Keep a nice furnace close by to make sure disk are erased properly. – grochmal Sep 06 '16 at 12:40
  • @NickGammon - Well, the answer makes points on the overwrite but also on relocated sectors. You can understand it in the optimistic way: overwriting a HDD sector works! Or the pessimistic way: It works but the HDD firmware may redirect the write if you are unlucky. (I killed my previous comment since it does not make sense anymore) – grochmal Sep 06 '16 at 12:42
  • 1
    We are drifting into a discussion about how to make sure people can't get at your private data (eg. the firmware not overwriting the sector you thought it was - which is certainly an interesting point). However regarding the question itself, I don't think the mere presence of what looks like random data proves there is encrypted data waiting there to be discovered. There *might* be. – Nick Gammon Sep 06 '16 at 23:42
2

It is not as easy as looking at the hard drive and seeing random characters. There are random character shredding utilities that will overwrite a hard drive with random characters, making it a shredded drive, not an encrypted drive.