12

I have a dual-boot PC, where the Win10 (uncompressed) partition is encrypted with BitLocker. I was curious to run this test (partly because encryption seemed to take very little time), so while running Linux I did this:

# cat /dev/nvme0n1p3 | strings -25       
Remove disks or other media.
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
... some (very few) lines of garbled characters
# 

The time it took makes sense given the partition size and type of disk. As for the garbled text, I guess there is a small chance that encrypted data could happen to form a short text string.

In that partition there are of course lots of plain text files, so is it safe to say that with this test it's 100% sure that all information is encrypted?

golimar
  • 2
    That really depends on the size of the disk and the probability of the strings. Consider the probability of random bits producing a string, then scale that probability to your disk size. If you really want to know whether it is encrypted, take a snapshot and decrypt it yourself. – kelalaka Nov 22 '21 at 10:37
  • 13
    A good test of what, please? – Robbie Goodwin Nov 22 '21 at 22:55
  • 4
    It's probably fine if you're trying to distinguish between "either encryption or compression is turned on," and "both encryption and compression are turned off." For any other purpose, it's extremely suspect if not outright wrong. – Kevin Nov 23 '21 at 01:04
  • @RobbieGoodwin I wanted to check all blocks of the partition just in case BitLocker left out some part of the partition or failed at part of it. If I do the same to an unencrypted partition I can see contents of my files. With this test I can see no plaintext, but I don't know if something about file encoding, fragmentation, etc could make it not be shown in the output of `strings` – golimar Nov 23 '21 at 07:17
  • 1
    I'm sorry, I failed to read right to the end of the Question. – Robbie Goodwin Nov 23 '21 at 16:21
  • 2
    Addressing "encryption took quite a short time", that is entirely possible if the drive supports hardware encryption, which means the data is already encrypted and all bitlocker needs to do is replace the blank key-encrypting-key with a new one. Unfortunately there are [known issues](https://www.zdnet.com/article/flaws-in-self-encrypting-ssds-let-attackers-bypass-disk-encryption/) with such drives so you may wish to [force software encryption instead](https://msrc.microsoft.com/update-guide/en-us/vulnerability/ADV180028) (default as of late 2019), or at least update the drive firmware. – Bob Nov 24 '21 at 14:28
  • Was that the system partition, which won't be encrypted, or the partition for C:, which will be? [BitLocker Drive Encryption Partitioning Requirements](https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/bitlocker-drive-encryption?view=windows-11#bitlocker-drive-encryption-partitioning-requirements). – Andrew Morton Nov 24 '21 at 16:16
  • The right thing is to do a statistical analysis test to measure entropy. One can often even distinguish _specific_ compression algorithms this way; it's certainly a good place to start to figure out "is this encrypted or just compressed?". – Charles Duffy Nov 24 '21 at 17:23

4 Answers

44

No, this is not a good test, not at all.

If you do the same with a zip file, or a docx, or a PNG, you won't see text strings, but the file is not encrypted. Not being able to see plaintext does not mean the file is encrypted.

Believing that garbled means encrypted can lead to wrong assumptions. If you look at the output of a terrible XOR cipher with a single-byte key, you may think the data is properly encrypted, even though it is trivial to recover.

Lots of plaintext means for sure that the drive is not encrypted. Lack of plaintext does not mean anything.
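To illustrate the single-byte XOR point, here is a minimal sketch. The helper names `xor_bytes` and `printable_runs` are made up for this example, the key `0xA5` is arbitrary, and `printable_runs` is only a rough stand-in for `strings -25` (it assumes printable means ASCII 32 to 126 plus tab).

```python
import re

def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with the same single-byte key (a toy cipher, not real encryption)."""
    return bytes(b ^ key for b in data)

def printable_runs(data: bytes, min_len: int = 25) -> list[bytes]:
    """Rough stand-in for `strings -<min_len>`: runs of printable ASCII (plus tab)."""
    return re.findall(rb"[\x20-\x7e\t]{%d,}" % min_len, data)

plaintext = b"This is a perfectly readable sentence stored on disk.\n" * 3
scrambled = xor_bytes(plaintext, 0xA5)  # 0xA5 is an arbitrary example key

print(len(printable_runs(plaintext)))           # readable runs are found
print(len(printable_runs(scrambled)))           # no runs found, yet nothing is protected
print(xor_bytes(scrambled, 0xA5) == plaintext)  # the "cipher" is trivially reversible
```

Because the example key has its high bit set, every scrambled byte falls outside the printable range, so a `strings`-style scan reports nothing even though anyone can undo the transformation.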

ThoriumBR
  • I don't think that garbled means encrypted. But if I do the same to an unencrypted partition I see text (for example I see normal text, HTML and others, and I can identify that as stuff that I have saved on that disk) – golimar Nov 22 '21 at 13:39
  • 17
    If your drive has NTFS compression enabled, you would see garbage on an unencrypted drive too. – ThoriumBR Nov 22 '21 at 13:44
  • 4
    In other words, compression can _look like_ encryption, if your test is just "is the plaintext visible". But it's not at all secure, so mistaking one for the other is a problem. – Bobson Nov 22 '21 at 18:57
  • Compression can be distinguished from PRP. This is not a correct way to provide a counterexample. – kelalaka Nov 22 '21 at 20:59
  • 17
    You cannot distinguish Compression from PRP by looking at the result from `strings`. – ThoriumBR Nov 22 '21 at 21:14
  • It is not about strings it is about randomness! – kelalaka Nov 23 '21 at 13:17
  • 1
    OP literally ran `strings` to check: `cat /dev/nvme0n1p3 | strings -25` – ThoriumBR Nov 23 '21 at 14:07
  • 3
    @kelalaka The question is not about randomness, it's about strings. – HiddenWindshield Nov 23 '21 at 17:30
  • @HiddenWindshield I'm well aware of that. My comments are of a more general form, and the real answer is quite complicated: it must establish some probability of a PRP's output forming a string, etc. If the data size is huge, the lack of plaintext can itself say something, e.g. that the data is not the output of encryption. The easiest way is to [decrypt it with the encryption key](https://security.stackexchange.com/questions/257362/is-looking-for-plain-text-strings-on-an-encrypted-disk-a-good-test/257363?noredirect=1#comment530310_257362). However, one still has to trust all of the software even after testing.... – kelalaka Nov 23 '21 at 20:23
  • 1
    @kelalaka In the general case, yes, you are correct. – HiddenWindshield Nov 23 '21 at 23:04
  • Actually you likely would see some strings in a ZIP file and a DOCX file (which is also a ZIP file), because the file paths are not compressed (or encrypted when making an encrypted ZIP file). It's not uncommon for a PNG to also have some plain-text metadata. – Alexander O'Mara Nov 25 '21 at 17:00
11

The test may be good or not, depending on what you need.

If you need to distinguish between known-good encryption (BitLocker is acceptable for a lot of purposes) and plaintext data, it is good.

The situation is much more frequent than you think.

It is probably better to use something like `hexdump -C /dev/nvmexxx | less`.

A lot of filesystem structures are recognizable and low-entropy, so you can spot them even before reaching the actual file data. E.g. a FAT32 table will contain sequences that look like xxxAxxxBxxxCxxxDxxxE ...
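The "low entropy" observation can also be checked numerically rather than by eye. The following is only a sketch: `shannon_entropy` and `block_entropies` are hypothetical helper names, the 4096-byte block size is an assumption, and the demo runs on in-memory data rather than a real device (reading `/dev/...` would need root). Note that high entropy on its own does not tell encryption apart from compression.

```python
import math
import os
from collections import Counter

def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a byte block, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    return -sum((n / total) * math.log2(n / total) for n in Counter(block).values())

def block_entropies(path: str, block_size: int = 4096, max_blocks: int = 1024):
    """Yield (offset, entropy) for the first max_blocks blocks of a file or device."""
    with open(path, "rb") as f:
        for i in range(max_blocks):
            block = f.read(block_size)
            if not block:
                break
            yield i * block_size, shannon_entropy(block)

# Demo on synthetic data: a table of increasing 32-bit integers (loosely
# resembling an allocation table) versus bytes from the OS random source.
table_like = b"".join(i.to_bytes(4, "little") for i in range(1024))
random_like = os.urandom(4096)
print(round(shannon_entropy(table_like), 2))   # well below 8 bits per byte
print(round(shannon_entropy(random_like), 2))  # close to 8 bits per byte
```

On a real partition, one would iterate over `block_entropies(path)` and look for blocks whose entropy is clearly below the rest.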

On the other hand, it is pretty much useless if you need to distinguish between good and bad encryption, unless you are a trained cryptographer looking for a particular pattern. In that case, you already know the answer to this question anyway.

fraxinus
  • 2
    Thanks. No, I don't really want to make a distinction between good and bad encryption, I just did what an average user may do if they found a lost external disk drive and couldn't mount it but wanted to take a peek – golimar Nov 23 '21 at 07:21
4

Of the 256 values a byte can take, about 92 (126-32, plus 13 and 10) are visible ASCII characters. So there is roughly a 1/3 probability that a random byte is considered eligible by `strings`.

So the probability that 25 bytes in a row are visible characters is approximately (92/256)^25 ≈ 0.000000000007739, which is on the order of 10^-11. A terabyte is about 10^12 bytes, so you should expect to see a few such strings when dumping a hard drive full of randomly distributed data.
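The arithmetic above can be reproduced with a short sketch. The function names are made up for illustration, and `printable=92` is simply the figure used in this answer; the exact set of bytes that `strings` accepts may differ slightly, which changes the result by a small factor but not the overall conclusion. Bytes are assumed to be independent and uniformly random.

```python
def run_probability(printable: int = 92, min_len: int = 25) -> float:
    """Probability that min_len consecutive uniformly random bytes are all printable."""
    return (printable / 256) ** min_len

def expected_runs(total_bytes: float, printable: int = 92, min_len: int = 25) -> float:
    """Approximate expected number of maximal printable runs of at least min_len bytes.

    A run is counted at its start: the previous byte must be non-printable,
    then min_len printable bytes must follow. Edge effects are ignored.
    """
    p = printable / 256
    return total_bytes * (1 - p) * p ** min_len

print(f"{run_probability():.3e}")     # per-position probability, about 7.7e-12
print(f"{expected_runs(1e12):.1f}")   # expected runs in 10^12 random bytes
```

With these assumptions the expected count for a terabyte comes out at roughly five, which matches "a few strings".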

  • It's a 329 GiB partition and it showed about 5 lines of "text" – golimar Nov 23 '21 at 08:11
  • 2
    @golimar Some of those might have been from the Bitlocker metadata, which is not encrypted (it can't be; it tells the OS how to derive the decryption key) and contains some amount of human-readable text (not 25 characters in a row, I don't think, but enough to have an impact on the odds of finding such a string). – CBHacking Nov 23 '21 at 08:24
  • 1
    @golimar seems like a pretty reasonable guess that it's encrypted or compressed then. or it just isn't storing any text for some reason, like a drive full of pirated movies with filenames under 25 characters. – user253751 Nov 23 '21 at 13:08
  • Fun exercise: what is the probability of finding `hello` in 329 GB of random-looking bytes? What is the probability of finding at least one English word in these 329 GB? (let's say from the top-1000-english-words dictionary) – Basj Nov 25 '21 at 08:53
  • @basj Well, you have 1/256 for each character instead of 92/256 so feel free to do the math. – Thorbjørn Ravn Andersen Nov 25 '21 at 08:58
  • The probability seems to be 90% to find `hello` in 1 TB of random (or encrypted) bytes :) – Basj Nov 25 '21 at 09:22
2

the Win10 is encrypted with BitLocker.

It means that the disk is fully encrypted, by design. Unless an encryption pass is still in progress, in which case some files are not (yet) encrypted, you can rest assured the drive is safe. You could also check for the BitLocker partition ID in the GUID table.

I guess there is a small chance that encrypted data could happen to form a short text string

That sounds a bit like the infinite monkey theorem. Ideally, encryption generates ciphertext in which each bit has a 50% probability of being 1. Intuitively, a good cipher is like writing every possible n-character string on a strip of paper, putting them all in a ballot box, and drawing one at random.

So there is a very small, but nonzero, chance that ciphertext happens to look like a plain string. E.g. you could even read `hello` somewhere in the output, just because random bytes happened to form the ASCII codes for `hello`, which doesn't mean someone is greeting you or that there is an unencrypted file.

Although you have better odds of winning the hardest lottery in the world (6 numbers out of 90) or of mining plenty of cryptocurrency than of finding a long meaningful sentence, there is still a nonzero chance of spotting some text that looks meaningful to you.
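How small that chance is depends heavily on the length of the text. As a rough back-of-the-envelope sketch (assuming independent, uniformly random bytes, an exact case-sensitive match, and a Poisson approximation; `prob_word_appears` is a name made up for this example):

```python
import math

def prob_word_appears(total_bytes: float, word_len: int) -> float:
    """Approximate chance that one specific word_len-byte string occurs at least
    once in total_bytes of uniformly random bytes (Poisson approximation)."""
    expected = total_bytes / 256 ** word_len
    return -math.expm1(-expected)  # 1 - exp(-expected), accurate for tiny values

one_tb = 1e12  # about one terabyte of random-looking data
for length in (5, 8, 25):
    print(length, f"{prob_word_appears(one_tb, length):.3g}")
```

Under these assumptions a specific five-byte word is quite plausible somewhere in a terabyte, an eight-byte word is already very unlikely, and anything sentence-length will effectively never appear by chance.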

I guess that the experiment itself has little to no scientific validity.

usr-local-ΕΨΗΕΛΩΝ