
I'd like to overwrite my hard disk with random data.

Since /dev/urandom is too slow as a source to overwrite a large amount of data in a reasonable time, I'm looking for a good alternative.

These two options meet my speed requirements:


(1) openssl with AES

openssl enc -aes-256-ctr -pass pass:"$(tr -cd '[:alnum:]' < /dev/urandom | head -c128)" -nosalt </dev/zero | dd bs=64K of=/dev/sdX status=progress

(2) shred

shred -vn 1 /dev/sdX

If I've understood correctly, shred uses the ISAAC PRNG/stream cipher.

My question is: which option produces better pseudorandom data, or can the two be called equal?
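
For a rough sanity check of either stream, I can pipe a sample into a statistical tester. A minimal sketch, assuming GNU head and the ent utility (not part of coreutils) are installed:

    # Hypothetical sanity check: byte statistics of 100 MB of keystream.
    # 'ent' reports entropy, chi-square, and serial correlation on stdin.
    openssl enc -aes-256-ctr -pass pass:"$(tr -cd '[:alnum:]' < /dev/urandom | head -c128)" -nosalt </dev/zero \
        | head -c 100M | ent

Of course, statistical tests can only reject a generator, not certify it.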

  • I don't think that this can be answered effectively unless you say why you want to _"overwrite my hard disk with random data"_. Forensic data recovery can be defeated by repeated overwrites. Writing zeros will kill all viruses. So they are both equal and not equal, depending. If you're discarding any type of disc, physically destroy it. And of course none of this applies to SSD or hybrid drives anyway. –  Sep 22 '19 at 13:03
  • If you fear that the data on your hard disk (not SSD) will be read forensically, the cheapest option is destroying it. If you want to reuse it, overwrite it with any data many times, or perform full-disk encryption! – kelalaka Sep 22 '19 at 16:32
  • Welcome to crypto.stackexchange - we have a sister site, namely information security stackexchange. Crypto.stackexchange is more or less about how the internals of cryptographic algorithms work, while security.stackexchange covers higher-level security concerns. This question appears to be more appropriate for security.stackexchange - allow me to migrate it there for you. – Ella Rose Sep 22 '19 at 17:01
  • **/dev/zero** is much faster and your data will be just as wiped. Is there a reason you need **random**? – user10216038 Sep 23 '19 at 00:45
  • @user10216038 A legitimate use case for overwriting with random data rather than with zeroes is to pre-fill a disk that will subsequently be used to store encrypted data. As long as TRIM isn't subsequently used, doing so makes it much more difficult to know which parts of the disk hold data of interest to an attacker. – user Sep 23 '19 at 12:05
  • @a CVn - That's true, but good whole-disk encryption will do that as part of its initialization anyway, although I have seen substandard encryptors that fail to do it. In any case, the OP didn't indicate subsequent use beyond wiping. – user10216038 Sep 26 '19 at 19:58

1 Answer


ISAAC's author says it is based on RC4 (even if it improves on RC4), so AES-CTR will be more secure. Plain RC4 has been disallowed in all secure communication for good reasons. Also, RC4 only manages about 0.46 GB/sec/core, and it shouldn't be used even if it were fast.

On my laptop, AES-128-CTR can run at 4.8 GB/sec/core (openssl speed -evp aes-128-ctr) and ChaCha20 can run at 2.5 GB/sec/core (openssl speed -evp chacha20). Both are random enough for any purpose, though in theory ChaCha20 is more secure.
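
If you prefer ChaCha20, the question's pipeline works with the cipher swapped out. A sketch, assuming your OpenSSL build (1.1.0 or later) exposes chacha20 through enc:

    # Same wipe pipeline, keyed ChaCha20 instead of AES-256-CTR.
    # The key material is 32 random bytes, base64-encoded so it survives as a passphrase.
    openssl enc -chacha20 -pass pass:"$(head -c 32 /dev/urandom | base64)" -nosalt </dev/zero \
        | dd of=/dev/sdX bs=64K status=progress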

My Linux 5.0.0 kernel gives me 0.161 GB/sec/core when reading from /dev/urandom (1 MB at a time), even though it uses ChaCha internally. Investigation below.
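
That figure comes from a simple read benchmark; a sketch of the kind of measurement (dd prints the throughput itself):

    # Read 1 GiB from /dev/urandom in 1 MiB chunks; dd reports the rate.
    dd if=/dev/urandom of=/dev/null bs=1M count=1024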

Disks have block remapping, and SSDs have very complicated garbage collection, so overwriting the user-visible blocks might not be enough. You should therefore try to use the disk firmware's "secure erase" feature. For example, some disks actually use AES encryption even when it is not enabled by the user, just to ensure the data stored on the disk has roughly equal numbers of 1s and 0s. Clearing just the AES key then erases all data (you just have to make sure the key really was cleared, which might be complicated).
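
On ATA drives, that firmware path is usually driven through hdparm. A minimal sketch, assuming the drive supports the ATA Security feature set and is not in the "frozen" state; the password tmppass is just a throwaway:

    # CAUTION: this destroys all data on the drive.
    # 1. Confirm "supported" and "not frozen" in the Security section:
    hdparm -I /dev/sdX
    # 2. Set a temporary user password, then issue the secure erase:
    hdparm --user-master u --security-set-pass tmppass /dev/sdX
    hdparm --user-master u --security-erase tmppass /dev/sdX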

I tried perf to see what Linux is doing.
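Something along these lines reproduces the measurement (a sketch; adjust flags for your perf version):

    # Record kernel call graphs while dd reads 1 GiB from /dev/urandom.
    perf record -g -- dd if=/dev/urandom of=/dev/null bs=1M count=1024
    # Browse the annotated call tree.
    perf report

The resulting call graph: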

      - 99.83% read
         - 99.81% entry_SYSCALL_64_after_hwframe
            - do_syscall_64
               - 99.81% __x64_sys_read
                  - 99.81% ksys_read
                     - 99.80% vfs_read
                        - 99.77% __vfs_read
                           - 99.26% urandom_read
                              - 90.55% extract_crng
                                 - 89.44% _extract_crng
                                    - 39.99% chacha_block
                                         chacha_permute
                                      2.85% _raw_spin_lock_irqsave
                                      1.46% _raw_spin_unlock_irqrestore
                                   0.53% _raw_spin_unlock_irqrestore
                                4.63% copy_user_generic_unrolled
                              - 1.62% __check_object_size
                                   0.77% check_stack_object
                                0.76% _copy_to_user

Outside of chacha_permute, _extract_crng is spending its time waiting for rdrand (called via arch_get_random_long).

chacha_permute, which takes less than 40% of the time, uses the slow non-vectorized version for some reason, even though a vectorized version is available (and even an AVX2 one). Loading the chacha_x86_64 module didn't help. My guess was that the random char device driver is linked against the generic ChaCha implementation and will only use the fast one if it is built into the kernel rather than as a module, but after compiling a 5.3.0 kernel with vectorized ChaCha (CONFIG_CRYPTO_CHACHA20=y, CONFIG_CRYPTO_CHACHA20_X86_64=y), no dice - urandom still uses the slow ChaCha. I don't know why.
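
One way to check which implementation the kernel registered is /proc/crypto; a sketch (the chacha20-generic vs. chacha20-simd driver names are my assumption of the usual naming):

    # Show the registered ChaCha20 implementations and their priorities.
    grep -B1 -A6 'chacha' /proc/crypto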

Z.T.
  • Good question, though "and SSD disks have very complicated garbage collection" feels too abstract - can you elaborate? – tungsten Sep 22 '19 at 17:45
  • @tungsten You can start here: https://en.wikipedia.org/wiki/Flash_memory_controller – Z.T. Sep 22 '19 at 17:48
  • Also note that despite urandom being slow, it is still faster than the majority of consumer hard disk drives – Richie Frame Sep 23 '19 at 05:26
  • @RichieFrame I think that's outdated. Samsung 860 EVO is a popular consumer drive, and it gives 520MB/sec sequential writes. NVMe drives are not expensive and give ~1800MB/s sequential writes. – Z.T. Sep 23 '19 at 05:34
  • @Z.T. hence why I said hard DISK drives; my 6-drive RAID-6 array pulls about 250MB/s, even though each drive individually does 160MB/s sequential raw, which is typical for an enterprise-grade 7200rpm SATA drive – Richie Frame Sep 23 '19 at 05:52