
I would like to overwrite a very large hard drive (18 TB) with random bytes and then check SMART data for reallocated sectors or other errors.

Since badblocks has limitations on the number of blocks it will work with in a single run, I have tried the "cryptsetup method" described on the Arch Linux wiki:

https://wiki.archlinux.org/title/Badblocks#Finding_bad_sectors

I set up an encrypted logical device eld on the whole drive and then used the command "shred" to write zeroes to the opened eld device:

cryptsetup open /dev/device eld --type plain --cipher aes-xts-plain64
shred -v -n 0 -z /dev/mapper/eld

It went on to print lines such as

shred: /dev/mapper/eld: pass 1/1 (000000)...870MiB/17TiB 0%
shred: /dev/mapper/eld: pass 1/1 (000000)...1.7GiB/17TiB 0%
...
shred: /dev/mapper/eld: pass 1/1 (000000)...4.1TiB/17TiB 24%

but then it stopped at 4.1TiB/17TiB written. I verified this with hexdump: zeroes were not written beyond byte address 0x428249b0000 (4570459340800 ≈ 4.156 TiB):

hexdump -C  --skip 0x428249a0000 /dev/mapper/eld | head
428249a0000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
428249b0000  b3 cd d0 34 72 15 f2 2c  f6 32 90 fb 69 24 1f ec  |...4r..,.2..i$..|
428249b0010  a0 f4 88 a5 56 e7 13 82  94 e5 e0 f5 37 da c3 59  |....V.......7..Y|
428249b0020  9b 55 9f d8 39 a1 41 dc  52 ca 7b 3a 95 f5 59 e2  |.U..9.A.R.{:..Y.|

Many standard commands seem to have problems with high-capacity disks because the numbers involved are too big for 32-bit data types. Which read/write tools on Linux can reliably read and write beyond these apparent 2 TiB / 4 TiB boundaries?

Ján Lalinský
  • 4 TB is the physical limit of MBR. Did you create an MBR partition table instead of GPT and set up a single partition? – Gerald Schneider Jan 22 '22 at 12:02
  • This is a new disk without any partitions. I don't think MBR or partitions are relevant, I want to overwrite the whole disk, so no MBR or GPT data should be preserved. – Ján Lalinský Jan 22 '22 at 12:05
  • _"but then it stopped at 4.1TiB/17TiB written."_ - How did it stop? No more progress? `shred` just exited cleanly? Any error message? Was there anything in the system logs at that time? Following up Gerald's question, does that mean `/dev/device` in your commands was a full disk, not a partition? – marcelm Jan 23 '22 at 13:46
  • @marcelm Shred continued to run but no more output for a long time after the 4.1TiB line. No error messages whatsoever on screen or syslog. Path /dev/device refers to the whole SATA hard drive. – Ján Lalinský Jan 23 '22 at 14:18

2 Answers


Edit: Updated according to comment

I would simply use

dd if=/dev/urandom of=/dev/sdX bs=1M status=progress iflag=fullblock oflag=fullblock

Here /dev/sdX is the device for the hard disk.
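
If random data is preferred but /dev/urandom cannot keep up with the disk (several comments below discuss this), one suggestion from the comments is to expand a small seed into a fast pseudorandom stream with OpenSSL. A rough sketch along those lines (not benchmarked here; /dev/sdX is again the target disk):

# generate an AES-CTR keystream seeded from /dev/urandom and write it to the disk;
# recent OpenSSL releases may warn about legacy key derivation, which is harmless here
openssl enc -aes-128-ctr -nosalt -pass file:/dev/urandom </dev/zero | pv >/dev/sdX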

Tero Kilkanen
  • This does not seem reliable, because reading from urandom may fail in the middle of a block and then dd will write less than full block of data. There is a way to fix this with iflag=fullblock oflag=fullblock, see https://unix.stackexchange.com/a/121888/90056 – Ján Lalinský Jan 22 '22 at 14:16
  • While overwriting with random data seems reasonable, using `/dev/zero` as the input should work against any but the most determined of attackers. – doneal24 Jan 22 '22 at 19:44
  • Also, /dev/urandom is very slow. Using something like openssl rc4 to generate pseudorandom data is likely much nearer to I/O speed at lower cpu. Or /dev/zero, which should be good enough. Or indeed a tool such as shred. – Remember Monica Jan 23 '22 at 01:26
  • @JánLalinský: Did things change, because I used to use this around 2000 and I never once observed a partial block from urandom? – joshudson Jan 23 '22 at 04:41
  • @joshudson I, rarely, yes. Really rarely and there were always some problematic circumstances. I thought it was caused by them, although I sometimes needed to strace dd to understand what is going on. – peterh Jan 23 '22 at 08:45
  • @JánLalinský, it shouldn't matter: if `dd` reads an incomplete block, it just also writes an incomplete block. All it means is that the blocks written will be out of alignment after that, but the OS does buffering on `/dev/sdX` anyway. It matters more with `count=NN`, since AFAIK the incomplete blocks will be included in the count. – ilkkachu Jan 23 '22 at 10:21
  • `urandom` is/was slow, though, at least when I last tested. I think the algorithm it used was changed (to ChaCha20 or such?) at some point, so it might be faster now. I think I used something like `openssl enc -aes-128-ctr -nosalt -pass file:/dev/urandom < /dev/zero | pv > ...` at some point. – ilkkachu Jan 23 '22 at 10:21
  • Why such a large `bs`? A smaller block size like 128k (about half the L2 cache size) is more likely to better overlap I/O with CPU cost of `read` on the `urandom` device. But as multiple commenters have said, a faster source of randomness is a *very* good idea. On my i7-6700k Skylake at 3.9GHz, Linux 5.12.15-arch1-1, `pv < /dev/urandom > /dev/null` reports 55.6 MiB/s. So depending on HDD speed, about half to a quarter the speed of a disk, making the process of writing 18TiB take twice to 4x as long. – Peter Cordes Jan 23 '22 at 10:38
  • Presumably you'd want to use a CSPRNG if you're going to bother writing randomness at all instead of zeros, but in general if you want a blazing fast source of randomness on an x86 machine, see [What's the fastest way to generate a 1 GB text file containing random digits?](https://unix.stackexchange.com/a/324520) - my answer could easily be changed to just store the raw xorshift128+ results from SSE2 or AVX2 vectors into an output buffer, instead of processing into ASCII digits+spaces. A single core should still run close to memcpy speeds, much faster than any HDD. – Peter Cordes Jan 23 '22 at 10:43
  • [`dd` is generally useless](https://unix.stackexchange.com/questions/12532/dd-vs-cat-is-dd-still-relevant-these-days) (yes, exceptions to that exist), it is probably slower due to suboptimal block sizes (and yes, `1M` is suboptimal), and it is [potentially dangerous](https://unix.stackexchange.com/questions/17295/when-is-dd-suitable-for-copying-data-or-when-are-read-and-write-partial). _Do not use `dd`._ Just use `cat`, or `pv` if you want a progress indicator. Those tools are much simpler, faster, and not riddled with pitfalls. – marcelm Jan 23 '22 at 13:53
  • Requiring random data to prevent data recovery on the media level [is a myth](https://security.stackexchange.com/questions/10464/why-is-writing-zeros-or-random-data-over-a-hard-drive-multiple-times-better-th) or at least severely outdated. Just use `/dev/zero`. – Zac67 Jan 24 '22 at 06:09

Instead of cryptsetup + shred, I used cryptsetup + pv (cat should work instead of pv too, but it would not give any progress info) and pointed stdin to /dev/zero:

cryptsetup open /dev/device eld --type plain --cipher aes-xts-plain64
</dev/zero pv >/dev/mapper/eld

This has the advantage over dd that no obscure arguments need to be specified, and performance over a SATA 3.3 (6 Gb/s) link is good (>200 MiB/s).
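
Side note: pv cannot know the total size when reading from /dev/zero, so by default it only shows throughput; its standard -s option can supply the size so the percentage and ETA are meaningful. A small variation of the command above, assuming blockdev (util-linux) is available:

</dev/zero pv -s "$(blockdev --getsize64 /dev/mapper/eld)" >/dev/mapper/eld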

pv still reported an error when the end of the device was reached, but I checked that it nevertheless overwrote the whole logical device with zeroes, which means dm-crypt overwrote the whole hard drive with pseudo-random bytes.
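
One way to spot-check that the final sectors were really written is to dump the last MiB of the mapped device, reusing hexdump from the question. A sketch (1 MiB is an arbitrary tail size):

size=$(blockdev --getsize64 /dev/mapper/eld)      # device size in bytes
hexdump -C --skip $(( size - 1048576 )) /dev/mapper/eld
# expected output after a full pass: one all-zero line, a "*" for the
# repeated lines, and the final offset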

Now hard drive errors can be checked in at least two ways:

1. Looking for degraded SMART data (like reallocated sectors) in the output of the following command (a filtered variant is sketched at the end of this answer):

smartctl -a /dev/device

2. Reading data from /dev/mapper/eld and checking that all bytes read back as zero, by running the cmp command from diffutils:

cmp -l -b /dev/zero /dev/mapper/eld

It will either print the byte address and values of the first mismatch (and keep listing any further ones) and exit with an error, or it won't find any mismatch, in which case it prints "cmp: EOF on /dev/mapper/eld ..." (and still exits with an error, because /dev/mapper/eld is shorter than the endless /dev/zero).
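
For scripting this check, the two outcomes can be told apart by whether cmp reported any differing bytes on stdout (the EOF notice goes to stderr). A minimal sketch, assuming GNU cmp:

# differing bytes (if any) are listed on stdout; the expected EOF notice is on stderr
first_diff=$(cmp -l -b /dev/zero /dev/mapper/eld 2>/dev/null | head -n 1)
if [ -z "$first_diff" ]; then
    echo "no mismatch: device reads back as all zeroes"
else
    echo "mismatch found: $first_diff (re-run to see whether it is reproducible)"
fi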

A mismatch means that either the hard drive has a permanent failure at that position, or it was a random error that will not repeat at exactly the same position.

On the first run of cmp, I got an error after only 8 seconds, which surprised me a lot. SMART data did not show any degradation, and syslog didn't reveal any error messages regarding the hard drive.

I then ran the cmp command again to check whether the record error was real, but the mismatch at that position did not occur again; it was some random error in the whole read+evaluate process. So don't rely on a single run of the cmp command: if a mismatch is found, run it again. If the error disappears, ignore the first mismatch or try once more. If the error persists, return the hard drive to the seller, as it is most probably defective and may degrade faster over time than a healthy drive.
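
As for step 1, the attributes worth watching can be filtered out of the smartctl output. A rough sketch (attribute names as reported by typical SATA drives):

smartctl -A /dev/device | grep -Ei 'reallocated|pending|uncorrect'
# a non-zero RAW_VALUE for Reallocated_Sector_Ct, Current_Pending_Sector or
# Offline_Uncorrectable suggests the drive is remapping or failing sectors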


Ján Lalinský