How to verify that a hard drive is filled with zeroes in Linux?

I have a hard drive filled with zeroes.

How can I check, using bash, that all bits on the hard drive are zero?

gkfvbnhjh2

Posted 2013-03-02T11:24:43.383

Reputation: 173

Would it be acceptable to just overwrite the entire drive with zeroes? Or do you actually need to confirm the current contents? – Bob – 2013-03-02T11:29:34.080

I want to verify that the hard drive is filled with zeros. – gkfvbnhjh2 – 2013-03-02T11:51:37.750

In theory there could be a bug in data sanitization tools that leaves some data intact. I want to be sure that every bit is zero.

So how do I check if the HDD is full of zeroes? – gkfvbnhjh2 – 2013-03-02T12:05:03.470

Why zeroes? Would you not randomly write zeros and 1s, several times? – None – 2013-03-02T14:22:38.627

Because 1s are narrower than 0s - you can see the old data between them more easily. – ChrisA – 2013-03-02T16:01:17.667

@ChrisA Then why not fill it with ♥? They're filled in, as well ;) – Izkata – 2013-03-02T17:37:36.280

Answers

od will replace runs of the same thing with *, so you can easily use it to scan for nonzero bytes:

$ sudo od /dev/disk2 | head
0000000    000000  000000  000000  000000  000000  000000  000000  000000
*
234250000
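If the check needs to run unattended, the visual inspection above can be approximated in a script. The following is a minimal sketch, not part of the original answer: od_all_zero is a hypothetical helper name, and it assumes od's default output format (an offset column followed by octal words, with repeated lines collapsed to *). Because identical all-zero lines collapse into a single *, any nonzero data has to show up within the first three lines of output, so only those are inspected.

```shell
# od_all_zero PATH
# Hypothetical helper: exit status 0 if od shows only zero words for PATH.
od_all_zero() {
    od "$1" | head -n 3 | awk '
        BEGIN { bad = 0 }
        $1 == "*" { next }                 # marker for collapsed repeated lines
        {
            # field 1 is the offset; every later field is a data word
            for (i = 2; i <= NF; i++)
                if ($i !~ /^0+$/) bad = 1
        }
        # no output at all means od could not read PATH, so treat it as failure
        END { exit (bad || NR == 0) }'
}
```

It would be called as od_all_zero /dev/sdX (as root for a real drive) and its exit status tested with if or &&.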

Gordon Davisson

Posted 2013-03-02T11:24:43.383

Reputation: 28 538

I'd add | head to the end of that, so that if it turns out that the drive isn't zeroed, it stops after producing just enough output to show the fact, instead of dumping the entire drive to the screen. – Wyzard – 2013-03-02T19:30:56.267

@Wyzard: Excellent idea; I'll add it to my answer. – Gordon Davisson – 2013-03-02T19:49:04.700

I've written a short C++ program to do so, source available here.

To build it:

wget -O iszero.cpp https://gist.github.com/BobVul/5070989/raw/2aba8075f8ccd7eb72a718be040bb6204f70404a/iszero.cpp
g++ -o iszero iszero.cpp

To run it:

dd if=/dev/sdX 2>/dev/null | ./iszero

It will output the position and value of any nonzero bytes. You can redirect this output to a file with >, e.g.:

dd if=/dev/sdX 2>/dev/null | ./iszero >nonzerochars.txt

You might want to try changing BUFFER_SIZE for better efficiency. I'm not sure what an optimum value might be. Note that this also affects how often it prints progress, which will affect speed somewhat (printing output to the console is slow). Add 2>/dev/null to get rid of progress output.

I am aware this is not using standard bash, nor even builtins, but it should not require any extra privileges. @Hennes' solution is still faster (I haven't really optimised anything - this is the naïve solution); however, this little program can give you a better idea of just how many bytes your wiper has missed, and in what location. If you disable the progress output, it'll still be faster than most consumer hard drives can read (>150 MB/s), so that's not a big issue.

A faster version with less verbose output is available here. However, it is still a little slower than @Hennes' solution. This one, however, will quit on the first nonzero character it encounters so it is potentially much faster if there's a nonzero near the beginning of the stream.

Adding source to post to keep answer better self-contained:

#include <cstdio>

#define BUFFER_SIZE 1024

int main() {
    FILE* file = stdin;
    // unsigned char, so bytes >= 0x80 are not sign-extended when printed with %x
    unsigned char buffer[BUFFER_SIZE];
    size_t bytes_read = 0;
    long long progress = 0;
    long long nonzero = 0;

    while ((bytes_read = fread(buffer, 1, BUFFER_SIZE, file)) > 0) {
        for (size_t i = 0; i < bytes_read; i++) {
            progress++;
            if (buffer[i] != 0) {
                nonzero++;
                // progress is the 1-based position of this byte in the stream
                printf("%lld: %x\n", progress, buffer[i]);
            }
        }
        fprintf(stderr, "%lld bytes processed\r", progress);
    }

    fprintf(stderr, "\n");

    // ferror() only reports whether an error occurred, not an error code
    if (ferror(file)) {
        fprintf(stderr, "Error reading input\n");
        return -1;
    }

    printf("%lld nonzero characters encountered.\n", nonzero);
    // Exit statuses are truncated to 8 bits, so report only zero / nonzero
    return nonzero ? 1 : 0;
}
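Where a compiler is not available, a roughly comparable report can be sketched with cmp -l, which lists every byte that differs between two inputs. This is an alternative under stated assumptions, not the author's program: list_nonzero_bytes and count_nonzero_bytes are hypothetical names, positions are 1-based and decimal, and byte values are printed in octal. It will probably be slower than the C++ version on a drive with many nonzero bytes.

```shell
# list_nonzero_bytes PATH
# Prints one line per nonzero byte: position, value in PATH (octal), 0.
list_nonzero_bytes() {
    # cmp reports "EOF on PATH" on stderr when PATH ends; that is expected
    # here, but note that hiding stderr also hides permission errors.
    cmp -l "$1" /dev/zero 2>/dev/null
}

# count_nonzero_bytes PATH
# Prints how many bytes in PATH are not zero.
count_nonzero_bytes() {
    list_nonzero_bytes "$1" | wc -l | tr -d ' '
}
```

For example, count_nonzero_bytes /dev/sdX would print 0 for a fully zeroed drive.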

Bob

Posted 2013-03-02T11:24:43.383

Reputation: 51 526

This is a great answer, but is there any way to make the script work more like a normal command - using iszero /dev/sda rather than requiring it to be piped with something like iszero < /dev/sda? – Hashim – 2019-12-01T22:28:02.980

@Hashim This was written as more or less a throwaway program quite a while ago (nowadays I'd at least do it in a scripting language like Python rather than compiled C)... that said, if you wanted to take arguments in the simplest way, it'd be somewhere along the lines of making it int main(int argc, char *argv[]) and then FILE* file = fopen(argv[1], "r");. Done properly it'd include checking that the argument actually exists, checking that the open succeeded (fopen returns NULL on failure), etc., but that's too much trouble for a throwaway program. – Bob – 2019-12-02T03:19:08.297

That's fair enough. Is there any particular reason you would do it in Python rather than C? Surely C would be a lot faster? – Hashim – 2019-12-02T04:21:32.453

@Hashim I suspect SIMD vectorised operations in numpy would be close to vectorised instructions in C. And that's assuming the C compiler is smart enough to vectorise the loop in the naive C program. Would have to benchmark to be sure; unfortunately I don't really have the time to do that right now. Main advantage of Python (et al.) is it's generally available and runnable without a compiler, while gcc is not necessarily available on all Linux distros without pulling down additional packages. Then again numpy isn't part of standard Python packages either... – Bob – 2019-12-02T04:33:33.450

I have a working build environment and much prefer having exes that I can call instead of scripts. I'll carry on doing some basic tests with the fast version of this program and other solutions to find out which is faster for my needs. Thanks again for coding it. – Hashim – 2019-12-02T04:38:19.240

@Hashim If you compile with -O3 and -march=native you might see some speedups; that should make sure GCC enables auto-vectorisation and uses the best available for your current CPU (AVX, SSE2/SSE3, etc.). Along with that you can play with the buffer size; different buffer sizes may be more optimal with vectorised loops (I'd play with 1MB+, current one is 1kB). – Bob – 2019-12-02T04:46:33.980

@Hashim Above comment edited, in case you didn't see. Beyond that, if you'd like to discuss further, you can ping me (@Bob) in chat: https://chat.stackexchange.com/rooms/118/root-access – Bob – 2019-12-02T04:48:43.320

Expanding on Gordon's answer, pv provides an indication of how far along the process is:

$ sudo pv -tpreb /dev/sda | od | head
0000000 000000 000000 000000 000000 000000 000000 000000 000000
*
9.76GiB 0:06:30 [25.3MiB/s] [=================>               ] 59% ETA 0:04:56

Chris

Posted 2013-03-02T11:24:43.383

Reputation: 97

This is very useful with a big hard drive! – Martin Hansen – 2014-03-05T23:04:19.100

This seems an ugly inefficient solution, but if you have to check only once:

dd if=/dev/sdX | tr --squeeze-repeats "\000" "T"

This uses dd to read from disk sdX (replace the X with the letter of the drive you want to read from),
then translates all unprintable zero bytes into something we can handle.

Next we either count the bytes we can handle and check if it is the right number (use wc -c for that), or we skip counting and use the -s or --squeeze-repeats to squeeze all multiple occurrences to a single char.

Thus dd if=/dev/sdX | tr --squeeze-repeats "\000" "T" should print only a single T.

If you want to do this regularly then you want something more efficient.
If you want to do this only once then this kludge may verify that your normal wiper is working and that you can trust it.
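One way to avoid relying on a placeholder character, which cannot be told apart from a literal T already on the drive, is to delete the zero bytes instead of translating them and then test whether anything is left. The sketch below is only an illustration of that idea; only_zero_bytes is a hypothetical helper name, and head -c is assumed to be available (it is in GNU coreutils).

```shell
# only_zero_bytes PATH
# Exit status 0 if PATH is readable and contains nothing but zero bytes,
# 1 if some other byte was found, 2 if PATH cannot be read.
only_zero_bytes() {
    [ -r "$1" ] || return 2
    # Delete every NUL; head ends the pipeline at the first surviving byte,
    # so a nonzero byte near the start is detected quickly.
    leftover=$(tr -d '\000' < "$1" | head -c 1 | wc -c | tr -d ' ')
    [ "$leftover" -eq 0 ]
}
```

It could be used as only_zero_bytes /dev/sdX && echo "all zero".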

Hennes

Posted 2013-03-02T11:24:43.383

Reputation: 60 739

Why do you consider this solution to be inefficient? Is there some buffering that requires reading far past the first non-NUL location? – Daniel Beck – 2013-03-02T12:55:11.617

Is there a potential problem where a literal 'T' is present in the stream as the only nonzero character? – Bob – 2013-03-02T12:57:30.097

True. That is a flaw in the design. I am also not using bash (the shell itself), but I assumed that with "Bash" you meant "Not from bash, from using any shell prompt and standard text mode tools". – Hennes – 2013-03-02T13:06:25.273

@daniel: A simple C program should be able to read all data without changing every read byte. Which would be more efficient and aesthetically pleasing. It might also take much more time to write such a program than to just use available tools in an inefficient way. – Hennes – 2013-03-02T13:06:53.260

To check only (any blocks that don't match will be listed):

sudo badblocks -sv -t 0x00 /dev/sdX

Or use badblocks to write them as well as check:

sudo badblocks -svw -t 0x00 /dev/sdX

The default destructive test is my secure erase of choice:

sudo badblocks -svw /dev/sdX

If anyone can retrieve anything after filling the drive with alternating 0s and 1s, then their complement, then all 1s, then all 0s, with every pass verified it worked, good luck to them!

Makes a good pre-deployment check on new drives too

man badblocks

for other options

Not saying it's fast, but it works...

Beardy

Posted 2013-03-02T11:24:43.383

Reputation: 79

Best of both worlds. This command will skip bad sectors:

sudo dd if=/dev/sdX conv=noerror,sync | od | head

Use kill -USR1 <pid of dd> to see progress.

jiveformation

Posted 2013-03-02T11:24:43.383

Reputation: 21

Wanted to post this clever solution from a similar but earlier question, posted by a user who hasn't logged in for a while:

There is a device /dev/zero on a Linux system that always gives zeroes when read.

So, how about comparing your hard drive with this device:
cmp /dev/sdX /dev/zero
If all is well with zeroing out your hard drive it will terminate with:
cmp: EOF on /dev/sdX
telling you that the two files are the same until it got to the end of the hard drive. If there is a non-zero bit on the hard drive cmp will tell you where it is in the file.

If you have the pv package installed then:
pv /dev/sdX | cmp /dev/zero
will do the same thing with a progress bar to keep you amused while it checks your drive (the EOF will now be on STDIN rather than sdX though).
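Because cmp reaches the end of the drive before /dev/zero ever ends, the plain command exits with a nonzero status even on a fully zeroed drive, which is awkward in scripts. A possible workaround, sketched here on the assumption that GNU cmp (for -n) and util-linux blockdev are available, is to limit the comparison to the size of the device. size_in_bytes and is_zeroed are hypothetical helper names.

```shell
# size_in_bytes PATH
# Hypothetical helper: size of a block device or regular file, in bytes.
size_in_bytes() {
    if [ -b "$1" ]; then
        blockdev --getsize64 "$1"
    else
        wc -c < "$1" | tr -d ' '
    fi
}

# is_zeroed PATH
# Exit status 0 only if every byte of PATH matches /dev/zero.
# On a mismatch, cmp prints the position of the first differing byte.
is_zeroed() {
    cmp -n "$(size_in_bytes "$1")" "$1" /dev/zero
}
```

With this, is_zeroed /dev/sdX && echo "drive is zeroed" behaves like an ordinary command.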

Hashim

Posted 2013-03-02T11:24:43.383

Reputation: 6 967

Some time ago I was curious about AIO. The result was a sample test program which happens to check for sectors (512 byte blocks) which are NUL. You can see this as a variant of a sparse file-regions detector. I think the source says it all.

If the entire file/drive is NUL output looks like 0000000000-eof. Note that there is a trick in the program, function fin() is not called at line 107 on purpose to give the shown output.
Not heavily tested, so may contain bugs
The code is a bit longer, as AIO is not as straightforward as other ways;
however AIO is probably the fastest way to keep a drive busy reading, because the NUL compare is done while the next data block is read in. (We could squeeze out a few more milliseconds by doing overlapping AIO, but I really do not think this is worth the effort.)
It always returns true if the file is readable and everything worked. It does not return false if the file is non-NUL.
It assumes that the file size is a multiple of 512. There is a bug on the last sector; however, on a file that is entirely NUL it still works, as the memory buffers already contain NUL. If somebody thinks this needs a fix, in line 95 the memcmp(nullblock, buf+off, SECTOR) could read memcmp(nullblock, buf+off, len-off<SECTOR ? len-off : SECTOR). But the only difference is that the "end reporting" perhaps is a bit random (not for a file which is entirely NUL).
The changed memcmp() also fixes another issue on platforms that do not zero-fill allocated memory, because the code does not do it itself. This can only be seen with files smaller than 4 MiB, though, and checknul is probably plain overkill for such a small task ;)

HTH

/* Output offset of NUL sector spans on disk/partition/file
 *
 * This uses an AIO recipe to speed up reading,
 * so "processing" can take place while data is read into the buffers.
 *
 * usage: ./checknul device_or_file
 *
 * This Works is placed under the terms of the Copyright Less License,
 * see file COPYRIGHT.CLL.  USE AT OWN RISK, ABSOLUTELY NO WARRANTY.
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>   /* open(), O_RDONLY */

#include <malloc.h>
#include <aio.h>

#define SECTOR  512
#define SECTORS 40960
#define BUFFERLEN   (SECTOR*SECTORS)

static void
oops(const char *s)
{
  perror(s);
  exit(1);
}

static void *
my_memalign(size_t len)
{
  void      *ptr;
  static size_t pagesize;

  if (!pagesize)
    pagesize = sysconf(_SC_PAGESIZE);
  if (len%pagesize)
    oops("alignment?");
  ptr = memalign(pagesize, len);
  if (!ptr)
    oops("OOM");
  return ptr;
}

static struct aiocb aio;

static void
my_aio_read(void *buf)
{
  int   ret;

  aio.aio_buf = buf;
  ret = aio_read(&aio);
  if (ret<0)
    oops("aio_read");
}

static int
my_aio_wait(void)
{
  const struct aiocb    *cb;
  int           ret;

  cb = &aio;
  ret = aio_suspend(&cb, 1, NULL);
  if (ret<0)
    oops("aio_suspend");
  if (aio_error(&aio))
    return -1;
  return aio_return(&aio);
}

static unsigned long long   nul_last;
static int          nul_was;

static void
fin(void)
{
  if (!nul_was)
    return;
  printf("%010llx\n", nul_last);
  fflush(stdout);
  nul_was   = 0;
}

static void
checknul(unsigned long long pos, unsigned char *buf, int len)
{
  static unsigned char  nullblock[SECTOR];
  int           off;

  for (off=0; off<len; off+=SECTOR)
    if (memcmp(nullblock, buf+off, SECTOR))
      fin();
    else
      {
        if (!nul_was)
          {
            printf("%010llx-", pos+off);
            fflush(stdout);
            nul_was = 1;
          }
        nul_last    = pos+off+SECTOR-1;
      }
}

int
main(int argc, char **argv)
{
  unsigned char *buf[2];
  int       fd;
  int       io, got;

  buf[0] = my_memalign(BUFFERLEN);
  buf[1] = my_memalign(BUFFERLEN);

  if (argc!=2)
    oops("Usage: checknul file");
  if ((fd=open(argv[1], O_RDONLY))<0)
    oops(argv[1]);

  aio.aio_nbytes    = BUFFERLEN;
  aio.aio_fildes    = fd;
  aio.aio_offset    = 0;

  io = 0;
  my_aio_read(buf[io]);
  while ((got=my_aio_wait())>0)
    {
      unsigned long long    pos;

      pos   = aio.aio_offset;

      aio.aio_offset += got;
      my_aio_read(buf[1-io]);

      checknul(pos, buf[io], got);

      io    = 1-io;
    }
  if (got<0)
    oops("read error");
  printf("eof\n");
  close(fd);
  return 0;
}

Tino

Posted 2013-03-02T11:24:43.383

Reputation: 906