15
7
I have a hard drive filled with zeroes.
How can I check that all bits on the hard drive are zeros using bash?
28
od will replace runs of the same thing with *, so you can easily use it to scan for nonzero bytes:
$ sudo od /dev/disk2 | head
0000000 000000 000000 000000 000000 000000 000000 000000 000000
*
234250000
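For scripted use, the same od output can be checked mechanically instead of by eye. This is only a rough sketch and not part of the original answer: the function name od_reports_all_zero is made up, and it assumes od's default output format (an octal offset followed by 000000 words, as in the listing above). It only checks up front that the target is readable; a read error partway through the drive would go unnoticed.

```shell
# Hypothetical helper (not from the original answer): succeed only if od shows
# nothing but zero words for the given file or device.
od_reports_all_zero() {
    # An unreadable target would produce no od output at all, which must not
    # be mistaken for "all zero".
    [ -r "$1" ] || return 2
    # Filter out the lines od prints for zeroed data: offset lines whose words
    # are all 000000 (this also covers the final offset-only line) and the "*"
    # repeat marker. Any line that survives contains a nonzero word.
    leftover=$(od "$1" | grep -v -e '^[0-7]* *\(000000 *\)*$' -e '^\*$' | head -n 1)
    [ -z "$leftover" ]
}
```

From a root shell this could be used as od_reports_all_zero /dev/disk2 && echo "all zero". Because of head -n 1, the pipeline stops at the first line containing nonzero data.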
8 I'd add | head to the end of that, so that if it turns out that the drive isn't zeroed, it stops after producing just enough output to show the fact, instead of dumping the entire drive to the screen. – Wyzard – 2013-03-02T19:30:56.267
2 @Wyzard: Excellent idea; I'll add it to my answer. – Gordon Davisson – 2013-03-02T19:49:04.700
8
I've written a short C++ program to do so, source available here.
To build it:
wget -O iszero.cpp https://gist.github.com/BobVul/5070989/raw/2aba8075f8ccd7eb72a718be040bb6204f70404a/iszero.cpp
g++ -o iszero iszero.cpp
To run it:
dd if=/dev/sdX 2>/dev/null | ./iszero
It will output the position and value of any nonzero bytes. You can redirect this output to a file with >, e.g.:
dd if=/dev/sdX 2>/dev/null | ./iszero >nonzerochars.txt
You might want to try changing BUFFER_SIZE for better efficiency. I'm not sure what an optimum value might be. Note that this also affects how often it prints progress, which will affect speed somewhat (printing output to the console is slow). Append 2>/dev/null to the ./iszero command to get rid of progress output.
I am aware this is not using standard bash, nor even builtins, but it should not require any extra privileges. @Hennes' solution is still faster (I haven't really optimised anything - this is the naïve solution); however, this little program can give you a better idea of just how many bytes your wiper has missed, and in what location. If you disable the progress output, it'll still be faster than most consumer hard drives can read (>150 MB/s), so that's not a big issue.
A faster version with less verbose output is available here. However, it is still a little slower than @Hennes' solution. This one, however, will quit on the first nonzero character it encounters so it is potentially much faster if there's a nonzero near the beginning of the stream.
Adding source to post to keep answer better self-contained:
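#include <cstdio>

#define BUFFER_SIZE 1024

int main() {
    FILE* file = stdin;
    // unsigned char so that bytes >= 0x80 print as two hex digits
    // instead of being sign-extended
    unsigned char buffer[BUFFER_SIZE];
    long long bytes_read = 0;
    long long progress = 0;  // bytes processed so far
    long long nonzero = 0;   // nonzero bytes seen so far

    while ((bytes_read = fread(buffer, 1, BUFFER_SIZE, file)) > 0) {
        for (long long i = 0; i < bytes_read; i++) {
            progress++;
            if (buffer[i] != 0) {
                nonzero++;
                // progress is the 1-based position of this byte in the stream
                printf("%lld: %x\n", progress, buffer[i]);
            }
        }
        fprintf(stderr, "%lld bytes processed\r", progress);
    }
    fprintf(stderr, "\n");

    // ferror() only reports that an error occurred, not which one
    if (ferror(file)) {
        fprintf(stderr, "Error reading input\n");
        return -1;
    }

    printf("%lld nonzero characters encountered.\n", nonzero);
    // Exit statuses are truncated to 8 bits, so return a flag rather than the count
    return nonzero != 0 ? 1 : 0;
}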
This is a great answer, but is there any way to make the script work more like a normal command - using iszero /dev/sda rather than requiring it to be piped with something like iszero < /dev/sda? – Hashim – 2019-12-01T22:28:02.980
1 @Hashim This was written as more or less a throwaway program quite a while ago (nowadays I'd at least do it in a scripting language like Python rather than compiled C)... that said, if you wanted to take arguments in the most simple way, it'd be somewhere along the lines of making it int main(int argc, char *argv[]) and then FILE* file = fopen(argv[1], "r");. Done properly it'd include checking if the argument actually exists, error checking successful open (do an additional ferror check after the fopen), etc., but too much trouble for a throwaway program. – Bob – 2019-12-02T03:19:08.297
That's fair enough. Is there any particular reason you would do it in Python rather than C? Surely C would be a lot faster? – Hashim – 2019-12-02T04:21:32.453
1 @Hashim I suspect SIMD vectorised operations in numpy would be close to vectorised instructions in C. And that's assuming the C compiler is smart enough to vectorise the loop in the naive C program. Would have to benchmark to be sure; unfortunately I don't really have the time to do that right now. Main advantage of Python (et al.) is it's generally available and runnable without a compiler, while gcc is not necessarily available on all Linux distros without pulling down additional packages. Then again numpy isn't part of standard Python packages either... – Bob – 2019-12-02T04:33:33.450
I have a working build environment and much prefer having exes that I can call instead of scripts. I'll carry on doing some basic tests with the fast version of this program and other solutions to find out which is faster for my needs. Thanks again for coding it. – Hashim – 2019-12-02T04:38:19.240
1 @Hashim If you compile with -O3 and -march=native you might see some speedups; that should make sure GCC enables auto-vectorisation and uses the best available for your current CPU (AVX, SSE2/SSE3, etc.). Along with that you can play with the buffer size; different buffer sizes may be more optimal with vectorised loops (I'd play with 1MB+, current one is 1kB). – Bob – 2019-12-02T04:46:33.980
1 @Hashim Above comment edited, in case you didn't see. Beyond that, if you'd like to discuss further, you can ping me (@Bob) in chat: https://chat.stackexchange.com/rooms/118/root-access
6
Expanding on Gordon's answer, pv provides an indication of how far along the process is:
$ sudo pv -tpreb /dev/sda | od | head
0000000 000000 000000 000000 000000 000000 000000 000000 000000
*
9.76GiB 0:06:30 [25.3MiB/s] [=================> ] 59% ETA 0:04:56
This is very useful with a big hard drive! – Martin Hansen – 2014-03-05T23:04:19.100
5
This seems an ugly inefficient solution, but if you have to check only once:
dd if=/dev/sdX | tr --squeeze-repeats "\000" "T"
This uses dd to read from disk sdX (replace the X with the drive you want to read from), then translates all unprintable zero bytes to something we can handle.
Next we either count the bytes we can handle and check that it is the right number (use wc -c for that), or we skip counting and use -s or --squeeze-repeats to squeeze all repeated occurrences into a single character.
Thus dd if=/dev/sdX | tr --squeeze-repeats "\000" "T" should print only a single T.
If you want to do this regularly then you want something more efficient.
If you want to do this only once then this kludge may verify that your normal wiper is working and that you can trust it.
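One weakness of translating zeros to T is that a genuine T byte on the disk looks the same as a run of zeros. A variant that avoids this is to delete the NUL bytes and count what is left. This is a sketch only, not part of the original answer; the function name count_nonzero_bytes is made up.

```shell
# Hypothetical helper: print how many bytes in the file or device are not NUL.
count_nonzero_bytes() {
    # Refuse unreadable targets; otherwise wc would happily report 0.
    [ -r "$1" ] || return 2
    # tr -d deletes every NUL byte, so wc -c counts only what is left over.
    tr -d '\000' <"$1" | wc -c
}
```

On a fully zeroed drive, count_nonzero_bytes /dev/sdX (run as root) should print 0. Like the original pipeline, it always reads the whole drive.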
Why do you consider this solution to be inefficient? Is there some buffering that requires reading far past the first non-NUL location? – Daniel Beck – 2013-03-02T12:55:11.617
Is there a potential problem where a literal 'T' is present in the stream as the only nonzero character? – Bob – 2013-03-02T12:57:30.097
True. That is a flaw in the design. I am also not using bash (the shell itself), but I assumed that with "Bash" you meant "Not from bash, from using any shell prompt and standard text mode tools". – Hennes – 2013-03-02T13:06:25.273
3 @daniel: A simple C program should be able to read all data without changing every read byte. Which would be more efficient and aesthetically pleasing. It might also take much more time to write such a program than to just use available tools in an inefficient way. – Hennes – 2013-03-02T13:06:53.260
3
To check only (any blocks that don't match the pattern will be listed):
sudo badblocks -sv -t 0x00 /dev/sdX
Or use badblocks to write them as well as check:
sudo badblocks -svw -t 0x00 /dev/sdX
The default destructive test is my secure erase of choice:
sudo badblocks -svw /dev/sdX
If anyone can retrieve anything after filling the drive with alternating 0s and 1s, then their complement, then all 1s, then all 0s, with every pass verified it worked, good luck to them!
Makes a good pre-deployment check on new drives too
See man badblocks for other options.
Not saying it's fast, but it works...
2
Best of both worlds. This command will skip bad sectors:
sudo dd if=/dev/sdX conv=noerror,sync | od | head
Use kill -USR1 <pid of dd> to see progress.
0
Wanted to post this clever solution from a similar but earlier question, posted by a user who hasn't logged in for a while:
There is a device /dev/zero on a Linux system that always gives zeroes when read. So, how about comparing your hard drive with this device:
cmp /dev/sdX /dev/zero
If all is well with zeroing out your hard drive it will terminate with:
cmp: EOF on /dev/sdb
telling you that the two files are the same until it got to the end of the hard drive. If there is a non-zero bit on the hard drive, cmp will tell you where it is in the file.
If you have the pv package installed then:
pv /dev/sdX | cmp /dev/zero
will do the same thing with a progress bar to keep you amused while it checks your drive (the EOF will now be on STDIN rather than sdX though).
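One caveat for scripting: cmp exits with status 1 both when it hits EOF on the drive and when it finds a differing byte, so the exit status alone cannot tell the two cases apart. A possible workaround, sketched below, is to limit the comparison to the size of the target. This is not from the quoted answer; the name is_all_zero is made up, and it assumes GNU cmp (for -n) and util-linux blockdev.

```shell
# Hypothetical helper: succeed only if every byte of the target is zero.
# Assumes GNU cmp (for -n) and, for block devices, util-linux blockdev.
is_all_zero() {
    target=$1
    if [ -b "$target" ]; then
        size=$(blockdev --getsize64 "$target") || return 2
    else
        size=$(wc -c <"$target") || return 2
    fi
    # Compare exactly $size bytes so that reaching the end of the target is
    # not reported as a difference; -s suppresses cmp's own messages.
    cmp -s -n "$size" "$target" /dev/zero
}
```

From a root shell: is_all_zero /dev/sdX && echo "drive is zeroed". Like plain cmp, it stops at the first nonzero byte.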
0
Some time ago I was curious about AIO. The result was a sample test program which happens to check for sectors (512 byte blocks) which are NUL. You can see this as a variant of a sparse file-regions detector. I think the source says it all.
NUL output looks like 0000000000-eof. Note that there is a trick in the program: function fin() is not called at line 107 on purpose to give the shown output.
AIO is not as straightforward as other ways,
AIO is probably the fastest way to keep a drive busy reading, because the NUL compare is done while the next data block is read in. (We could squeeze out a few more milliseconds by doing overlapping AIO, but I really do not think this is worth the effort.)
The program returns true if the file is readable and everything worked. It does not return false if the file is non-NUL.
NUL it still works, as the memory buffers already contain NUL. If somebody thinks this needs a fix, in line 95 the memcmp(nullblock, buf+off, SECTOR) could read memcmp(nullblock, buf+off, len-off<SECTOR ? len-off : SECTOR). But the only difference is that the "end reporting" perhaps is a bit random (not for a file which is entirely NUL).
memcmp() also fixes another issue on platforms which do not NUL alloc()ed memory, because the code does not do it. But this might only be seen with files of less than 4 MiB, and checknul is probably plain overkill for such a small task ;)
HTH
/* Output offset of NUL sector spans on disk/partition/file
 *
 * This uses an AIO recipe to speed up reading,
 * so "processing" can take place while data is read into the buffers.
 *
 * usage: ./checknul device_or_file
 *
 * This Works is placed under the terms of the Copyright Less License,
 * see file COPYRIGHT.CLL. USE AT OWN RISK, ABSOLUTELY NO WARRANTY.
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>   /* open(), O_RDONLY */
#include <malloc.h>
#include <aio.h>

#define SECTOR 512
#define SECTORS 40960
#define BUFFERLEN (SECTOR*SECTORS)

/* Print the error for the last failed call and exit */
static void
oops(const char *s)
{
  perror(s);
  exit(1);
}

/* Allocate a page-aligned buffer; len must be a multiple of the page size */
static void *
my_memalign(size_t len)
{
  void *ptr;
  static size_t pagesize;

  if (!pagesize)
    pagesize = sysconf(_SC_PAGESIZE);
  if (len%pagesize)
    oops("alignment?");
  ptr = memalign(pagesize, len);
  if (!ptr)
    oops("OOM");
  return ptr;
}

static struct aiocb aio;

/* Start an asynchronous read into buf at the current aio.aio_offset */
static void
my_aio_read(void *buf)
{
  int ret;

  aio.aio_buf = buf;
  ret = aio_read(&aio);
  if (ret<0)
    oops("aio_read");
}

/* Wait for the pending read; returns bytes read, 0 on EOF, -1 on error */
static int
my_aio_wait(void)
{
  const struct aiocb *cb;
  int ret;

  cb = &aio;
  ret = aio_suspend(&cb, 1, NULL);
  if (ret<0)
    oops("aio_suspend");
  if (aio_error(&aio))
    return -1;
  return aio_return(&aio);
}

static unsigned long long nul_last;
static int nul_was;

/* Close the currently open NUL span (if any) by printing its end offset */
static void
fin(void)
{
  if (!nul_was)
    return;
  printf("%010llx\n", nul_last);
  fflush(stdout);
  nul_was = 0;
}

/* Scan one buffer sector by sector and track spans of all-NUL sectors */
static void
checknul(unsigned long long pos, unsigned char *buf, int len)
{
  static unsigned char nullblock[SECTOR];
  int off;

  for (off=0; off<len; off+=SECTOR)
    if (memcmp(nullblock, buf+off, SECTOR))
      fin();
    else
      {
        if (!nul_was)
          {
            printf("%010llx-", pos+off);
            fflush(stdout);
            nul_was = 1;
          }
        nul_last = pos+off+SECTOR-1;
      }
}

int
main(int argc, char **argv)
{
  unsigned char *buf[2];
  int fd;
  int io, got;

  buf[0] = my_memalign(BUFFERLEN);
  buf[1] = my_memalign(BUFFERLEN);

  if (argc!=2)
    {
      /* No errno is set here, so do not go through perror() */
      fprintf(stderr, "Usage: checknul file\n");
      return 1;
    }
  if ((fd=open(argv[1], O_RDONLY))<0)
    oops(argv[1]);

  aio.aio_nbytes = BUFFERLEN;
  aio.aio_fildes = fd;
  aio.aio_offset = 0;

  io = 0;
  my_aio_read(buf[io]);
  while ((got=my_aio_wait())>0)
    {
      unsigned long long pos;

      pos = aio.aio_offset;
      aio.aio_offset += got;
      /* Start reading the next block while this one is being checked */
      my_aio_read(buf[1-io]);
      checknul(pos, buf[io], got);
      io = 1-io;
    }
  if (got<0)
    oops("read error");
  printf("eof\n");
  close(fd);
  return 0;
}
Would it be acceptable to just overwrite the entire drive with zeroes? Or do you actually need to confirm the current contents? – Bob – 2013-03-02T11:29:34.080
I want to verify that hard drive is filled with zeros. – gkfvbnhjh2 – 2013-03-02T11:51:37.750
1 In theory there could be a bug in data sanitization tools that leaves some data intact. I want to be sure that every bit is zero. So how do I check if the HDD is full of zeroes? – gkfvbnhjh2 – 2013-03-02T12:05:03.470
Why zeroes? Would you not randomly write zeros and 1s, several times? – None – 2013-03-02T14:22:38.627
13 Because 1s are narrower than 0s - you can see the old data between them more easily. – ChrisA – 2013-03-02T16:01:17.667
@ChrisA Then why not fill it with ♥? They're filled in, as well ;) – Izkata – 2013-03-02T17:37:36.280