Why does 'dd' produce different files for same USB stick?

4

1

I've just saved the Windows 8.1 restore image onto a USB stick; now I'm creating a low-level copy of it on my HDD by running the following command:

sudo dd if=/dev/sdf of=/disk2/Archive/windows8.1-restore.img bs=4M oflag=direct

I wanted to double-check that my dd command was OK, so I reran it twice, specifying bs=8M and then bs=16M. I've checked the sizes and they're exactly the same, but md5sum gives a different output across the three files:

c38a2b07b3d473d3f1876331edc2647b  windows8.1-restore.img.4M
568e382844431eef63d4ba6dc4c2c5ac  windows8.1-restore.img.8M
568e382844431eef63d4ba6dc4c2c5ac  windows8.1-restore.img.16M

I believe I unmounted the USB stick before the second and third runs.

Should I be worried about anything?

edit

Total file size is 31024349184 bytes in all cases. My understanding is that bs=xxx just controls the speed when dumping a whole USB stick/drive.
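That understanding is essentially right: as long as the source doesn't change between runs, bs= only affects how many bytes dd moves per read/write call, not the resulting data. A quick sketch using an ordinary file as a stand-in for the stick (file names here are made up for illustration):

```shell
# Create an 8 MiB test file as a stand-in for the USB stick.
dd if=/dev/urandom of=source.img bs=1M count=8 2>/dev/null

# Copy it twice with different block sizes.
dd if=source.img of=copy-4k.img bs=4k 2>/dev/null
dd if=source.img of=copy-1M.img bs=1M 2>/dev/null

# The checksums match: bs= changed the I/O pattern, not the data.
md5sum source.img copy-4k.img copy-1M.img
```

Since the source file never changes between the two copies, all three checksums come out identical regardless of block size.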

Emanuele

Posted 2015-02-01T11:29:09.953

Reputation: 661

3Was the stick unmounted when you ran dd with bs=4M? – gronostaj – 2015-02-01T11:38:39.243

Nope. I guess I should have, right? – Emanuele – 2015-02-01T11:39:14.080

3Right, you should have it unmounted. (Or, maybe mounted read-only.) I'm also not sure where you're getting your block sizes from. I would use bs=512. I pretty much always use bs=512 except when using a CD drive, because they may want a different block size (like bs=2048, or perhaps bs=2352 or whatever block size is being used, as noted by [CD block sizes](http://www.osta.org/technology/cdqa7.htm)). – TOOGAM – 2015-02-01T12:10:19.067

@TOOGAM Here's how to create links.

– gronostaj – 2015-02-01T12:19:00.670

@TOOGAM Isn't bs=xxx just determining the speed only? At the end of the day it shouldn't make any difference, right? I expect that if the last block is less than xxx, to be truncated automagically. Am I wrong? – Emanuele – 2015-02-01T12:51:04.677

3@Emanuele I don't think you are wrong. Small block sizes with dd are known to kill performance because it forces many more read and write calls than would otherwise be necessary. I believe gronostaj is correct, your problem is that you dd'd the disk with the file system mounted. Assuming you haven't remounted the file system since, you should be able to verify this by re-running your initial dd command; you should see an identical MD5sum from that invocation as well. – a CVn – 2015-02-01T13:16:07.590

Answers

8

Writing small amounts of data to a drive is slow, so the system buffers writes and commits them all at once later. When the buffer contains enough data for an efficient write operation, or when some process issues the sync system call, the buffer is flushed to the device.

dd performs a low-level copy, i.e. it reads the data that is physically present on the device. It doesn't take write buffers into account.

If the drive was mounted when you ran dd bs=4M, then it's possible that some writes were still buffered but not yet committed. You dumped the drive without those buffered changes.

umount calls sync internally to ensure data integrity. Unmounted devices usually aren't written to unless you explicitly ask some process to do so, so the drive was unlikely to change after unmounting.

Then you ran dd twice on the drive without mounting it in between. That's why the bs=8M and bs=16M calls produced the same result.

The drive was modified between the bs=4M and bs=8M calls, though, so the first dump is different. bs= didn't matter; calling umount did.

You should always unmount a device before running dd on it; otherwise some other process may modify its contents while dd is doing its job and corrupt the dump.
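The effect is easy to reproduce with an ordinary file standing in for the device (file names are made up): change a single byte between two dumps and the checksums diverge, even though the sizes match and the block sizes differ.

```shell
# Stand-in "device": 4 MiB of zeros.
dd if=/dev/zero of=stick.img bs=1M count=4 2>/dev/null

# First dump, like the bs=4M run.
dd if=stick.img of=dump1.img bs=4M 2>/dev/null

# Simulate a buffered write landing on the device in between:
# overwrite the first byte in place (conv=notrunc keeps the size).
printf 'x' | dd of=stick.img bs=1 conv=notrunc 2>/dev/null

# Second dump, like the bs=8M run.
dd if=stick.img of=dump2.img bs=8M 2>/dev/null

# Same size, different checksums: the source changed, not dd.
ls -l dump1.img dump2.img
md5sum dump1.img dump2.img
```

This mirrors what happened to the asker: the file sizes agree, so nothing looks wrong until the checksums are compared.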

gronostaj

Posted 2015-02-01T11:29:09.953

Reputation: 33 047

2I thought I'd been a bit of an idiot in having it mounted the first time around when I executed dd. I guess I'll keep the file taken with the device unmounted. Scary! – Emanuele – 2015-02-01T11:57:20.313