dd

dd is a core utility whose primary purpose is to copy a file and optionally convert it during the copy process.

Similar to cp, dd makes a bit-for-bit copy of the file by default, but with lower-level I/O flow control features.

For more information, see dd(1) or the full documentation.

Tip: By default, dd outputs nothing until the task has finished. To monitor the progress of the operation, add the status=progress option to the command.
Warning: One should be extremely cautious using dd, as with any command of this kind it can destroy data irreversibly.

Installation

dd is part of the GNU coreutils. For other utilities in the package, please refer to Core utilities.

Disk cloning and restore

The dd command is a simple, yet versatile and powerful tool. It can be used to copy from source to destination, block by block, regardless of their filesystem types or operating systems. A convenient method is to use dd from a live environment, such as a live CD.

Cloning a partition

From physical disk /dev/sda, partition 1, to physical disk /dev/sdb, partition 1:

# dd if=/dev/sda1 of=/dev/sdb1 bs=64K conv=noerror,sync status=progress

Cloning an entire hard disk

From physical disk /dev/sda to physical disk /dev/sdb:

# dd if=/dev/sda of=/dev/sdb bs=64K conv=noerror,sync status=progress

This will clone the entire drive, including the partition table, bootloader, all partitions, UUIDs, and data.

  • bs= sets the block size. Defaults to 512 bytes, which is the "classic" block size for hard drives since the early 1980s, but is not the most convenient. Use a bigger value, 64K or 128K. Also, please read the note below, because there is more to this than just "block sizes" — it also influences how read errors propagate. See and for details and to figure out the best bs value for your use case.
  • noerror instructs dd to continue operation, ignoring all read errors. Default behavior for dd is to halt at any error.
  • sync fills input blocks with zeroes at the end of the block if there were any read errors somewhere in the block, so data offsets stay in sync (see below for a detailed explanation of read error behavior with sync if you suspect possible read errors).
  • status=progress shows periodic transfer statistics which can be used to estimate when the operation may be complete.
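
Before pointing the command at real disks, it can be rehearsed on regular files. The following is a minimal sketch, assuming GNU coreutils; src.img and dst.img are hypothetical stand-ins for /dev/sda and /dev/sdb, and the source size is deliberately a multiple of the block size so that conv=sync adds no padding:

```shell
# Rehearse the cloning command on regular files instead of real disks.
# src.img and dst.img are hypothetical stand-ins for /dev/sda and /dev/sdb.
work=$(mktemp -d)
head -c 262144 /dev/urandom > "$work/src.img"   # 256 KiB, a multiple of bs=64K

dd if="$work/src.img" of="$work/dst.img" bs=64K conv=noerror,sync status=none

# cmp exits with status 0 only if both files are byte-for-byte identical.
if cmp -s "$work/src.img" "$work/dst.img"; then
    clone_result=identical
else
    clone_result=different
fi
echo "clone is $clone_result"
```

The same cmp check can be run against the real source and destination devices after a clone, provided neither has been modified in the meantime.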

The dd utility technically has an "input block size" (IBS) and an "output block size" (OBS). When you set bs, you effectively set both IBS and OBS. Normally, if your block size is, say, 1 MiB, dd will read 1024×1024 bytes and write as many bytes. But if a read error occurs, things will go wrong. Many people seem to think that dd will "fill up read errors with zeroes" if you use the conv=noerror,sync options, but this is not what happens. According to the documentation, dd pads each input block up to the IBS size after completing its read, which means adding zeroes at the end of the block. For a disk, this means that effectively the whole 1 MiB would become messed up because of a single 512-byte read error at the beginning of the read: 12ERROR89 would become 128900000 instead of 120000089.

If you are positive that your disk does not contain any errors, you could proceed using a larger block size, which will increase the speed of your copying several fold. For example, changing bs from 512 to 64K changed copying speed from 35 MB/s to 120 MB/s on a simple Celeron 2.7 GHz system. But keep in mind that read errors on the source disk will end up as block errors on the destination disk, i.e. a single 512-byte read error will mess up the whole 64 KiB output block.
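
The padding behaviour of conv=sync can be observed without a failing disk. In this sketch a short read from a pipe stands in for an incomplete input block (an assumption of the example; it is not a real read error), and GNU dd is assumed for status=none:

```shell
# Feed dd a 3-byte input block and let conv=sync pad it to the 8-byte block size.
# The zero bytes are appended at the end of the block, after the data that was read.
padded_hex=$(printf 'abc' | dd bs=8 conv=sync status=none | od -An -tx1 | tr -d ' \n')
echo "$padded_hex"
```

The dump shows the three input bytes (61 62 63) followed by five zero bytes, which matches the "zeroes at the end of the block" behaviour described above.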

Backing up the partition table

See fdisk#Backup and restore partition table or gdisk#Backup and restore partition table.

Create disk image

Boot from a live medium and make sure no partitions are mounted from the source hard drive.

Then mount the external hard drive and back up the drive:

# dd if=/dev/sda conv=sync,noerror bs=64K | gzip -c  > /path/to/backup.img.gz

If necessary (e.g. when the resulting files will be stored on a FAT32 file system), split the disk image into multiple parts (see also split(1)):

# dd if=/dev/sda conv=sync,noerror bs=64K | gzip -c | split -a3 -b2G - /path/to/backup.img.gz

If there is not enough disk space locally, you may send the image through ssh:

# dd if=/dev/sda conv=sync,noerror bs=64K | gzip -c | ssh user@local dd of=backup.img.gz

Finally, save extra information about the drive geometry, which is necessary to interpret the partition table stored within the image. The most important piece of this information is the cylinder size.

# fdisk -l /dev/sda > /path/to/list_fdisk.info
Tip: gzip is only able to compress data using a single CPU core, which leads to a data throughput considerably lower than the write speeds on modern storage. In order to leverage multicore compression and create a disk image more quickly, one could for instance install the pigz package, and simply replace the gzip -c command above with pigz -c. For large disks, this can potentially save hours. You can also try other compression algorithms such as zstd.

Restore system

To restore your system:

# gunzip -c /path/to/backup.img.gz | dd of=/dev/sda

When the image has been split, use the following instead:

# cat /path/to/backup.img.gz* | gunzip -c | dd of=/dev/sda
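
The split-and-restore round trip can be tested on a small file before trusting it with a real disk. This is a sketch under stated assumptions: disk.img is a hypothetical stand-in for /dev/sda, the part size is shrunk to 16 KiB so that several parts are produced, and GNU coreutils is assumed:

```shell
work=$(mktemp -d)
head -c 65536 /dev/urandom > "$work/disk.img"    # hypothetical stand-in for /dev/sda

# Compress and split into 16 KiB parts named backup.img.gzaaa, backup.img.gzaab, ...
dd if="$work/disk.img" conv=sync,noerror bs=64K status=none \
    | gzip -c | split -a3 -b16K - "$work/backup.img.gz"

# The shell expands the glob in sorted order, which is the order split wrote the parts in.
cat "$work"/backup.img.gz* | gunzip -c | dd of="$work/restored.img" status=none

if cmp -s "$work/disk.img" "$work/restored.img"; then
    restore_result=identical
else
    restore_result=different
fi
echo "restored image is $restore_result"
```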

Backup and restore MBR

Before making changes to a disk, you may want to back up the partition table and partition scheme of the drive. You can also use a backup to copy the same partition layout to numerous drives.

The MBR is stored in the first 512 bytes of the disk. It consists of 4 parts:

  1. The first 440 bytes contain the bootstrap code (boot loader).
  2. The next 6 bytes contain the disk signature.
  3. The next 64 bytes contain the partition table (4 entries of 16 bytes each, one entry for each primary partition).
  4. The last 2 bytes contain a boot signature.
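
The four parts start at byte offsets 0, 440, 446 and 510 (440 + 6 + 64 + 2 = 512). As an illustration, the sketch below builds a hypothetical, otherwise empty MBR file and reads back the boot signature at offset 510; the file and variable names are assumptions of the example, and GNU dd is assumed for status=none:

```shell
mbr=$(mktemp)
head -c 512 /dev/zero > "$mbr"                   # hypothetical empty 512-byte MBR

# Write the boot signature 0x55 0xAA (octal 125 252) into the last 2 bytes.
printf '\125\252' | dd of="$mbr" bs=1 seek=510 conv=notrunc status=none

# Offsets of the parts: bootstrap code 0, disk signature 440,
# partition table 446, boot signature 510.
boot_sig=$(dd if="$mbr" bs=1 skip=510 count=2 status=none | od -An -tx1 | tr -d ' \n')
echo "$boot_sig"
```

Replacing skip=510 count=2 with skip=446 count=64 extracts the partition table in the same way.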

To save the MBR as mbr_file.img:

# dd if=/dev/sdX of=/path/to/mbr_file.img bs=512 count=1

You can also extract the MBR from a full dd disk image:

# dd if=/path/to/disk.img of=/path/to/mbr_file.img bs=512 count=1

To restore (be careful, this destroys the existing partition table and with it access to all data on the disk):

# dd if=/path/to/mbr_file.img of=/dev/sdX bs=512 count=1

If you only want to restore the boot loader, but not the primary partition table entries, just restore the first 440 bytes of the MBR:

# dd if=/path/to/mbr_file.img of=/dev/sdX bs=440 count=1

To restore only the partition table, one must use:

# dd if=/path/to/mbr_file.img of=/dev/sdX bs=1 skip=446 seek=446 count=64
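
Because skip only positions the read in the input while seek positions the write in the output, a partition-table-only restore has to set both to 446. The sketch below checks this on hypothetical stand-in files (a backup filled with 'A', a target filled with 'B'); conv=notrunc is only needed here because the target is a regular file rather than a block device:

```shell
work=$(mktemp -d)
# Hypothetical stand-ins: the MBR backup is all 'A', the target "disk" is all 'B'.
head -c 512  /dev/zero | tr '\0' 'A' > "$work/mbr_file.img"
head -c 1024 /dev/zero | tr '\0' 'B' > "$work/disk.img"

# skip= positions the read in the backup, seek= positions the write on the target.
dd if="$work/mbr_file.img" of="$work/disk.img" bs=1 skip=446 seek=446 count=64 \
    conv=notrunc status=none

# Count bytes that differ from what each region is expected to contain.
before=$(dd if="$work/disk.img" bs=1 count=446 status=none | tr -d 'B' | wc -c)
table=$(dd if="$work/disk.img" bs=1 skip=446 count=64 status=none | tr -d 'A' | wc -c)
after=$(dd if="$work/disk.img" bs=1 skip=510 status=none | tr -d 'B' | wc -c)
echo "unexpected bytes: before=$before table=$table after=$after"
```

All three counters are zero when only bytes 446 to 509 of the target were overwritten.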

Remove bootloader

To erase the MBR bootstrap code (may be useful if you have to do a full reinstall of another operating system), only the first 440 bytes need to be zeroed:

# dd if=/dev/zero of=/dev/sdX bs=440 count=1

As some readers might have already realised, the dd(1) core utility has a different command-line syntax compared to other utilities. Moreover, while it supports some unique features not found in other commodity utilities, several of its default behaviours are either less than ideal or potentially error-prone in specific scenarios. For that reason, users may want to use alternatives that are better in some respects in lieu of the dd core utility.

That said, it is still worth noting that since dd is a core utility, which is installed by default on Arch and many other systems, it is preferable to alternatives or more specialised utilities when installing a new package on your system is inconvenient.

To cover the two aspects addressed above, this section summarises the features of the dd(1) core utility that are rarely found in other commodity utilities, in a form that resembles the pacman/Rosetta article but with fewer examples, each chosen to examine a feature of dd (as denoted by "i.e." or a "To" clause in the "Tip:" box under each subsection), either in practice or in pseudocode.

Patching a binary file, block-by-block in-place

It is not an uncommon practice to use dd as a feature-limited binary file patcher in an automated shell script as it supports:

  • seeking the output file by a given offset before writing.
  • writing to an output file (without truncating the size of the output file by adding the conv=notrunc option).

Here is an example that modifies the timestamp part of the first member in a cpio archive, which starts at the 49th byte of the file (or at an offset of 0x30 if you prefer hex notation):

$ touch a-randomly-chosen-file
$ bsdtar -cf example-modify-ts.cpio --format odc -- a-randomly-chosen-file
$ printf '%011o' "$(date -d "2019-12-21 00:00:00" +%s)" | dd conv=notrunc of=example-modify-ts.cpio seek=48 oflag=seek_bytes
Tip: To print byte stream from command-line input hex notation, use basenc(1) §base16 and/or printf(1).
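
The same technique can be tried on a throwaway file first. This is a minimal sketch, assuming GNU dd (for oflag=seek_bytes and status=none); patch-me.bin is a hypothetical sample file, and the byte is given in octal because not every printf implementation accepts \x escapes:

```shell
work=$(mktemp -d)
printf 'hello world' > "$work/patch-me.bin"     # hypothetical sample file

# Overwrite the single byte at offset 6 with 'W' (octal 127, hex 57).
# conv=notrunc keeps everything after the patched byte in place.
printf '\127' | dd of="$work/patch-me.bin" conv=notrunc seek=6 oflag=seek_bytes status=none

patched=$(cat "$work/patch-me.bin")
echo "$patched"
```

Without conv=notrunc, dd would truncate the file right after the written byte, leaving only "hello W".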

Printing volume label of a VFAT filesystem image

To read the filesystem volume label of a VFAT image file, which has a total length of 11 bytes and is padded with ASCII spaces, at an offset of 0x047:

$ truncate -s 33M empty-hole.img
$ mkfs.vfat -F32 -n LabelMe empty-hole.img
$ dd iflag=skip_bytes,count_bytes count=11 skip=$((0x047)) if=empty-hole.img | sed -e 's% *$%%'
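
If mkfs.vfat is not at hand, the byte-granular skip and count can still be exercised on a hand-made file. In this sketch fake-label.bin is a hypothetical file with an 11-byte, space-padded field at offset 6; it is not a real filesystem image, and the skip_bytes and count_bytes flags assume GNU dd:

```shell
work=$(mktemp -d)
# 6 bytes of header, then an 11-byte field ("LabelMe" plus 4 spaces), then a trailer.
printf 'HEADERLabelMe    TRAILER' > "$work/fake-label.bin"

# With skip_bytes and count_bytes, skip= and count= are byte counts rather than block counts.
label=$(dd if="$work/fake-label.bin" iflag=skip_bytes,count_bytes skip=6 count=11 status=none \
    | sed -e 's% *$%%')
echo "[$label]"
```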

Sponge between piped commands

In the following example, to avoid an unnecessarily long-lasting TCP connection on the input end when the output end blocks longer than expected, one may put dd between two commands, with an output block size that is certainly larger than the input while still reasonably smaller than available memory:

$ curl -qgsfL http://example.org/mirrors/ftp.archlinux.org/mirrored.md5deep | dd ibs=128k obs=200M | poor-mirroring-script-that-perform-mirroring-on-input-paths-line-by-line-wo-buffer-entire-list-first

Transferring data with size limitation

It is a general practice to use dd in a data streaming shell script to limit the total length of data that a piped command may consume. For example, to inspect a ustar header block () using a shell script function in a streaming manner:

Note: The B suffix in the argument to the count option is a feature newly introduced in GNU coreutils v9.1. It has the same effect as the count_bytes input flag, and is easily confused with forms like count=256k, which tell dd to copy 262144 input blocks instead of bytes.
$ bsdtar -cf - /dev/tty /dev/null 2>&- | dd count=1 skip=1 status=none | inspect-tar-header-block
Tip: To stream data from input to output within a given length, an alternative is pv(1) §S, which supports the splice(2) system call.
Note: Another candidate alternative is head(1) §c, though implementations other than GNU coreutils and glibc may consume more data than requested, causing data misalignment issues in a streaming shell script.
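
As a small illustration of a byte-exact limit, the sketch below caps an endless stream at 10 bytes, once with dd and once with head. It assumes GNU coreutils; yes merely serves as an arbitrary endless producer:

```shell
# count_bytes makes count= a byte count; fullblock guards against short reads from the pipe.
via_dd=$(yes abc | dd iflag=fullblock,count_bytes count=10 status=none | wc -c)

# The head(1) alternative mentioned above, for comparison.
via_head=$(yes abc | head -c 10 | wc -c)

echo "dd passed $via_dd bytes, head passed $via_head bytes"
```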

Writing a bootable disk image to block device, optionally display progress information

See USB flash installation medium#Using basic command line utilities for examples using commodity utilities, including dd, which is potentially the least suited to that case.

Troubleshooting

Partial read: copied data is smaller than requested

Files created with dd can end up with a smaller size than requested if a full input block is not available at the moment, as per the documentation:

In addition, if no data-transforming operand (i.e. the conv= option as in this wiki article) is specified, input is copied to the output as soon as it is read, even if it is smaller than the block size.

On Linux, the underlying system call may return early (i.e. a partial read) when reading from a pipe, or when reading a device file like /dev/urandom and /dev/random (e.g. due to a hardcoded limitation of the underlying kernel device driver or insufficient entropy). This makes the total size of copied data smaller than expected when the count=n option is used, where n limits the maximum number of (potentially partial) input blocks to copy to the output.

It is possible, but not guaranteed, that dd will warn you about this kind of issue:

dd: warning: partial read (X bytes); suggest iflag=fullblock

The solution is to do as the warning says and add the iflag=fullblock option, in addition to the input file option, to the dd command. For example, to create a new file filled with 40 MiB of random data:

$ dd if=/dev/urandom of=new-file-filled-by-urandom.bin bs=40M count=1 iflag=fullblock
Note: When reading from a pipe or a special device file like those mentioned above in order to copy a fixed-length portion of a file with the count=n option, it is suggested to add the iflag=fullblock option to the dd command. This is strongly recommended when wiping a portion of a device or file.
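
A partial read is easy to provoke with a slow writer. In the sketch below the writer emits one byte, pauses, and then emits a second one; the timing (a one-second pause) is an assumption of the example, and GNU dd is assumed for status=none:

```shell
# Without fullblock, dd counts the first (short) read as its one and only input block.
short=$( { printf a; sleep 1; printf b; } | dd bs=2 count=1 status=none | wc -c )

# With fullblock, dd keeps reading until the 2-byte block is complete.
full=$( { printf a; sleep 1; printf b; } | dd bs=2 count=1 iflag=fullblock status=none | wc -c )

echo "without fullblock: $short byte(s), with fullblock: $full byte(s)"
```

On a typical system the first pipeline yields a single byte and the second yields both bytes.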

When reading from a pipe, an alternative to iflag=fullblock is to limit bs to the PIPE_BUF constant value as defined in linux/limits.h to make the I/O atomic. For example, to prepare a text file filled with a random alphanumeric string with a total length of 5 MiB:

$ LC_ALL=C tr -dc '[:alnum:]' </dev/urandom | dd of=passtext-5m.txt bs=4k count=1280

Since the output file is not a pipe, one may prefer to use the ibs and obs options to set the block size separately for the (input) pipe and the (output) on-disk file. For example, to set a more efficient block size for the output file:

$ LC_ALL=C tr -dc '[:alnum:]' </dev/urandom | dd of=passtext-5m.txt ibs=4k obs=64k count=1280
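
Rather than hardcoding 4k, the pipe buffer limit can be queried at run time and the block count derived from it. This is a sketch, assuming a getconf that accepts the PIPE_BUF path variable (as the glibc one does); the variable names are made up for the example:

```shell
# Query the atomic pipe I/O limit for the root filesystem.
pipe_buf=$(getconf PIPE_BUF /)

# Derive how many blocks of that size make up the 5 MiB target.
target_bytes=$((5 * 1024 * 1024))
count=$((target_bytes / pipe_buf))

echo "bs=$pipe_buf count=$count"
```

On Linux, where PIPE_BUF is 4096, this yields bs=4096 count=1280, matching the commands above.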

Total transferred bytes count readout is wrong

The total transferred bytes count readout may be greater than the actual value if an error is encountered while writing to the output (i.e. a partial write, caused by e.g. SIGPIPE, a faulty medium, or an accidentally disconnected target network block device). In the following proof of concept, the second dd obviously will not read more than 512200 bytes, but the first dd instance still reports an inaccurate count of 512400 bytes:

$ yes 'x' | dd bs=4096 count=512400B | dd ibs=1 count=512200 status=none >/dev/null
125+1 records in
125+1 records out
512400 bytes (512 kB, 500 KiB) copied, 10.7137 s, 47.8 kB/s

When resuming an interrupted transfer like the above PoC, it is recommended to only rely on the readout of number of whole output blocks already copied, as denoted by the number before "+" sign.

See also

  • dd(1p): POSIX specification of the dd core utility in manpage form
This article is issued from Archlinux. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.