Good block size for disk-cloning with diskdump (dd)

48

I use dd in its simplest form to clone a hard drive:

dd if=INPUT of=OUTPUT

However, I read in the man page that dd has a block-size parameter. Is there an optimal value for the block-size parameter that will speed up the cloning procedure?

Phi

Posted 2011-01-17T09:44:34.790

Reputation: 603

Answers

32

64k seems to be a good pick:

Results:

  no bs=        78s     144584+0 records
  bs=512        78s     144584+0 records
  bs=1k         38s     72292+0 records
  bs=2k         38s     36146+0 records
  bs=4k         38s     18073+0 records
  bs=5k         39s     14458+1 records
  bs=50k        38s     1445+1 records
  bs=500k       39s     144+1 records
  bs=512k       39s     144+1 records
  bs=1M         39s     72+1 records
  bs=5M         39s     14+1 records
  bs=10M        39s     7+1 records

(taken from here).

This matches my own findings on read/write buffering when I was speeding up an I/O-heavy converter program at work.
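
For anyone who wants to reproduce a comparison like this on their own hardware, here is a minimal sketch; the source path and the amount of data are placeholders, not the setup that produced the table above:

#!/bin/bash
# Rough sketch of a block-size comparison: read the same amount of data with
# several block sizes and let dd report the transfer rate.
SRC=/dev/sdX                      # hypothetical source device or file; change this
TOTAL=$((128 * 1024 * 1024))      # 128 MiB per test run

for BS in 512 4096 65536 1048576 16777216; do
  sync
  echo 3 > /proc/sys/vm/drop_caches   # drop the page cache between runs (needs root)
  echo "bs=$BS:"
  # count is scaled so every run reads the same total number of bytes
  dd if="$SRC" of=/dev/null bs="$BS" count=$((TOTAL / BS))
done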

akira

Posted 2011-01-17T09:44:34.790

Reputation: 52 754

Please note that this benchmark might look different for rotating drives and ssds. – Jiri – 2015-11-26T16:05:40.497

-1 This is almost completely dependent on your hard drive. Rather, describe the procedure used to obtain these values so that the OP can repeat the steps to get the optimal block size for his own hard drive. Also, you haven't listed 64k in your list of results, and all of the results past 1k are more or less the same. – Micheal Johnson – 2016-04-19T15:01:21.530

@MichealJohnson Feel free to edit this post and take the description of how that table was generated from the link provided and paste it here. 64k is the first value that seems to yield no further improvement in terms of speed AND is a natural alignment. And yeah, it is obvious that the measured speed depends entirely on the hardware used. That was true 5 years ago and it is true now. – akira – 2016-04-19T18:56:45.107

Why 64k? To me, 2k doesn't yield any further improvement, so 1k is the best value, and it is also as natural an alignment as 64k. – Micheal Johnson – 2016-04-20T13:09:54.050

Does the block size change the performance of writing to an SD card, or does it only control the size of the chunks dd moves to the SD card? – Trismegistos – 2016-05-28T18:38:34.570

Would it matter if I'm copying an ISO or a whole drive? – VaTo – 2016-07-01T19:02:31.873

23

dd will happily copy using whatever BS you specify, and will copy a partial block at the end if needed.

Basically, the block size (bs) parameter sets the amount of memory that's used to read a chunk from one disk before writing that chunk to the other.

If you have lots of RAM, then making the BS large (but entirely contained in RAM) means that the I/O subsystem is utilised as much as possible by doing massively large reads and writes, exploiting the RAM. Making the BS small means that the per-operation I/O overhead rises as a proportion of total activity.

Of course, there is a law of diminishing returns here. My rough approximation is that a block size in the range of about 128K to 32M will give performance such that the overheads are small compared to the plain I/O, and going larger won't make much difference. The reason the range is as wide as 128K to 32M is that the sweet spot depends on your OS, hardware, and so on.

If it were me, I'd do a few experiments timing a copy/clone using a BS of 128K and again using (say) 16M. If one is appreciably faster, use it. If not, then use the smaller BS of the two.
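
As a rough sketch of that experiment (the device names and the amount of data copied are hypothetical placeholders; point them at your own source and target, and remember that writing to the target is destructive):

# Hypothetical devices; substitute your own source and target before running.
SRC=/dev/sdX
DST=/dev/sdY

# Copy the same 1 GiB with each candidate block size and compare wall-clock times.
# conv=fdatasync makes dd flush to disk before exiting, so buffered writes
# don't flatter either run.
time dd if="$SRC" of="$DST" bs=128K count=8192 conv=fdatasync   # 8192 x 128 KiB = 1 GiB
time dd if="$SRC" of="$DST" bs=16M  count=64   conv=fdatasync   # 64 x 16 MiB = 1 GiB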

quickly_now

Posted 2011-01-17T09:44:34.790

Reputation: 1 797

11

For those that end up here via Google, even if this discussion is a bit old...

Keep in mind that dd is dumb for a reason: the simpler it is, the fewer ways it can screw up.

Complex partitioning schemes (consider a dual-boot hard drive that additionally uses LVM for its Linux system) will start pulling bugs out of the woodwork in programs like Clonezilla. Badly-unmounted filesystems can blow ntfsclone sky-high.

A corrupt filesystem cloned sector-by-sector is no worse than the original. A corrupt filesystem after a failed "smart copy" may be in REALLY sorry shape.

When in doubt, use dd and go forensic. Forensic imaging requires sector-by-sector copies (in fact, it can require more sectors than you're going to be able to pull off with dd, but that's a long story). It is slow and tedious but it will get the job done correctly.

Also, get to know the "conv=noerror,sync" options, so that you can clone drives that are starting to fail, or make ISOs from scratched (cough) CDs, without it taking months.
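
As a sketch of that kind of rescue image (the device and the image path are hypothetical; double-check them, since dd will overwrite whatever of= points at):

# noerror: keep going after read errors instead of aborting
# sync:    pad short or failed reads with zeroes so offsets stay aligned
dd if=/dev/sdX of=/path/to/rescue.img bs=64K conv=noerror,sync

Keep in mind that with noerror,sync a failed read zeroes out the whole input block, so a very large bs discards more data around each bad sector; dedicated tools such as GNU ddrescue handle failing media more gracefully.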

Matt Heck

Posted 2011-01-17T09:44:34.790

Reputation: 209

What does the sync option do? The man page just says: "use synchronized I/O for data and metadata". What are we synchronizing with? That can be many different things. – sherrellbc – 2016-01-11T15:08:26.573

@sherrellbc sync fills input blocks with zeroes if there were any read errors, so data offsets stay in sync. – goetzc – 2016-11-13T13:49:52.407

9

As others have said, there is no universally correct block size; what is optimal for one situation or one piece of hardware may be terribly inefficient for another. Also, depending on the health of the disks it may be preferable to use a different block size than what is "optimal".

One thing that is pretty reliable on modern hardware is that the default block size of 512 bytes tends to be almost an order of magnitude slower than a more optimal alternative. When in doubt, I've found that 64K is a pretty solid modern default. Though 64K usually isn't THE optimal block size, in my experience it tends to be a lot more efficient than the default. 64K also has a pretty solid history of being reliably performant: You can find a message from the Eug-Lug mailing list, circa 2002, recommending a block size of 64K here: http://www.mail-archive.com/eug-lug@efn.org/msg12073.html
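
Applied to the original cloning question, that default would look something like the following sketch; the device names are hypothetical, so triple-check if= and of= before running anything like it:

# Hypothetical devices: /dev/sdX is the source disk, /dev/sdY the target.
# With GNU coreutils 8.24 or newer, status=progress adds a live progress readout.
dd if=/dev/sdX of=/dev/sdY bs=64K status=progress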

For determining THE optimal output block size, I've written the following script that tests writing a 128M test file with dd at a range of different block sizes, from the default of 512 bytes to a maximum of 64M. Be warned, this script uses dd internally, so use with caution.

dd_obs_test.sh:

#!/bin/bash

# Since we're dealing with dd, abort if any errors occur
set -e

TEST_FILE=${1:-dd_obs_testfile}
TEST_FILE_EXISTS=0
if [ -e "$TEST_FILE" ]; then TEST_FILE_EXISTS=1; fi
TEST_FILE_SIZE=134217728

if [ $EUID -ne 0 ]; then
  echo "NOTE: Kernel cache will not be cleared between tests without sudo. This will likely cause inaccurate results." 1>&2
fi

# Header
PRINTF_FORMAT="%8s : %s\n"
printf "$PRINTF_FORMAT" 'block size' 'transfer rate'

# Block sizes of 512b 1K 2K 4K 8K 16K 32K 64K 128K 256K 512K 1M 2M 4M 8M 16M 32M 64M
for BLOCK_SIZE in 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 1048576 2097152 4194304 8388608 16777216 33554432 67108864
do
  # Calculate number of segments required to copy
  COUNT=$(($TEST_FILE_SIZE / $BLOCK_SIZE))

  if [ $COUNT -le 0 ]; then
    echo "Block size of $BLOCK_SIZE estimated to require $COUNT blocks, aborting further tests."
    break
  fi

  # Clear kernel cache to ensure more accurate test
  [ $EUID -eq 0 ] && [ -e /proc/sys/vm/drop_caches ] && echo 3 > /proc/sys/vm/drop_caches

  # Create a test file with the specified block size
  DD_RESULT=$(dd if=/dev/zero of=$TEST_FILE bs=$BLOCK_SIZE count=$COUNT conv=fsync 2>&1 1>/dev/null)

  # Extract the transfer rate from dd's STDERR output
  TRANSFER_RATE=$(echo $DD_RESULT | \grep --only-matching -E '[0-9.]+ ([MGk]?B|bytes)/s(ec)?')

  # Clean up the test file if we created one
  if [ $TEST_FILE_EXISTS -eq 0 ]; then rm "$TEST_FILE"; fi

  # Output the result
  printf "$PRINTF_FORMAT" "$BLOCK_SIZE" "$TRANSFER_RATE"
done

View on GitHub

I've only tested this script on a Debian (Ubuntu) system and on OSX Yosemite, so it will probably take some tweaking to make work on other Unix flavors.

By default the command will create a test file named dd_obs_testfile in the current directory. Alternatively, you can provide a path to a custom test file by providing a path after the script name:

$ ./dd_obs_test.sh /path/to/disk/test_file

The output of the script is a list of the tested block sizes and their respective transfer rates like so:

$ ./dd_obs_test.sh
block size : transfer rate
       512 : 11.3 MB/s
      1024 : 22.1 MB/s
      2048 : 42.3 MB/s
      4096 : 75.2 MB/s
      8192 : 90.7 MB/s
     16384 : 101 MB/s
     32768 : 104 MB/s
     65536 : 108 MB/s
    131072 : 113 MB/s
    262144 : 112 MB/s
    524288 : 133 MB/s
   1048576 : 125 MB/s
   2097152 : 113 MB/s
   4194304 : 106 MB/s
   8388608 : 107 MB/s
  16777216 : 110 MB/s
  33554432 : 119 MB/s
  67108864 : 134 MB/s

(Note: The unit of the transfer rates will vary by OS)

To test optimal read block size, you could use more or less the same process, but instead of reading from /dev/zero and writing to the disk, you'd read from the disk and write to /dev/null. A script to do this might look like so:

dd_ibs_test.sh:

#!/bin/bash

# Since we're dealing with dd, abort if any errors occur
set -e

TEST_FILE=${1:-dd_ibs_testfile}
TEST_FILE_SIZE=134217728

# Exit if the test file already exists, so existing data is never overwritten
if [ -e "$TEST_FILE" ]; then
  echo "Test file $TEST_FILE exists, aborting."
  exit 1
fi
# Past this point the test file is created by this script, so it is safe to remove later
TEST_FILE_EXISTS=1

if [ $EUID -ne 0 ]; then
  echo "NOTE: Kernel cache will not be cleared between tests without sudo. This will likely cause inaccurate results." 1>&2
fi

# Create test file
echo 'Generating test file...'
BLOCK_SIZE=65536
COUNT=$(($TEST_FILE_SIZE / $BLOCK_SIZE))
dd if=/dev/urandom of=$TEST_FILE bs=$BLOCK_SIZE count=$COUNT conv=fsync > /dev/null 2>&1

# Header
PRINTF_FORMAT="%8s : %s\n"
printf "$PRINTF_FORMAT" 'block size' 'transfer rate'

# Block sizes of 512b 1K 2K 4K 8K 16K 32K 64K 128K 256K 512K 1M 2M 4M 8M 16M 32M 64M
for BLOCK_SIZE in 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 1048576 2097152 4194304 8388608 16777216 33554432 67108864
do
  # Clear kernel cache to ensure more accurate test
  [ $EUID -eq 0 ] && [ -e /proc/sys/vm/drop_caches ] && echo 3 > /proc/sys/vm/drop_caches

  # Read test file out to /dev/null with specified block size
  DD_RESULT=$(dd if=$TEST_FILE of=/dev/null bs=$BLOCK_SIZE 2>&1 1>/dev/null)

  # Extract transfer rate
  TRANSFER_RATE=$(echo $DD_RESULT | \grep --only-matching -E '[0-9.]+ ([MGk]?B|bytes)/s(ec)?')

  printf "$PRINTF_FORMAT" "$BLOCK_SIZE" "$TRANSFER_RATE"
done

# Clean up the test file if we created one
if [ $TEST_FILE_EXISTS -ne 0 ]; then rm $TEST_FILE; fi

View on GitHub

An important difference in this case is that the test file is written by the script. Do not point this command at an existing file, or the existing file will be overwritten with random data!

For my particular hardware I found that 128K was the optimal input block size on an HDD and 32K was optimal on an SSD.
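
Since the two scripts measure the input and output sides separately, note that dd also accepts separate values for each; a sketch, again with hypothetical devices, and with the sizes chosen purely to illustrate combining the two measurements:

# ibs= sets the read (input) block size and obs= the write (output) block size;
# bs= would set both at once. The devices and sizes here are placeholders.
dd if=/dev/sdX of=/dev/sdY ibs=128K obs=64K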

Though this answer covers most of my findings, I've run into this situation enough times that I wrote a blog post about it: http://blog.tdg5.com/tuning-dd-block-size/ You can find more specifics on the tests I performed there.

This StackOverflow post may also be helpful: dd: How to calculate optimal blocksize?

tdg5

Posted 2011-01-17T09:44:34.790

Reputation: 286

3

Yes, but you won't find it without lots of testing. I've found that 32M is a good value to use though.

Ignacio Vazquez-Abrams

Posted 2011-01-17T09:44:34.790

Reputation: 100 516

1

Cloning an old boot drive to a new SSD over external SATA (SSD to SSD)

  • using Linux Ubuntu 18.04.2 LTS 64-bit
  • HP xw4600 (8 GB RAM, Intel Core 2 Quad Q6700 @ 2.66 GHz, 4c/4t, no HT)

Using the Disks tool > Format > ATA Secure Erase (2 min)

$ lsblk -l /dev/sd?
NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda    8:0    0 119,2G  0 disk 
sda1   8:1    0 119,2G  0 part /
sdb    8:16   0   2,7T  0 disk 
sdc    8:32   0   2,7T  0 disk 
sdd    8:48   0  12,8T  0 disk 
sde    8:64   0   2,7T  0 disk
sdf    8:80   1 465,8G  0 disk 

$ sudo fdisk -l /dev/sda
Disk /dev/sda: 119,2 GiB, 128035676160 bytes, 250069680 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

$ sudo fdisk -l /dev/sdf
Disk /dev/sdf: 465,8 GiB, 500107862016 bytes, 976773168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
  • sda: Kingston SSD (old; Disks reports an average read rate of 263 MB/s with peaks close to 270 MB/s; no write test, since it is the system disk)
  • sdf: Crucial MX500, 500 GB, CT500MX500SSD1 (Disks reports average read/write rates of 284/262 MB/s, an access time of 0.05 ms, and peaks at about 290/270 MB/s)

Test runs:

$ sudo dd if=/dev/sda of=/dev/sdf
250069680+0 records in
250069680+0 records out
128035676160 bytes (128 GB, 119 GiB) copied, 3391,72 s, 37,7 MB/s
#       --vvvvv--                            *********
$ sudo dd bs=1M if=/dev/sda of=/dev/sdf
122104+1 records in
122104+1 records out
128035676160 bytes (128 GB, 119 GiB) copied, 473,186 s, 271 MB/s
#                                            *********  ********

A second try after a secure erase gave the same result:

128035676160 bytes (128 GB, 119 GiB) copied, 472,797 s, 271 MB/s

kgSW.de

Posted 2011-01-17T09:44:34.790

Reputation: 11

Welcome to Super User! Thank you for your answer, but I'd suggest you [edit] it to include the summary; amongst all the quoted output, I found it tricky to find what your actual answer is! Cheers – bertieb – 2019-04-03T00:19:48.560