If I use the extreme byte size of 1 GB vs 1 MB, for instance, what is the difference at the hardware level? What does a byte size do if a constant stream of data is just getting written to the disk?
The difference is in how much data is read/written at once.
Programs rarely work with true constant streams – they usually read and write data in pieces: read 1 kB from input, write 1 kB to output, read 1 kB from input, write... So the bs= (block size) parameter for dd tells it how much data to read at once. It's often more efficient to ask the OS to read one 2 MB piece from disk than to read sixteen 128 kB pieces, partly because the OS needs to do less work per request. On the other hand, if you use bs=16G, dd will try to read a 16-gigabyte block into RAM and then write it all out at once; you'll probably run out of RAM.
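For instance (a made-up illustration – /dev/zero and the output file name are just placeholders), both of these commands produce 1 GiB of output, but the first does it as 1024 small read/write pairs while the second asks dd to hold the entire gigabyte in RAM before writing anything:

    # 1 GiB in 1 MiB blocks: 1024 read/write pairs, only a 1 MiB buffer.
    dd if=/dev/zero of=testfile bs=1M count=1024

    # Same amount of data in a single block: dd allocates a full 1 GiB
    # buffer in RAM before the first write.
    dd if=/dev/zero of=testfile bs=1G count=1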
It also depends on the storage medium. Disks, whether magnetic or flash-based, are not streams and aren't byte-addressed – they only allow reading and writing in block-sized pieces, where the block size depends on the hardware (commonly 512 bytes or 4 kB). If the OS asks for 100 bytes, the disk has to read the whole block anyway and discard the rest. Writing is even worse: at least with flash (I'm not sure about magnetic disks), writing 100 bytes forces the disk to read the entire block, update it in memory, then write the whole block back. So the block size used by dd matters here too – things will be much faster if it is an exact multiple of the disk's block size. (Assuming there aren't any problems with partition alignment, that is.)
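A sketch of that last point (/dev/sdX is a placeholder and this would overwrite whatever is on it; the speed difference is hypothetical and depends on the drive). oflag=direct bypasses the OS cache so the hardware behaviour is actually visible:

    # On a drive with 4 kB physical blocks, 512-byte writes force the
    # drive to read-modify-write each 4 kB block internally: slow.
    dd if=/dev/zero of=/dev/sdX bs=512 count=204800 oflag=direct   # ~100 MB

    # 4 kB writes map cleanly onto whole hardware blocks: much faster.
    dd if=/dev/zero of=/dev/sdX bs=4k count=25600 oflag=direct     # ~100 MB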
'Block size', not 'byte size'. – Aaron Miller – 2013-04-29T17:27:37.697