Proper, if any, way to use dd as pipe buffer?


The Question

When I looked for pipe buffering tools in *NIX, I saw suggestions to use buffer, mbuffer, or pv. However, the former two are not always in a distro's official repo (such as Arch), while pv (as of 1.6.0) has a bug which prevents this functionality. In a few other questions I have seen mentions of dd being used as a buffer, and I would like to explore that, because dd is always available. However, none of those mentions is elaborate enough to make real sense of, so here I ask for a "proper" way to use it.

Questions that mention dd include https://unix.stackexchange.com/questions/345072/can-dd-be-used-to-add-a-buffer-to-a-pipe and https://unix.stackexchange.com/questions/21918/utility-to-buffer-an-unbounded-amount-of-data-in-a-pipeline

For ease of testing, I provide a test script below, with some comments about my own experiments. Details are explained after the code listing. Please make sure you have pv installed and at least 256MB of memory before running!

#!/bin/sh

producer() {
    while true; do
        dd if=/dev/zero bs=64M count=1 iflag=fullblock status=none
        sleep 4
    done
}

buffer() {
    # Works, but long
    # Must at least fill 32M before consumer starts
    # So, must choose a small obs and chain more to look
    # more like a proper "buffer"
    dd obs=32M status=none | \
        dd obs=32M status=none | \
        dd obs=32M status=none | \
        dd obs=32M status=none
    # Doesn't work, producer rate limited
    #dd bs=128M status=none
    # Doesn't work, producer must fill buffer, then
    # must wait until buffer is empty
    #dd obs=128M status=none
    # Doesn't work, producer rate limited
    #dd ibs=128M status=none
    # Doesn't work, producer must fill buffer, then
    # must wait until buffer is empty
    #dd bs=128M status=none iflag=fullblock
}

consumer() {
    pv --rate-limit 1M -q | dd of=/dev/null status=none
}

producer | pv -cN produce | buffer | pv -cN consume | consumer

Here, the producer produces 64MB of data every 4 seconds, the buffer holds 128MB, and the consumer consumes at a constant 1MB/s. Of course, this means the buffer will overflow pretty quickly, but that is intentional, to show the effects clearly. Ideally, before the buffer fills up (at the third production), we should see a constant 1MB/s consumption and bursts of production of 64MB each. The "correct" output looks like this:

  produce:  128MiB 0:00:07 [   0 B/s] [  <=>                                                       ]
  consume: 7.25MiB 0:00:07 [1.01MiB/s] [       <=>                                                 ]

Here, the working solution is the following:

dd obs=32M status=none | \
    dd obs=32M status=none | \
    dd obs=32M status=none | \
    dd obs=32M status=none

This is constructed by splitting the required 128MB buffer into 4 chunks. Yes, each chunk must fill up before data is passed to the next level, but since 32MB is smaller than the 64MB burst, it works for this test as if it were a real buffer. Now, there are some problems.

  1. In real applications, we don't have an instantaneous burst of data, so the chunks need to be small, but not too small, which means there will be a long chain of dd commands.
  2. What if EOF is encountered before the 32MB mark is reached? Will that block be lost? I tested with dd if=test.txt | dd obs=1M | dd obs=1M | dd of=test2.txt and compared the results (a reproducible version of this test is sketched after this list). It turns out this is not a problem, so using it for backup won't corrupt data.
  3. How much overhead does it create?
  4. Is there a more elegant way to achieve the same, by cleverly arranging parameters?
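
For reference, here is a minimal, reproducible version of the test from point 2 (the file names are arbitrary, and the test file size is deliberately made not a multiple of the block size, so that EOF arrives mid-block):

# Build a test file whose size is not a multiple of 1M, so EOF arrives mid-block
dd if=/dev/urandom of=test.txt bs=1M count=3 status=none
printf 'trailing partial block' >> test.txt
# Pass it through a small dd chain and compare against the original
dd if=test.txt status=none | dd obs=1M status=none | dd obs=1M status=none | dd of=test2.txt status=none
cmp test.txt test2.txt && echo "identical: nothing lost at EOF"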

There are a few other attempts included in the script, and they don't work, as explained in the comments. I have also tried using a FIFO plus background processes, which yields the same result (see the sketch below).
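
For completeness, the FIFO variant looks roughly like this (just a sketch, reusing the producer and consumer functions from the test script above; the FIFO path is an arbitrary choice):

fifo=/tmp/pipe-buffer-test
mkfifo "$fifo"
# Background dd drains the FIFO towards the consumer
dd if="$fifo" obs=32M status=none | consumer &
# The producer fills the FIFO through another dd
producer | dd obs=32M status=none of="$fifo"
wait
rm -f "$fifo"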

PS. Just so you know, buffering a pipe is quite useful when backing up A onto B, especially when A is an HDD, which has seek time. So I would do something like this:

tar cSpf - <path> -C <root path> | <a large buffer> | <some parallel compressor> \
| <small buffer if compressor is generally slow and B has seek time> \
| dd bs=<several GB if B is not an SSD> iflag=fullblock oflag=direct of=<archive.tar.??>

Carl Dong

Posted 2017-05-04T20:58:31.150

Reputation: 141

To chain identical commands you can use a recursive function, e.g. buf() { if [ "$2" -gt 0 ] ; then dd status=none obs="$1" | buf "$1" $(($2-1)) ; else cat ; fi ; }. Usage: producer | buf 32M 4 | consumer. – Kamil Maciorowski – 2017-05-05T07:05:05.993

That sounds better, definitely. – Carl Dong – 2017-05-05T15:48:27.053

Answers


I am putting in my own answer. It might not be the best, but it is OK.

Caution

This is written up front, after many tests.

Do not chain too many dd's for buffering, or all your CPU cores may block on IO, and your computer will freeze even if you have tons of memory left!

This is especially toxic if you have a broken, slow external USB drive that also needs ridiculous IO intensity to read/write.

Examples

I basically exhausted all the combinations of dd options. A single dd seems to be incapable of this task, since it cannot perform asynchronous IO. Otherwise, in a chain of dd buffers, the largest block must be filled before dd starts to act like a FIFO. So, if you don't care about the initial delay while the pipe fills up, a chain of two dd's works. I hope someone else can provide a more elegant solution, but here is an example usage.

Example 1: Tarring all files from a heavily fragmented HDD A (response time jitters) to a heavily fragmented HDD B (jitters), using xz as the compression algorithm (slow) in parallel (which jitters if you are actually using the computer) (disclaimer: I am writing this from my head, so minor details might be wrong. Use at your own risk):

tar -cpSf - -C /mnt/A . | \
  dd obs=1024M | dd obs=1024M | \
  xz -T 0 -c | \
  dd obs=1024M | dd obs=1024M | \
  dd bs=512M iflag=fullblock of=/mnt/B/A.tar.xz

Add pv to see the speed. Here, xz starts only after 1GB of data has been read from A (unless A has less than 1GB, in which case it finishes). Similarly, the disk write to B starts only after 1GB of data has come out of xz. This code gives a 2GB buffer between tar and xz, and 2GB between xz and the write. The bs=512M at the end is not really necessary, but I found that a large (>64M) block size gives a better average write speed, especially on USB hard drives. I suppose it creates fewer fragments, too, if drive B is in use (not confirmed).
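
For example, to actually see the speeds, pv could be dropped in at both ends of the pipeline (the stream names read and write are just labels I chose here):

tar -cpSf - -C /mnt/A . | pv -cN read | \
  dd obs=1024M | dd obs=1024M | \
  xz -T 0 -c | \
  dd obs=1024M | dd obs=1024M | \
  pv -cN write | dd bs=512M iflag=fullblock of=/mnt/B/A.tar.xz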

Example 2. Objective: copy a gigantic file from a heavily fragmented disk A to a heavily fragmented disk B.

dd if=/mnt/A/file obs=<half of cache you want> | dd bs=<half of cache> iflag=fullblock oflag=direct of=/mnt/B/file

This is one of the simplest forms I could find. If the file is gigantic enough, the initial time spent filling the cache should be negligible. Meanwhile, it reads/writes asynchronously, and hopefully groups enough writes together to get some sequential performance. I suppose SSDs won't care about the block size, though.
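
As a concrete instance, assuming you are willing to spend roughly 2GB of memory on the cache (the sizes and paths here are just an example):

dd if=/mnt/A/file obs=1024M | dd bs=1024M iflag=fullblock oflag=direct of=/mnt/B/file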

Example 3. Thanks to Kamil Maciorowski, I now have the following in my .zshrc:

buffer() {
    # usage: buffer <block size> <number of blocks>
    if [ "$2" -gt 0 ] ; then
        dd status=none obs="$1" | buffer "$1" $(($2-1))
    else
        cat
    fi
}

Now, if you need 3 blocks of 512M of buffer, chain buffer 512M 3 into your pipeline. Generally, if your job is large enough for your throughput (e.g. copying/compressing 100GB+ of data at 100MB/s on average), a smaller block size gives no advantage other than filling the pipe more quickly (which is irrelevant, since that time is small). I have observed that if you put in too many blocks, the CPU may become so busy with IO that the command freezes the entire computer.

Now, Example 1 becomes

tar -cpSf - -C /mnt/A . | \
  buffer 1024M 2 | \
  xz -T 0 -c | \
  buffer 1024M 2 | \
  dd bs=512M iflag=fullblock of=/mnt/B/A.tar.xz

Carl Dong

Posted 2017-05-04T20:58:31.150

Reputation: 141