Proper, if any, way to use dd as pipe buffer?


The Question

When I looked for pipe buffering tools in *NIX, I saw suggestions to use buffer, mbuffer, or pv. However, the former two are not always in a distro's official repo (such as Arch), while pv (as of 1.6.0) has a bug which prevents this functionality. In a few other questions I have seen mentions of dd being used as a buffer, and I would like to explore that, because dd is always available. However, none of those mentions is elaborate enough to make real sense of, so here I ask for a "proper" way to use it.

Questions that mention dd include https://unix.stackexchange.com/questions/345072/can-dd-be-used-to-add-a-buffer-to-a-pipe and https://unix.stackexchange.com/questions/21918/utility-to-buffer-an-unbounded-amount-of-data-in-a-pipeline

For ease of testing, I provide a test script below, with some comments about my own experiments. Details are explained after the code listing. Please make sure you have pv installed and at least 256MB of memory before running!

#!/bin/sh

producer() {
    while true; do
        dd if=/dev/zero bs=64M count=1 iflag=fullblock status=none
        sleep 4
    done
}

buffer() {
    # Works, but long
    # Must at least fill 32M before consumer starts
    # So, must choose a small obs and chain more to look
    # more like a proper "buffer"
    dd obs=32M status=none | \
        dd obs=32M status=none | \
        dd obs=32M status=none | \
        dd obs=32M status=none
    # Doesn't work, producer rate limited
    #dd bs=128M status=none
    # Doesn't work, producer must fill buffer, then
    # must wait until buffer is empty
    #dd obs=128M status=none
    # Doesn't work, producer rate limited
    #dd ibs=128M status=none
    # Doesn't work, producer must fill buffer, then
    # must wait until buffer is empty
    #dd bs=128M status=none iflag=fullblock
}

consumer() {
    pv --rate-limit 1M -q | dd of=/dev/null status=none
}

producer | pv -cN produce | buffer | pv -cN consume | consumer

Here, the producer produces 64MB of data every 4 seconds, the buffer holds 128MB, and the consumer consumes at a constant 1MB/s. Of course, this means the buffer will overflow pretty quickly, but that is intentional, to show the effects clearly. Ideally, before the buffer fills up (at the third production), we should see a constant 1MB/s consumption and bursts of production of 64MB each. The "correct" output looks like this:

  produce:  128MiB 0:00:07 [   0 B/s] [  <=>                                                       ]
  consume: 7.25MiB 0:00:07 [1.01MiB/s] [       <=>                                                 ]

Here, the working solution is the following:

dd obs=32M status=none | \
    dd obs=32M status=none | \
    dd obs=32M status=none | \
    dd obs=32M status=none

This is constructed by splitting the required 128MB buffer into 4 chunks. Yes, each chunk must fill up before data is passed to the next level, but since 32MB is smaller than the 64MB burst, it works for this test as if it were a real buffer. Now, there are some problems.

  1. In real applications, we don't have an instantaneous burst of data, so the chunks need to be small, but not too small, which means there will be a long chain of dd commands.
  2. What if EOF is encountered before the 32MB mark is reached? Will that block be lost? I tested with dd if=test.txt | dd obs=1M | dd obs=1M | dd of=test2.txt and compared the results (a reproducible version of this test is sketched after this list). It turns out this is not a problem, so using it for backup won't corrupt data.
  3. How much overhead does it create?
  4. Is there a more elegant way to achieve the same, by cleverly arranging parameters?
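
For reference, here is a minimal, reproducible version of the test from point 2 (the file names are arbitrary, and the test file size is deliberately made not a multiple of the block size, so that EOF arrives mid-block):

# Build a test file whose size is not a multiple of 1M, so EOF arrives mid-block
dd if=/dev/urandom of=test.txt bs=1M count=3 status=none
printf 'trailing partial block' >> test.txt
# Pass it through a small dd chain and compare against the original
dd if=test.txt status=none | dd obs=1M status=none | dd obs=1M status=none | dd of=test2.txt status=none
cmp test.txt test2.txt && echo "identical: nothing lost at EOF"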

There are a few other attempts included in the script, and they don't work, as explained in the comments. I have also tried using a FIFO plus background processes, which yields the same result (see the sketch below).
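
For completeness, the FIFO variant looks roughly like this (just a sketch, reusing the producer and consumer functions from the test script above; the FIFO path is an arbitrary choice):

fifo=/tmp/pipe-buffer-test
mkfifo "$fifo"
# Background dd drains the FIFO towards the consumer
dd if="$fifo" obs=32M status=none | consumer &
# The producer fills the FIFO through another dd
producer | dd obs=32M status=none of="$fifo"
wait
rm -f "$fifo"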

PS. Just so you know, buffering a pipe is quite useful when backing up A onto B, especially when A is an HDD, which has seek time. So I would do something like this:

tar cSpf - <path> -C <root path> | <a large buffer> | <some parallel compressor> \
| <small buffer if compressor is generally slow and B has seek time> \
| dd bs=<several GB if B is not an SSD> iflag=fullblock oflag=direct of=<archive.tar.??>

Carl Dong

Posted 2017-05-04T20:58:31.150

Reputation: 141

To chain identical commands you can use a recursive function, e.g. buf() { if [ "$2" -gt 0 ] ; then dd status=none obs="$1" | buf "$1" $(($2-1)) ; else cat ; fi ; }. Usage: producer | buf 32M 4 | consumer. – Kamil Maciorowski – 2017-05-05T07:05:05.993

That sounds better, definitely. – Carl Dong – 2017-05-05T15:48:27.053

Answers


I am putting in my own answer. It might not be the best, but it is OK.

Caution

This is written up front, after many tests.

Do not chain too many dd's for buffering, or all your CPU cores may block on IO, and your computer will freeze even if you have tons of memory left!

This is especially toxic if you have a broken, slow external USB drive that also needs ridiculous IO intensity to read/write.

Examples

I basically exhausted all the combinations of dd options. A single dd seems to be incapable of this task, since it cannot perform asynchronous IO. Otherwise, in a chain of dd buffers, the largest block must be filled before dd starts to act like a FIFO. So, if you don't care about the initial delay while the pipe fills up, a chain of two dd's works. I hope someone else can provide a more elegant solution, but here is an example usage.

Example 1: Tarring all files from a heavily fragmented HDD A (response time jitters) to a heavily fragmented HDD B (jitters), using xz as the compression algorithm (slow) in parallel (which jitters if you are actually using the computer) (disclaimer: I am writing this from my head, so minor details might be wrong. Use at your own risk):

tar -cpSf - -C /mnt/A . | \
  dd obs=1024M | dd obs=1024M | \
  xz -T 0 -c | \
  dd obs=1024M | dd obs=1024M | \
  dd bs=512M iflag=fullblock of=/mnt/B/A.tar.xz

Add pv to see the speed. Here, xz starts only after 1GB of data has been read from A (unless A has less than 1GB, in which case it finishes). Similarly, the disk write to B starts only after 1GB of data has come out of xz. This code gives a 2GB buffer between tar and xz, and 2GB between xz and the write. The bs=512M at the end is not really necessary, but I found that a large (>64M) block size gives a better average write speed, especially on USB hard drives. I suppose it creates fewer fragments, too, if drive B is in use (not confirmed).
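
For example, to actually see the speeds, pv could be dropped in at both ends of the pipeline (the stream names read and write are just labels I chose here):

tar -cpSf - -C /mnt/A . | pv -cN read | \
  dd obs=1024M | dd obs=1024M | \
  xz -T 0 -c | \
  dd obs=1024M | dd obs=1024M | \
  pv -cN write | dd bs=512M iflag=fullblock of=/mnt/B/A.tar.xz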

Example 2. Objective: copy a gigantic file from a heavily fragmented disk A to a heavily fragmented disk B.

dd if=/mnt/A/file obs=<half of cache you want> | dd bs=<half of cache> iflag=fullblock oflag=direct of=/mnt/B/file

This is one of the simplest forms I could find. If the file is gigantic enough, the initial time spent filling the cache should be negligible. Meanwhile, it reads/writes asynchronously, and hopefully groups enough writes together to get some sequential performance. I suppose SSDs won't care about the block size, though.
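
As a concrete instance, assuming you are willing to spend roughly 2GB of memory on the cache (the sizes and paths here are just an example):

dd if=/mnt/A/file obs=1024M | dd bs=1024M iflag=fullblock oflag=direct of=/mnt/B/file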

Example 3. Thanks to Kamil Maciorowski, I now have the following in my .zshrc:

buffer() {
    # usage: buffer <block size> <number of blocks>
    if [ "$2" -gt 0 ] ; then
        dd status=none obs="$1" | buffer "$1" $(($2-1))
    else
        cat
    fi
}

Now, if you need 3 blocks of 512M of buffer, chain buffer 512M 3 into your pipeline. Generally, if your job is large enough for your throughput (e.g. copying/compressing 100GB+ of data at 100MB/s on average), a smaller block size gives no advantage other than filling the pipe more quickly (which is irrelevant, since that time is small). I have observed that if you put in too many blocks, the CPU may become so busy with IO that the command freezes the entire computer.

Now, Example 1 becomes

tar -cpSf - -C /mnt/A . | \
  buffer 1024M 2 | \
  xz -T 0 -c | \
  buffer 1024M 2 | \
  dd bs=512M iflag=fullblock of=/mnt/B/A.tar.xz

Carl Dong

Posted 2017-05-04T20:58:31.150

Reputation: 141