21

I'm running a shell script that pipes data from one process to another

process_a | process_b

Does anyone know a way to find out how many bytes were passed between the two programs? The only solution I can think of at the moment would be to write a small C program that reads from stdin, writes to stdout, and counts all of the data transferred, storing the count in an environment variable, like:

process_a | count_bytes | process_b

Does anyone have a neater solution?

Simon Hodgson

4 Answers

33

Use pv, the pipe viewer. It's a great tool. Once you know about it, you'll wonder how you ever lived without it.

It can also show you a progress bar and the transfer speed.
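
As a rough sketch of how that could look for the byte-counting case: pv's -b flag shows a byte counter, and -n switches to plain numeric output on stderr, so the last line written should be the final total. The printf and cat commands below are stand-ins for process_a and process_b, and the temporary log file is an arbitrary choice. This assumes pv is installed.

```shell
# Stand-ins for process_a and process_b (assumptions for illustration).
log=$(mktemp)

# -b: count bytes; -n: print plain numbers on stderr instead of a progress bar.
printf 'hello world' | pv -b -n 2>"$log" | cat >/dev/null

# pv may print several intermediate values; the last line is the final total.
count=$(tail -n 1 "$log")
rm -f "$log"
echo "bytes transferred: $count"
```

If process_b also writes to stderr, its messages stay separate, because only pv's stderr is redirected to the log.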

Amandasaurus
18

Pipe through dd. dd's default input is stdin and its default output is stdout; when it has finished copying, it reports on stderr how much data it transferred.

If you want to capture the output of dd and the other programs already talk to stderr, then use another file descriptor. For example:

$ exec 4>~/fred
$ input-command | dd 2>&4 | output-command
$ exec 4>&-
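
A minimal sketch of the simpler case, where only dd's own stderr needs capturing: the report goes to a temporary file and is read back afterwards. The printf and cat commands are stand-ins for the real producer and consumer, and the variable names are arbitrary. The exact wording of dd's report differs between implementations (GNU, BSD, BusyBox), so treat any parsing of it as fragile.

```shell
# Temporary file for dd's transfer report (name chosen by mktemp).
stats=$(mktemp)

# Stand-ins for input-command and output-command (assumptions for illustration).
# Only dd's stderr is redirected, so the other commands' stderr is untouched.
printf 'hello world' | dd 2>"$stats" | cat >/dev/null

# Read the report back and clean up.
report=$(cat "$stats")
rm -f "$stats"
printf '%s\n' "$report"
```

The report typically contains the record counts followed by a line giving the total number of bytes.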
Phil P
9

process_a | tee >(process_b) | wc --bytes might work. You can then redirect wc's count to wherever you need it. If process_b outputs anything to stdout/stderr, you will probably need to redirect it somewhere, if only to /dev/null.

For a slightly contrived example:

filestore:~# cat document.odt | tee >(dd of=/dev/null 2>/dev/null) | wc --bytes
4295

By way of explanation: tee lets you direct output to multiple files (plus stdout), and the >() construct is bash's "process substitution", which in this case makes a process look like a write-only file, so you can redirect to processes as well as files (see here, or this question+answer for an example of using tee to send output to many processes).
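
For shells without process substitution, the same idea can be sketched with a named pipe: tee writes one copy of the data to a FIFO that wc reads in the background, while the other copy continues down the pipeline. This is a hedged sketch rather than a tested recipe for any particular shell; it assumes mkfifo and mktemp -d are available, and printf and cat stand in for process_a and process_b.

```shell
# Private directory holding the FIFO and the count file (names are arbitrary).
dir=$(mktemp -d)
fifo="$dir/count.fifo"
mkfifo "$fifo"

# Background reader: counts every byte that tee copies into the FIFO.
wc -c <"$fifo" >"$dir/count" &

# Stand-ins for process_a and process_b (assumptions for illustration).
printf 'hello world' | tee "$fifo" | cat >/dev/null

# Wait for wc to see end-of-file, then read its result.
wait
count=$(tr -d ' ' <"$dir/count")   # some wc implementations pad with spaces
rm -rf "$dir"
echo "bytes transferred: $count"
```

Unlike the process substitution version, the pipeline keeps its process_a | ... | process_b shape and the count ends up in a shell variable.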

David Spillett
  • I like this solution. Sadly, the shell I'm using (BusyBox) doesn't appear to support the >() notation, but it does provide a way of doing what I'm after. – Simon Hodgson Dec 21 '09 at 09:04
  • Aye, you need a pretty complete bash to have that feature - it is the sort of thing that isn't commonly used so gets stripped out of cut-down shells (even those with a target of being more-or-less bash compatible) like busybox in order to save space. – David Spillett Dec 21 '09 at 12:43
1

I know I'm late to the party, but I believe I have a good answer which can enhance this useful thread.
This is a mix of @Phil P's and @David Spillett's answers, but:

  • unlike @Phil P's, it avoids creating a new file
  • unlike @David Spillett's, it maintains the pipeline structure

The byte count is printed to stdout, along with any output of process_b.
You can use a prefix (Bytes: in the example) to identify the line containing the count when working with the output.

exec 3>&1
process_a | tee >({ echo -n 'Bytes:'; wc -c; } >&3) | process_b
exec 3>&-

WARNING:
Do not rely on the order of the lines in the output.
The order is unpredictable and can differ between runs, even when the same script is called with the same parameters!

Claudio