Is there a way to see any tar progress per file?

133

70

I have a couple of big files that I would like to compress. I can do this with for example

tar cvfj big-files.tar.bz2 folder-with-big-files

The problem is that I can't see any progress, so I don't have a clue how long it will take or anything like that. Using v I can at least see when each file is completed, but when the files are few and large this isn't the most helpful.

Is there a way I can get tar to show more detailed progress? Like a percentage done or a progress bar or estimated time left or something. Either for each single file or all of them or both.

Svish

Posted 2010-07-28T11:51:37.173

Reputation: 27 731

Answers

109

I prefer one-liners like this:

tar cf - /folder-with-big-files -P | pv -s $(du -sb /folder-with-big-files | awk '{print $1}') | gzip > big-files.tar.gz

It will have output like this:

4.69GB 0:04:50 [16.3MB/s] [==========================>        ] 78% ETA 0:01:21

For OSX (from Kenji's answer)

tar cf - /folder-with-big-files -P | pv -s $(($(du -sk /folder-with-big-files | awk '{print $1}') * 1024)) | gzip > big-files.tar.gz

checksum

Posted 2010-07-28T11:51:37.173

Reputation: 1 304

Nice, a one-liner. Can you explain it? Or does it just magically work somehow? – Kissaki – 2014-10-04T18:02:48.240

Can you write command to extract tar file like above? – Krzysztof Szewczyk – 2014-11-26T13:47:54.257

Ok, I have it: pv $FILE.tgz | tar xzf - -C $DEST_DIR – Krzysztof Szewczyk – 2014-11-28T07:57:07.110

For OS X, I needed to use the square bracket form for arithmetic expansion, which made: tar cf - /folder-with-big-files -P | pv -s $[$(du -sk /folder-with-big-files | awk '{print $1}') * 1024] | gzip > big-files.tar.gz Without this change, I was getting -bash: syntax error near unexpected token ')' – Dean Becker – 2015-03-19T11:18:56.290

Any idea how to make this work with pixz? For me, it works with gzip, pigz and xz, but not with pixz, pxz or when specifying multiple cores to xz (xz -T<num>) – joelostblom – 2016-02-03T20:30:04.777

I've wrapped this oneliner in a bash function here: https://github.com/equant/my_bash_tools/blob/master/tarp.bash

– equant – 2017-09-29T20:59:00.887

thanks! I used it for backing up my home dir https://gist.github.com/timabell/68d112d66623d9a4a3643c86a93debee

– Tim Abell – 2018-03-15T10:34:23.063

Hello, how to do it for combining multiple directory/files into a one zip file? – Sisir – 2018-07-12T06:20:59.537

Note that the progress doesn't show until the du command finishes, which could take a while depending on the size, complexity, and fragmentation of the directory. – Rooster242 – 2019-01-10T20:11:48.960

I'm a bit late to the party, but I was wondering why this answer suggests the use of the -P option on tar. That seems like bad advice, given that the OP didn't mention a need for absolute paths in the tarball (and having them can cause real headaches when it comes time to extract the archive). – Brian A. Henning – 2019-02-15T16:25:40.230

@KrzysztofSzewczyk this does not work with a tar.xz archive. I tried variants of pv archive.tar.xz | tar xf - -C ~/location, pv archive.tar.xz | tar xf -J -C ~/location, pv archive.tar.xz | tar xf --C ~/location, pv archive.tar.xz | tar xf -JC ~/location, ... but none seem to work. Without pv it's: tar xf archive.tar.xz -C ~/location – tatsu – 2019-03-08T10:13:29.050

1

Okay, I figured it out thanks to this: https://stackoverflow.com/a/19372542/4770754 ; it's: pv archive.tar.xz | tar xp -J -C ~/location

– tatsu – 2019-03-08T10:19:58.527

For those who only want to tar without compression on macOS: tar -c folder-with-big-files | pv -s $[$(du -sk folder-with-big-files | awk '{print $1}') * 1024] > folder-with-big-files.tar. – Bugs Bunny – 2019-09-02T20:00:20.803

What in the world is 'pv'? Doesn't seem to exist on Mac OS X – qodeninja – 2019-09-25T02:47:41.237

Might I suggest using cut instead of awk for this? Like cut -f1 instead of awk '{print $1}'? – Tripp Kinetics – 2019-10-09T15:57:07.987

The reason I suggest it is that it's a little less fault-prone, and cut is a much lighter command. – Tripp Kinetics – 2019-10-09T15:59:53.193

On OSX, du does not take the -b argument; I needed to fall back to: $(($(du -sk /folder-with | awk '{print $1}') * 1024)) – ıɾuǝʞ – 2013-11-29T10:14:25.840

77

You can use pv to achieve this. To report the progress correctly, pv needs to know how many bytes you are throwing at it. So, the first step is to calculate the size (in kbytes). You can also drop the progress bar entirely and just let pv tell you how many bytes it has seen; it would then report 'done that much, that fast'.

% SIZE=`du -sk folder-with-big-files | cut -f 1`

And then:

% tar cvf - folder-with-big-files | pv -p -s ${SIZE}k | \ 
     bzip2 -c > big-files.tar.bz2
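The two steps can be wrapped into a small helper function (a sketch; `tarpv` is a made-up name, and it assumes `pv` and `bzip2` are installed):

```shell
# tarpv DIR OUT.tar.bz2 -- archive and compress DIR with a pv progress bar.
tarpv() {
    size=$(du -sk "$1" | cut -f 1)              # total size in KiB
    tar cf - "$1" | pv -p -s "${size}k" | bzip2 -c > "$2"
}
```

Invoked as, e.g., tarpv folder-with-big-files big-files.tar.bz2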

akira

Posted 2010-07-28T11:51:37.173

Reputation: 52 754

Any idea how to make this work with pixz? For me, it works with gzip, pigz and xz, but not with pixz, pxz or when specifying multiple cores to xz (xz -T<num>) – joelostblom – 2016-02-03T20:34:14.763

Cool. pv doesn't seem to come with Mac OS X, but I'll try this out once I have a computer with MacPorts on it. Could you explain what you're doing there, though? I'm not quite sure what the first line does exactly. – Svish – 2010-07-28T12:10:34.430

first line: fetch info about how many bytes will be handled. second line: use the size from the first line to allow pv to render 'progress'. since you are piping data, pv does not know how many more bytes will come. – akira – 2011-07-22T10:25:28.423

One addition: SIZE=$(($SIZE * 1000 / 1024)) - I don't know whether or not this is a quirk on my particular platform, so I'm not adding it to the answer: du returns size where 1 kb = 1024 bytes, while pv seems to be expecting 1 kb = 1000 bytes. (I'm on Ubuntu 10.04) – Izkata – 2011-12-11T02:27:59.213

@lzkata you could always ask du to use your preferred blocksize, e.g. du -s --block-size=1000, or just work with plain bytes, e.g. drop the k's from the du and pv calls. Nevertheless, I would expect both to use 1024 unless told otherwise, e.g. the --si switch on du, for example. – Legolas – 2012-02-23T11:05:28.403

or just drop the k-stuff and just use plain bytes (du -sb and pv -s without any modifier). that should end all the confusion. – akira – 2012-02-23T11:10:07.263

23

A better progress bar:

apt-get install pv dialog

(pv -n file.tgz | tar xzf - -C target_directory ) \
2>&1 | dialog --gauge "Extracting file..." 6 50
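For creation (the original question), the same gauge can be fed by pv -n, which prints integer percentages on stderr. A sketch under assumptions: `compress_gauge` is a made-up name, and GNU du is assumed for du -sb:

```shell
# compress_gauge DIR OUT.tar.gz -- compress DIR while showing a dialog gauge.
compress_gauge() {
    (tar cf - "$1" \
        | pv -n -s "$(du -sb "$1" | cut -f 1)" \
        | gzip > "$2") 2>&1 \
        | dialog --gauge "Compressing $1 ..." 6 50
}
```

The 2>&1 is what routes pv's percentage lines (and, unavoidably, any tar errors) into dialog's stdin.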


Mr. Black

Posted 2010-07-28T11:51:37.173

Reputation: 337

This works for extraction, but you still need one of the more complicated commands for creation (which was the original question). It could still be combined with those; it's just more complicated. – Daniel H – 2014-08-09T05:05:43.587

17

Check out the --checkpoint and --checkpoint-action options in the tar info page (at least on my distribution, the description of these options is not in the man page → RTFI).

See https://www.gnu.org/software/tar/manual/html_section/tar_26.html

With these (and maybe the functionality to write your own checkpoint command), you can calculate a percentage…
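A self-contained sketch (GNU tar only; the scratch directory and checkpoint interval are just for demonstration):

```shell
# Create some scratch data, then archive it with a checkpoint message
# every 100 records. A record is 10 KiB by default (20 blocks of 512
# bytes), so this prints roughly once per MiB written.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/file" bs=1024 count=2048 2>/dev/null
tar -c --checkpoint=100 \
    --checkpoint-action=echo="checkpoint %u" \
    -f "$demo.tar" -C "$demo" .
```

With --checkpoint-action=exec=/some/script you can hand the checkpoint number to your own command and turn it into a real percentage, as the answer suggests.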

helper

Posted 2010-07-28T11:51:37.173

Reputation: 179

This should be the correct answer. Others just explain extra tools (not installed by default, besides) to achieve something similar. – Carmine Giangregorio – 2016-11-15T14:47:08.293

@Sardathrion Maybe because it's GNU-tar specific. – phk – 2017-02-24T11:52:09.707

11

Inspired by helper’s answer

Another way is to use the native tar options:

FROMSIZE=`du -sk ${FROMPATH} | cut -f 1`;
CHECKPOINT=`echo ${FROMSIZE}/50 | bc`;
echo "Estimated: [==================================================]";
echo -n "Progress:  [";
tar -c --record-size=1K --checkpoint="${CHECKPOINT}" --checkpoint-action="ttyout=>" -f - "${FROMPATH}" | bzip2 > "${TOFILE}";
echo "]"

The result looks like:

Estimated: [==================================================]
Progress:  [>>>>>>>>>>>>>>>>>>>>>>>

a complete example here

campisano

Posted 2010-07-28T11:51:37.173

Reputation: 141

8

Using only tar

tar has the option (since v1.12) to print status information on signals using --totals=$SIGNO, e.g.:

tar --totals=USR1 -czf output.tar input.file
Total bytes written: 6005319680 (5.6GiB, 23MiB/s)

The Total bytes written: [...] information gets printed on every USR1 signal, e.g.:

pkill -SIGUSR1 tar
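To get periodic reports without signalling by hand, tar can be backgrounded and signalled on a timer. A sketch (GNU tar; the 2-second interval and scratch data are only for demonstration):

```shell
# Archive scratch data in the background, asking tar for totals every
# 2 seconds until it exits. Sleeping before the first kill gives tar
# time to install its USR1 handler.
src=$(mktemp -d)
dd if=/dev/zero of="$src/file" bs=1024 count=4096 2>/dev/null
tar --totals=USR1 -czf "$src.tar.gz" -C "$src" . &
tar_pid=$!
while sleep 2; do
    kill -USR1 "$tar_pid" 2>/dev/null || break
done
wait "$tar_pid"
```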


Murmel

Posted 2010-07-28T11:51:37.173

Reputation: 555

And to know the total original size, we can use du -hs /path. But how can we estimate the total bytes to be written when using the -z flag? I assume it would be less than the original size – lucidbrot – 2019-11-11T17:07:41.457

I have asked a Q related to my comment on unix.se

– lucidbrot – 2019-11-12T19:27:07.933

3

Just noticed the comment about MacOS; while I think the solution from @akira (and pv) is much neater, I followed a hunch and had a quick play around on my MacOS box, sending tar a SIGINFO signal. Funnily enough, it worked :) If you're on a BSD-like system this should work, but on a Linux box you may need to send SIGUSR1 instead, and/or tar might not behave the same way.

The downside is that it will only give you output (on stdout) showing how far through the current file it is, since I'm guessing tar has no idea how big the data stream it's getting is.

So yes, an alternative approach is to fire up tar and periodically send it SIGINFO whenever you want to know how far it's gotten. How to do this?

The ad-hoc, manual approach

If you want to be able to check status on an ad-hoc basis, you can hit control-T (as Brian Swift mentioned) in the relevant window, which will send the SIGINFO signal. One issue is that it will be sent to your entire pipeline, I believe, so if you are doing:

% tar cvf - folder-with-big-files | bzip2 -c > big-files.tar.bz2

You will also see bzip2 report its status along with tar:

a folder-with-big-files/big-file.imgload 0.79  cmd: bzip2 13325 running 
      14 0.27u 1.02s 

      adding folder-with-big-files/big-file.imgload (17760256 / 32311520)

This works nicely if you just want to check whether the tar you're running is stuck or merely slow. You probably don't need to worry much about formatting issues in this case, since it's only a quick check.

The sort of automated approach

If you know it's going to take a while but want something like a progress indicator, an alternative is to fire off your tar process, work out its PID in another terminal, and then feed it to a script that repeatedly sends it a signal. For example, if you have the following scriptlet (invoked as, say, script.sh PID-to-signal interval-to-signal-at):

#!/bin/sh

PID=$1
INTERVAL=$2
SIGNAL=29      # excuse the voodoo, bash gets the translation of SIGINFO, 
               # sh won't..

kill -0 $PID   # invoke a quick check to see if the PID is present AND that
               # you can access it..

echo "this process is $$, sending signal $SIGNAL to $PID every $INTERVAL s"
while [ $? -eq 0 ]; do
     sleep $INTERVAL;
     kill -$SIGNAL $PID;    # The kill signalling must be the last statement
                            # or else the $? conditional test won't work
done
echo "PID $PID no longer accessible, tar finished?"

If you invoke it this way, since you're targeting only tar, you'll get output more like this:

a folder-with-big-files/tinyfile.1
a folder-with-big-files/tinyfile.2
a folder-with-big-files/tinyfile.3
a folder-with-big-files/bigfile.1
adding folder-with-big-files/bigfile.1 (124612 / 94377241)
adding folder-with-big-files/bigfile.1 (723612 / 94377241)
...

which, I admit, is kinda pretty.

Last but not least - my scripting is kinda rusty, so if anyone wants to go in and clean up/fix/improve the code, go for your life :)

tanantish

Posted 2010-07-28T11:51:37.173

Reputation: 1 103

@FelipeAlvarez Because you have probably not been on a BSD/MAC OS. See also: SIGINFO on GNU Linux (Arch Linux) missing

– Murmel – 2019-11-12T19:17:41.140

@tanantish You could get rid of the "voodoo" (your wording not mine ;P ) by using SIGNAL=$(kill -l SIGINFO), which would have the advantage of failing on systems without SIGINFO – Murmel – 2019-11-12T19:21:21.973

If running tar on the command line, typing control-T will send it a SIGINFO. If this was in a script it would be done with kill -INFO pid – Brian Swift – 2012-04-23T04:58:42.547

Completely forgot about control-T, I clearly have gotten used to spamming too many console windows for my own good.. – tanantish – 2012-04-23T20:21:02.500

why can't I see -SIGINFO when doing kill -l – Felipe Alvarez – 2013-06-12T02:32:16.323

2

Inspired by Noah Spurrier’s answer

function tar {
  local bf so
  so=${*: -1}                      # last argument: archive file or directory
  # Pick a blocking factor so the archive spans roughly 100 records;
  # 50688 = 99 * 512, i.e. the uncompressed size split into 99 records.
  case $(file "$so" | awk '{print$2}') in
  XZ) bf=$(xz -lv "$so" |
    perl -MPOSIX -ane '$.==11 && print ceil $F[5]/50688') ;;
  gzip) bf=$(gzip -l "$so" |
    perl -MPOSIX -ane '$.==2 && print ceil $F[1]/50688') ;;
  directory) bf=$(find "$so" -type f | xargs du -B512 --apparent-size |
    perl -MPOSIX -ane '$bk += $F[0]+1; END {print ceil $bk/100}') ;;
  esac
  # One checkpoint per record; %u (the checkpoint number) ~= percent done.
  command tar "$@" --blocking-factor=$bf \
    --checkpoint-action='ttyout=%u%\r' --checkpoint=1
}
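In answer to Kissaki's question: the function picks --blocking-factor so that the whole archive spans roughly 100 records (50688 = 99 × 512), and with --checkpoint=1 the checkpoint number %u, printed once per record, reads as an approximate percentage. A simplified, directory-only sketch of the same idea (`tar_pct` is my own name; GNU tar and GNU du assumed):

```shell
# tar_pct DIR OUT.tar -- archive DIR, printing approximate percent done.
tar_pct() {
    # Apparent size of DIR in 512-byte blocks, +1 per file for headers.
    blocks=$(find "$1" -type f -print0 \
        | xargs -0 du -B512 --apparent-size \
        | awk '{s += $1 + 1} END {print int(s / 100) + 1}')
    # ~100 records in total, so the checkpoint number approximates percent.
    tar -c -f "$2" --blocking-factor="$blocks" \
        --checkpoint=1 --checkpoint-action='ttyout=%u%\r' "$1"
}
```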

Source

Steven Penny

Posted 2010-07-28T11:51:37.173

Reputation: 7 294

A little context and explanation maybe? – Kissaki – 2014-10-04T18:03:43.897

1

If you know the number of files instead of the total size of all of them:

An alternative (less accurate, but good enough) is to use the -l option and send the filenames through the pipe instead of the data content.

Say there are 12345 files in mydir; the command is:

[myhost@myuser mydir]$ tar cfvz ~/mytarfile.tgz . | pv -s 12345 -l > /dev/null

You may know this value in advance (because of your use case), or you can discover it with a command like find + wc:

[myhost@myuser mydir]$ find | wc -l
12345

bzimage

Posted 2010-07-28T11:51:37.173

Reputation: 36

So, why not put this command into sub-command? =) – Kirby – 2018-01-09T14:12:54.390

tar cfvz ~/mytarfile.tgz . | pv -s $(find . | wc -l) -l > /dev/null. Does it work for you? – Kirby – 2018-01-09T14:18:21.117

1

A method based on tqdm:

tar -v -xf tarfile.tar -C TARGET_DIR | tqdm --total $(tar -tvf tarfile.tar | wc -l) > /dev/null
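The same idea works for creation by counting the file names printed by -v (a sketch; it assumes GNU tar, which sends the verbose listing to stdout when the archive is written to a file, and that the tqdm CLI is installed; the scratch files are only for demonstration):

```shell
# Create a few scratch files, then tar them while tqdm counts the
# names from the -v listing against the total from find | wc -l.
src=$(mktemp -d)
touch "$src/a" "$src/b" "$src/c"
tar -c -v -z -f "$src.tar.gz" -C "$src" . \
    | tqdm --total "$(find "$src" | wc -l)" > /dev/null
```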

J_Zar

Posted 2010-07-28T11:51:37.173

Reputation: 111

1

On macOS, first make sure that you have all the commands available, and install the missing ones (e.g. pv) using brew.

If you only want to tar without compression, go with:

tar -c folder-with-big-files | pv -s $[$(du -sk folder-with-big-files | awk '{print $1}') * 1024] > folder-with-big-files.tar

If you want to compress, go with:

tar cf - folder-with-big-files -P | pv -s $[$(du -sk folder-with-big-files | awk '{print $1}') * 1024] | gzip > folder-with-big-files.tar.gz

Note: It may take a while before the progress bar appears. Try on a smaller folder first to make sure it works, then move to folder-with-big-files.

Bugs Bunny

Posted 2010-07-28T11:51:37.173

Reputation: 133

0

Here are some numbers from a Prometheus (metrics data) backup on Debian/buster AMD64:

root# cd /path/to/prometheus/
root# tar -cf - ./metrics | ( pv -p --timer --rate --bytes > prometheus-metrics.tar )

Canceled this job as there was not enough disk space available.

Experimenting with zstd as the compressor for tar, monitoring the progress with pv:

root# apt-get update
root# apt-get install zstd pv

root# tar -c --zstd -f - ./metrics | ( pv -p --timer --rate --bytes > prometheus-metrics.tar.zst )
10.2GiB 0:11:50 [14.7MiB/s]

root# du -s -h prometheus
62G    prometheus

root# du -s -h prometheus-metrics.tar.zst
11G    prometheus-metrics.tar.zst

dileks

Posted 2010-07-28T11:51:37.173

Reputation: 1

0

In my daily use I don't need to know the exact percentage progress of the operation, only whether it is working and (sometimes) how close to completion it is.

I solve this need minimally by showing the number of files processed on its own line; in Bash:

tar zcvf files.tgz directory | while read LINE; do printf "\r%d" $((++n)); done; echo

As I use this a lot, I defined a function in .bashrc:

function pvl { declare -i n=0; while read L ; do printf "\r%d" $((++n)) ; done ; echo ; }

Then simply:

tar zcvf files.tgz directory | pvl

I can compute the number of files in advance if needed with find directory | wc -l (or better, using the same function shown [find directory | pvl] to squash my impatience!).

Another example, setting permissions for a virtual website (after that, a chown -R is fast because the filenames are in the filesystem cache):

find /site -print -type d -exec chmod 2750 "{}" \; -o -type f -exec chmod 640 "{}" \; | pvl

It's true this side processing could slow the main operation, but I think printing a carriage return and a few digits can't be too expensive (besides, waiting for the next equals sign to appear or percent digit to change feels slow compared with the subjective blazing speed of changing digits!).

Fjor

Posted 2010-07-28T11:51:37.173

Reputation: 1