58

Is there any way to extract a tar.gz file faster than tar -zxvf filenamehere?

We have large files and are trying to optimize the operation.

David Houde
Justin
  • 2
    Are you finding that the `$ tar -zxvf` method is IO or CPU bound? – EEAA May 18 '11 at 04:07
  • 2
    Believe CPU, how can I check though? – Justin May 18 '11 at 04:11
  • 7
    Not directly related, but 'z' hasn't been required for extraction since 2004 (GNU tar 1.15): http://www.gnu.org/software/tar/#TOCreleases :) – JamesHannah May 18 '11 at 12:06
  • @Justin You might have to install it, but vmstat will tell you whether the load is on IO or CPU. It reports information about processes, memory, paging, block IO, traps, disks, and CPU activity. You can even run it continually: `vmstat 1 100` prints a report every second, 100 times. pigz was really helpful: I decompressed a 108GB gz file in minutes that previously took over an hour. – j0h Dec 06 '20 at 04:25

3 Answers

80

pigz is a parallel version of gzip. Although it only uses a single thread for decompression, it starts 3 additional threads for reading, writing, and check calculation. Your results may vary but we have seen significant improvement in decompression of some of our datasets. Once you install pigz, the tar file can be extracted with:

pigz -dc target.tar.gz | tar xf -
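If the same script has to run on machines where pigz may not be installed, a fallback to plain gzip can be sketched as below. The function name `extract_tgz` and its arguments are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: extract a .tar.gz using pigz when it is installed,
# falling back to gzip otherwise.
# Usage: extract_tgz ARCHIVE [DEST_DIR]
extract_tgz() {
    archive=$1
    dest=${2:-.}

    # Pick the decompressor: pigz if it is on PATH, otherwise gzip.
    if command -v pigz >/dev/null 2>&1; then
        decompressor=pigz
    else
        decompressor=gzip
    fi

    mkdir -p "$dest" || return 1

    # Decompress to stdout and let tar read the stream from stdin.
    "$decompressor" -dc "$archive" | tar -xf - -C "$dest"
}
```

For example, `extract_tgz target.tar.gz /tmp/out` unpacks into /tmp/out. Whether pigz is actually faster depends on whether the extraction is CPU bound or IO bound.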
Mateen Ulhaq
TimS
  • 18
    +1. FWIW, you can also write that as `tar -xvf --use-compress-program=pigz filenamehere`. (`-z` amounts to `--use-compress-program=gzip`.) Alternatively, you can even make `gzip` be a symlink to `pigz`, and keep using `-zxvf`. – ruakh Nov 19 '12 at 22:36
  • 6
    @ruakh, I had to put `-xf` after `--use-compress-program=pigz`, or I got an error. For some reason, it was no faster than using `gzip` though. – jonderry Mar 07 '15 at 22:39
  • 1
    For `bzip2` there is `pbzip2` (`p` for parallel). `tar --use-compress-program=pbzip2 -xvf file.tar.bz2`. – alfC Sep 22 '15 at 17:56
  • Is there a way to use the `pv` command, or an equivalent, to show progress while also using the `--use-compress-program=pigz` flag? During compression, I can do `gnutar --use-compress-program="pigz | pv" -cf target.tar.gz YourData`, but I'm not sure how to do this during untar/uncompression. – Stefan Lasiewski Jul 11 '18 at 00:57
  • @StefanLasiewski You can use pigz with pv and tar in this way: "tar cf - /your/files | pv | pigz > compressed.tgz" – m_a_s Feb 17 '22 at 16:49
16

If there are very many small files in the tarball, drop the `v` option and try again!
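If you still want some sign of life without printing every file name, GNU tar's `--checkpoint` option is one possibility. The sketch below is only an illustration: the function name `extract_quiet` is made up, and it assumes GNU tar for the progress dots (other tar implementations just get a plain quiet extraction).

```shell
# Hypothetical helper: extract without -v, printing an occasional progress
# dot when GNU tar is available.
# Usage: extract_quiet ARCHIVE [DEST_DIR]
extract_quiet() {
    archive=$1
    dest=${2:-.}
    mkdir -p "$dest" || return 1

    if tar --version 2>/dev/null | grep -q 'GNU tar'; then
        # With the default record size of 10 KiB, a checkpoint every
        # 10000 records prints one dot for roughly every 100 MB read.
        tar -xzf "$archive" -C "$dest" \
            --checkpoint=10000 --checkpoint-action=dot
    else
        # Not GNU tar: the checkpoint options are not assumed to exist here.
        tar -xzf "$archive" -C "$dest"
    fi
}
```

Small archives finish before the first checkpoint, so they print nothing at all.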

anonymous
  • 5
    I never use -v param. Don't know why people need that much noise in console. – Eimantas May 18 '11 at 05:04
  • 17
    @Eimantas When you untar something that contains many multi-gigabyte files, you will want some indication of progress. :) – Michael Hampton May 18 '13 at 16:08
  • @TimHughes: that's really great to know, please post as a separate answer! – smci Nov 28 '17 at 09:59
  • @Michael Hampton If you have multi-gigabyte files mixed with a long list of small files, you have a good reason not to use -v. In my local tests it makes tar very slow, especially if tar is running on a remote server via a terminal. What I do instead is watch `du -s directory` so I can see the directory growing... – Luciano Andress Martini Mar 13 '18 at 19:21
  • 3
    It might be worth using `--checkpoint=NUMBER` (*display progress messages every NUMBERth record*) instead of `-v`. – Stefan Lasiewski Jun 21 '18 at 18:42
11

If you want to see progress, use something like pv. Here is an example:

pigz -dc mysql-binary-backup.tar.gz | pv | tar xf -
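A variation, sketched under a few assumptions: if pv reads the archive file itself instead of sitting in the middle of the pipe, it knows the total size and can show a percentage and an ETA. The function name `extract_with_progress` is hypothetical, and `gzip -dc` is used only so the sketch runs without pigz installed; `pigz -dc` can be swapped in.

```shell
# Hypothetical helper: extract a .tar.gz, showing a progress bar when pv
# is installed.
# Usage: extract_with_progress ARCHIVE [DEST_DIR]
extract_with_progress() {
    archive=$1
    dest=${2:-.}
    mkdir -p "$dest" || return 1

    if command -v pv >/dev/null 2>&1; then
        # pv opens the file directly, so it can report progress against
        # the compressed file size. Its display goes to stderr.
        pv "$archive" | gzip -dc | tar -xf - -C "$dest"
    else
        # No pv available: extract without a progress display.
        gzip -dc "$archive" | tar -xf - -C "$dest"
    fi
}
```

Note that the percentage tracks how much of the compressed file has been read, not how much data has been written to disk.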
Tim Hughes