What is the difference between TAR and CPIO archive file formats?

I am curious and did a bit of reading but still have questions.

What makes CPIO different from TAR? I was told in another question that tar is for pulling many files together into one archive, which is then usually gzipped or bzipped.

Also I was told TAR cannot compress from STDOUT. I want to archive / compress ZFS snapshots for backups. I was wondering if I could combine CPIO with bzip2 to get this effect.

Or do I have the completely wrong idea? Is that not what CPIO's purpose is?

These are the commands I came up with after reading some Oracle docs on backing up ZFS snapshots.

# Backup snapshot to cpio and bzip2 archive
zfs send media/mypictures@20070607 | cpio -o | bzip2 -9c > ~/backups/20070607.bz2

# Restore snapshot from cpio and bzip2 archive
zfs receive media/mypictures@20070607 | cpio -i | bunzip2 -c ~/backups/20070607.bz2

ianc1215

Posted 2011-10-07T07:11:34.820

Reputation: 2 884

don't forget pax :P – Janus Troelsen – 2013-04-29T16:26:51.480

Answers

Both tar and cpio have a single purpose: concatenating many separate files into a single stream. They don't compress data. (These days tar is more popular due to its relative simplicity: it can take input files as arguments instead of having to be paired with find, as cpio does.)

In your case, you do not need either of these tools; they would have no useful effect, because you don't have many separate files. zfs send already did the same thing that tar would have done. So you don't have any files, only a nameless stream.

To compress the snapshot, all you have to do is pipe the zfs output through a compression program:

zfs send media/mypictures@20070607 | gzip -c > ~/backups/20070607.gz

gzip -dc ~/backups/20070607.gz | zfs receive media/mypictures@20070607

(You can substitute gzip with xz or bzip2 or any other stream-compression tool, if you want.)
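Because zfs send only emits bytes, the compress/decompress step can be tried on its own. A minimal sketch, assuming a POSIX shell with gzip and cksum; `fake_send` is a hypothetical stand-in for `zfs send` so the round trip can be checked on a machine without ZFS:

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for `zfs send media/mypictures@20070607`:
# any command that writes a byte stream to stdout behaves the same way here.
fake_send() {
    seq 1 100000
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Backup: compress the raw stream straight into a file (no tar or cpio involved).
fake_send | gzip -c > "$workdir/snapshot.gz"

# Restore: decompress back into a stream; with ZFS this would feed `zfs receive`.
# Comparing checksums confirms the bytes came back unchanged.
restored=$(gzip -dc "$workdir/snapshot.gz" | cksum)
original=$(fake_send | cksum)

if [ "$restored" = "$original" ]; then
    echo "round trip OK"
else
    echo "round trip FAILED" >&2
    exit 1
fi
```

Swapping gzip for bzip2 or xz only changes the two compressor invocations; the stream itself is never treated as a set of files.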

user1686

Reputation: 283 655

Oh I see, so my ZFS output is NOT files, it's a data stream? That would explain why the Oracle examples do not include TAR in the commands. – ianc1215 – 2011-10-07T18:45:24.953

@Solignis: You can think of it this way: zfs send already does the same that tar would do. – user1686 – 2011-10-18T12:28:51.897

In addition to what was said before by grawity and Paul:

History

In the "old days", cpio (with option -c) was the tool to use when it came to moving files to other UNIX derivatives, since it was more portable and flexible than tar. But the tar portability issues can be considered solved since the late 1980s.

Unfortunately, it was about that time that different vendors mangled the -c format of cpio (just look at the manual page for GNU cpio and the option -H). At that point tar became more portable than cpio... It took almost a whole decade until the different UNIX vendors had sorted that out. Having GNU tar and GNU cpio installed was a must for all admins who had to deal with tapes from different sources back then (and even nowadays, I presume).

User Interface

tar may use a tape configuration file in which the administrator configures the tape drives connected to the system. The user can then just say "Well, I'll take tape drive 1" instead of having to remember the exact device node for the tape (device nodes can be very confusing and are not standardized across different UNIX platforms).

But the main difference is:

tar is able to search directories on its own and takes the list of files or directories to be backed up from command line arguments.

cpio archives only the files or directories it is told to and does not search subdirectories recursively on its own. Also, cpio gets the list of items to be archived from stdin; this is why it is almost always used in combination with find.

A cpio command often looks frightening to the beginner when compared with tar:

 $ find myfiles -depth -print0 | cpio -ovc0 | gzip -7 > myfiles.cpio.gz
 $ tar czvf myfiles.tar.gz myfiles

I think that's the main reason why most people use tar to create archive files: for simple tasks like bundling a complete directory, it's just easier to use.

Also, GNU tar offers the option -z, which causes the archive to be compressed with gzip on the fly, making things even easier.

On the other hand, one may do nifty things with find & cpio. In fact, it's a more UNIX-like approach: Why include directory tree search into cpio if there's already a tool that takes care of almost all one can think of: find. Things that come to mind are backing up only files newer than a certain date, restricting the files to those residing on the same filesystem, or filtering the find output with grep -v to exclude certain files...

The GNU tar developers put a lot of work into supporting many of those things that were previously only possible with cpio. In fact, both tools learned from each other, but only cpio may read the format of tar, not the other way around.

tar and output processing

One last note to something you said:

Also I was told TAR cannot compress from STDOUT. I want to archive / compress ZFS snapshots for backups. I was wondering if I could combine CPIO with bzip2 to get this effect.

Well, every version of tar (GNU or not) may be used in a pipe. Just use a minus sign (-) as archive name:

 $ tar cvf - myfiles | bzip2 > myfiles.tar.bz2
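The "-" archive name works in both directions: tar writes the archive to stdout when creating and reads it from stdin when extracting. A minimal sketch, assuming a POSIX shell with tar and gzip (bzip2 -c / bzip2 -dc can be swapped in if bzip2 is installed); the names are illustrative:

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
mkdir -p myfiles restore
echo hello > myfiles/greeting.txt

# "-f -" sends the archive to stdout, so any compressor can sit in the pipe.
tar cf - myfiles | gzip -c > myfiles.tar.gz

# The reverse: decompress to stdout and let tar read the archive from stdin,
# unpacking into a separate directory.
gzip -dc myfiles.tar.gz | (cd restore && tar xf -)

cat restore/myfiles/greeting.txt
```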

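GNU tar also offers --use-compress-program (-I) to run an arbitrary filter program on the archive, although I'd still prefer the pipe; maybe it's of use when writing to certain hardware devices. (The related option --to-command only applies during extraction: it pipes each extracted file's contents to a command.)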
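For reference, GNU tar's --to-command is used when extracting: instead of writing each regular file to disk, tar pipes its contents to the given command (run via sh) and exports variables such as TAR_FILENAME. A minimal sketch, assuming GNU tar and GNU coreutils on Linux; the file names are illustrative:

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
mkdir -p myfiles
printf 'abc' > myfiles/a.txt
printf 'hello world' > myfiles/b.txt
tar cf myfiles.tar myfiles

# GNU tar only: for each regular file in the archive, run the quoted command
# with the file's contents on stdin; here it prints the member name and its
# byte count without extracting anything to disk.
sizes=$(tar xf myfiles.tar \
    --to-command='printf "%s %s\n" "$TAR_FILENAME" "$(wc -c)"' | sort)
echo "$sizes"
```

This makes it a per-file post-processor for extraction, not a filter for the archive stream itself; for the latter the pipe or -I is the tool.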

ktf

Reputation: 2 168

trombonehero said: BSD tar uses libarchive under the hood, so it can handle cpio, pax, shar. you've said: only cpio may read the format of tar. isn't that a contradiction?

– n611x007 – 2015-11-25T10:18:41.967

Wouldn't it be 'from STDIN' that differs, rather than 'to STDOUT'? 'from STDOUT' doesn't really make sense to me – Joakim Elofsson – 2011-10-07T21:12:00.763

Well, I was only citing the original question. Indeed, it's somewhat misphrased, but I think one gets the point. – ktf – 2011-10-07T21:17:14.237

"Why include directory tree search into cpio if there's already a tool that takes care of almost all one can think of" Good question, but then you would have to also ask it for copy (cp), move (mv), diff, etc. ;-) – Mecki – 2013-02-25T19:10:26.760

tar and cpio have essentially the same function, which is to create a single contiguous file from an input of multiple files and directories. Originally this was to put the result onto tape, but these days it is generally used to feed into a compression utility as you have above. This is because compressing a single large file is both more time and space efficient than compressing lots of small files. You should note that many image formats (png, jpg etc) are already highly compressed, and may actually get a bit bigger if put through a compression utility.

Neither tar nor cpio does any compression itself. Tar has effectively "won" the "what shall we use to make aggregate files" war, but cpio still gets a look-in in various places. I am not aware of any benefits of one over the other; tar wins through being more commonly used.

tar can indeed take input on stdin and output to stdout - which would then be piped into bzip2 like you have or something similar. If called with the "z" option, it will automatically invoke gzip on the output.

Paul

Reputation: 52 173

Most recent versions of GNU tar can even guess the desired compression format from the archive file name when you use the option -a. So this: tar -caf myfiles.tar.xz myfiles/ will compress using xz and this tar -caf myfiles.tar.gz myfiles/ will compress using gzip. – gerlos – 2015-10-29T17:09:31.560

Yeah, and isn't -j to invoke bzip2? – ianc1215 – 2011-10-07T18:43:53.770

Yes, -j is bzip2, and some (more recent?) versions of GNU tar have -J for xz. – Joakim Elofsson – 2011-10-07T21:09:44.183

Around 1996 I asked HP tech support why one would use cpio over tar.

I was told that tapes stretch and wear out. When tar reaches an unreadable portion of the tape, it fails and returns an error. When cpio reaches an unreadable portion, it skips to the next readable block, resyncs and continues.

I have never seen documentation to support this, but always used cpio.

Lynn

Reputation: 51

According to the post, bitwise damage of tar seems to be localised to the area/files it affects, the same as you told about cpio. http://oxfordrepo.blogspot.tw/2008/12/archive-file-resiliences.html

– okwap – 2018-03-19T07:38:11.807

Also worth noting: on (at least) FreeBSD and Mac OS X, you can manipulate cpio files with tar. BSD tar uses libarchive under the hood, so it can handle cpio, pax, shar...

This means that the usability issues of the cpio command don't have to stop you from working with cpio files.

trombonehero

Reputation: 41

ktf said: only cpio may read the format of tar. you've said: BSD tar uses libarchive under the hood, so it can handle cpio, pax, shar. isn't that a contradiction?

– n611x007 – 2015-11-25T10:18:52.900

@n611x007 This answer talks about BSD tar. The other one is probably talking about GNU tar. They are different programs. – Navin – 2016-03-19T02:57:41.637

While the answers here already compare cpio and tar very well, I would like to highlight one of cpio's features, copy-pass (pass-through) mode, which makes it more efficient to copy selected files (i.e., chosen via find and filters) while preserving their directory structure. This feature is well documented and in its basic form looks like this:

find . <predicates> | cpio -pdmv /destination/dir

The equivalent with tar would involve something like this:

find . <predicates> | tar --no-recursion -T - -cf - | (cd /destination/dir && tar xvf -)

There are of course other alternatives such as rsync and cp --parents discussed in another thread, but nothing comes close to the flexibility offered by the combination of find and cpio. With tar being ubiquitous for creating archives, this is the only reason for which I still use cpio.

haridsv

Reputation: 401