How to combine the 'tar' command with 'find'

The find command gives this output:

[root@localhost /]# find var/log/ -iname anaconda.*
var/log/anaconda.log
var/log/anaconda.xlog
var/log/anaconda.yum.log
var/log/anaconda.syslog
var/log/anaconda.program.log
var/log/anaconda.storage.log

After combining it with tar, I get this output:

[root@localhost /]# find var/log/ -iname anaconda.* -exec tar -cvf file.tar {} \;
var/log/anaconda.log
var/log/anaconda.xlog
var/log/anaconda.yum.log
var/log/anaconda.syslog
var/log/anaconda.program.log
var/log/anaconda.storage.log

But when I list the tar file, it shows only a single file:

[root@localhost /]# tar -tvf file.tar
-rw------- root/root    208454 2012-02-27 12:01 var/log/anaconda.storage.log

What am I doing wrong here?

With xargs I am getting this output:

[root@localhost /]# find var/log/ -iname anaconda.* | xargs tar -cvf file1.tar

Second question

When I type / in front of var, i.e. find /var/log, why does it give this message: tar: Removing leading `/' from member names?

[root@localhost /]# find /var/log/ -iname anaconda.* -exec tar -cvf file.tar {} \;
tar: Removing leading `/' from member names
/var/log/anaconda.log
tar: Removing leading `/' from member names
/var/log/anaconda.xlog
tar: Removing leading `/' from member names
/var/log/anaconda.yum.log
tar: Removing leading `/' from member names
/var/log/anaconda.syslog
tar: Removing leading `/' from member names
/var/log/anaconda.program.log
tar: Removing leading `/' from member names
/var/log/anaconda.storage.log

In simple terms, what is the difference between the following two?

find var/log and find /var/log

max

Posted 2012-12-01T14:23:50.280

If you use {} + instead of {} \;, it will group the results of find into one command line. – Jason S – 2016-09-05T23:11:58.447

This is semi-off-topic, but going forward, you should quote the search term you pass to find. It sometimes works without quotes, but not always. – nerdwaller – 2012-12-01T15:39:04.583

Answers

Note: See @Iain's answer for a somewhat more efficient solution.

Note that find will call the -exec action for every single file it finds.

If you run tar -cvf file.tar {} for every single file find outputs, you overwrite file.tar every time. That explains why you end up with an archive that only contains anaconda.storage.log: it's the last file find outputs.

Now, you actually want to append the files to the archive instead of creating it anew each time (creating is what the -c option does). So, use the following:

find var/log/ -iname "anaconda.*" -exec tar -rvf file.tar {} \;

The -r option appends to the archive instead of recreating it every time.
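
To see both behaviours side by side, here is a minimal sketch, assuming GNU find and GNU tar. It runs in a throwaway directory, and the file names are made up to mirror the question; -v is left out to keep the output short.

```shell
#!/bin/sh
# Sketch only: assumes GNU find and GNU tar. The scratch directory and
# the file names are made up for the demo.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir -p var/log
touch var/log/anaconda.log var/log/anaconda.yum.log var/log/anaconda.storage.log

# -c once per file: every tar run recreates overwrite.tar from scratch.
find var/log/ -iname "anaconda.*" -exec tar -cf overwrite.tar {} \;
echo "with -c:"; tar -tf overwrite.tar

# -r once per file: every tar run appends one member to append.tar
# (GNU tar creates the archive on the first run if it does not exist).
find var/log/ -iname "anaconda.*" -exec tar -rf append.tar {} \;
echo "with -r:"; tar -tf append.tar
```

The first listing shows a single member (whichever file find reported last), the second shows all three.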

Note: Replace -iname anaconda.* with -iname "anaconda.*". The asterisk is a wildcard and can be expanded by your shell before find even sees it. To prevent this expansion, wrap the argument in double quotes.
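
A quick way to check what find actually receives is to let printf echo the pattern argument back after the shell has expanded it. This is a sketch in a throwaway directory; anaconda.local is a hypothetical file placed in the current directory so that the unquoted pattern has something to match.

```shell
#!/bin/sh
# Sketch only: anaconda.local is a made-up file in the *current*
# directory, standing in for whatever happens to be there.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
touch anaconda.local

# printf prints its argument exactly as the shell passes it on,
# i.e. after glob expansion.
unquoted=$(printf '%s' anaconda.*)
quoted=$(printf '%s' "anaconda.*")
echo "unquoted pattern becomes: $unquoted"
echo "quoted pattern stays:     $quoted"
```

With the unquoted pattern, find would be handed -iname anaconda.local and search for that one name only. If nothing in the current directory matches, sh and bash (with default options) pass the pattern through unchanged, which is why the unquoted form sometimes appears to work.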


As for tar removing leading /: The archive should only contain relative file names. If you added files with a leading /, they would be stored as absolute file names, literally meaning /var/… on your computer, for example.

IIRC this is simply a precaution for tar implementations other than GNU. It's also safer this way, because if the archive contains relative file names, you won't overwrite your actual data in /var/… when you extract it.
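
You can watch this happen with a sketch like the following, assuming GNU tar. A scratch directory created with mktemp stands in for /var, and anaconda.log is a made-up file.

```shell
#!/bin/sh
# Sketch only: assumes GNU tar; "$work" is a scratch directory standing
# in for the real filesystem, and anaconda.log is a made-up file.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/var/log"
touch "$work/var/log/anaconda.log"

# Archiving by absolute path: tar warns on stderr that it is
# removing the leading "/" from the member name.
tar -cf "$work/abs.tar" "$work/var/log/anaconda.log"
member=$(tar -tf "$work/abs.tar")
echo "stored as: $member"

# Extracting into an empty directory recreates the path *under* it,
# instead of writing back to the original absolute location.
mkdir "$work/restore"
tar -xf "$work/abs.tar" -C "$work/restore"
```

The listed member name has no leading /, and extracting with -C recreates the whole path under restore/ rather than at the original location.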

slhck

You can use {} + instead of {} \; so that it groups the results of find into one command line. – Jason S – 2016-09-05T23:11:12.380

A solution like this wouldn't (necessarily) be able to work for compressed tar files. As @JasonS says, using {} + will allow the whole archive to be created in one go. – mwfearnley – 2017-07-11T08:38:57.473

But note that if you tried tarring to an actual tape archive this way, adding one file at a time, rewinding the tape, then rereading the whole thing each time to get to the end, the whole thing would be ridiculously slow. Your solution is only suitable if you're writing the tar file to disk. – Nicole Hamilton – 2012-12-01T14:55:47.033

True, but I think we can safely disregard this situation ;) – slhck – 2012-12-01T15:03:14.183

@slhck * is a wildcard that should match all possibilities, right? But here find /var/log/ -iname anaconda* gives nothing, while find /var/log/ -iname anaconda.* gives output. Why? – max – 2012-12-02T06:49:11.680

When the shell expands a wildcard, find never sees it. So if you have anaconda*, and your current folder contains something named, for example, anaconda5 (matching this wildcard), the wildcard will be expanded, and find will see -iname anaconda5 instead of -iname anaconda*. Why the first doesn't work and the second does depends on what files are in your current directory. @max – slhck – 2012-12-02T08:55:49.173

You can use something like:

find var/log -iname 'anaconda.*' -print0 | tar -cvf somefile.tar --null -T -

find's -print0 and tar's --null work together to allow filenames with spaces, newlines, etc. -T tells tar to read the names of the files to archive from a file, and the final - makes that file stdin.

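Note that -print0 must come at the end of your expression, after the tests, per this answer. find evaluates its expression from left to right, so a -print0 placed before -iname prints every file it visits, and you will probably get more files than you expect.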
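
Here is a small sketch of that pipeline, assuming GNU find and GNU tar, run in a scratch directory with made-up file names, one of which contains a space:

```shell
#!/bin/sh
# Sketch only: assumes GNU find and GNU tar. File names are made up;
# one deliberately contains a space.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir -p var/log
touch var/log/anaconda.log "var/log/anaconda.with space.log" var/log/other.log

# The test (-iname) comes before -print0, so only matching names are
# printed, each terminated by a NUL byte; --null makes -T - read them
# back as NUL-terminated names from stdin.
find var/log -iname 'anaconda.*' -print0 | tar -cf somefile.tar --null -T -
tar -tf somefile.tar
```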

user35787

To expand on @mivk's comment, when tar reads what would be a file name containing a nul character, it will issue a warning and switch to treating the input as nul-terminated. However, this can break if any file names contain a newline character, as the nul character may be seen too late to make the switch. Given that the whole point of using nul terminators is to handle file names containing newline characters, the --null really needs to be specified. – hvd – 2016-06-05T10:48:34.047

And --no-unquote turns out to be needed as well: file names containing backslashes would otherwise be mishandled. (No, this isn't a hypothetical -- I'm really creating a tar archive from someone else's code, containing a filename with backslashes in the name, that's how I found out.) – hvd – 2016-06-05T12:34:49.100

@hvd then you should seek them out and give them a good kicking. – None – 2016-06-05T13:48:27.457

You've omitted the -name option, causing your solution to tar the whole directory. If that's what you want, you could do it more easily as tar -cvf file.tar var/log without using find at all. – Nicole Hamilton – 2012-12-01T15:45:05.903

@NicoleHamilton: You're right and it's easily fixed (which you could have done too). It doesn't affect what I was demonstrating which is that you can use - as the input file to tar so the output of find can be piped into it. – None – 2012-12-01T15:56:16.773

+1 Piping the list to tar is a good idea. It's definitely the best solution if you expect the pathnames may have spaces. I would even describe it as the best technically, since it's both reliable and efficient. But it requires additional special knowledge of both find and tar. I prefer command substitution pretty much only because it's a more general tool: Learn how to use it once, then use it everywhere. (But I concede, I'm on Windows with a shell where it always works.) Apologies if I seemed rude. – Nicole Hamilton – 2012-12-01T17:49:23.740

@NicoleHamilton: Don't you run the risk of an "Argument list too long" error, because the command substitution expands onto the command line? – None – 2012-12-01T18:07:10.523

You already got your +1. Be happy. :) Long command lines are always the bane of the process creation i/f on any OS. I remember arguing with Mark Lucovsky at Microsoft in the early 90s that their 32K Unicode characters limit on NT was too small and having him complain I had no idea how many more bytes it would take to store lengths as longs rather than shorts everywhere in the kernel. Sigh. The more general case solutions when the arg list is too long are to do more in the shell (if possible; in mine it is) or use xargs.

– Nicole Hamilton – 2012-12-01T18:33:48.947

If you use find's -print0 option, you also need tar's --null option. – mivk – 2014-02-01T18:00:46.127

Try this:

tar -cvf file.tar `find var/log/ -iname "anaconda.*"`

You were trying to use find to -exec tar. But the way the -exec option works, it runs that command once for each matching file it finds, causing tar to overwrite the tar file it produces each time. That's why you only ended up with the last one. Also, you need to put quotes around the pattern you specify to find so that the shell doesn't expand it before passing it to find.

Using command substitution with backticks (or using $(...) notation if you prefer), the entire list of names produced by find is pasted back onto the command line as arguments to tar, causing it to write them all at once.
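
A sketch of the same idea using $(...) notation, assuming the file names contain no whitespace or glob characters (the unquoted substitution is split and globbed by the shell). The scratch directory and names are made up:

```shell
#!/bin/sh
# Sketch only: works for names without whitespace or glob characters,
# because the shell word-splits and globs the unquoted $(...) result.
# The scratch directory and file names are made up for the demo.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir -p var/log
touch var/log/anaconda.log var/log/anaconda.yum.log var/log/anaconda.storage.log

# A single tar invocation receives every name find printed.
tar -cf file.tar $(find var/log/ -iname "anaconda.*")
tar -tf file.tar
```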

Nicole Hamilton

Anyone having issues with spaces in Linux file names should check out Bash's $IFS variable, which lets you define which characters serve as word delimiters when Bash splits the results of unquoted expansions. – Eric Kigathi – 2015-10-12T14:52:09.190
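
A sketch of that suggestion for a POSIX shell: set IFS to a newline only and disable globbing with set -f around the command substitution. This keeps names with spaces intact, but names containing newlines would still break. The directory and file names are made up for the demo.

```shell
#!/bin/sh
# Sketch only: splits the command substitution on newlines alone.
# Names containing a newline would still be split apart.
# The scratch directory and file names are made up for the demo.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir -p var/log
touch "var/log/anaconda.with space.log" var/log/anaconda.log

IFS='
'        # word-split only on newlines (portable way to write "\n")
set -f   # do not glob-expand the resulting words
tar -cf file.tar $(find var/log -iname "anaconda.*")
set +f
unset IFS   # back to default splitting
tar -tf file.tar
```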

This could go badly if find outputs file names containing spaces, newlines or globbing characters. This is bound to fail – piping stdout from find is rarely a good idea. http://mywiki.wooledge.org/ParsingLs

– slhck – 2012-12-01T14:42:20.213

@slhck, piping stdout from find is in fact usually a good idea, as very clearly explained in the page you linked to in your comment :). It is in fact the recommended way to do things. You just need to use some tricks (such as read -r or -print0), as I did in my answer. – terdon – 2012-12-01T14:47:13.503

@terdon Except that most people don't use the tricks and then wonder why their scripts fail (such as in this post). – slhck – 2012-12-01T14:48:18.627

@slhck This is why file and directory names in Unix and Linux have traditionally avoided spaces. It's also why, on Windows, where names with spaces are common, I added an additional command substitution notation to my own Hamilton C shell, using double backticks, that treats whole lines (possibly including spaces) as single words to be pasted back onto the command line. Unfortunately, none of the Unix shells have that feature.

– Nicole Hamilton – 2012-12-01T14:49:24.717

They might have traditionally avoided it, but with files being created in the user space through GUIs, you can't neglect files with spaces anymore and treat them as second class citizens (just because it's Unix). It's nice you included that in your shell, but it's for Windows, and Unix shells don't particularly need that feature if you simply use the right syntax and take proper precautions. Which is why I've posted my comment in the first place. – slhck – 2012-12-01T15:06:48.553

Does this business of using GUIs to create names with spaces happen a lot in /var/log? – Nicole Hamilton – 2012-12-01T15:13:09.273

No, but in other places it might very well happen. That's why it's a good idea to program defensively – better safe than sorry. Also, visitors finding this question might not have the exact same problem and could wonder why the command they found here appeared to work for this very case but failed for them. I'll leave it up to you to fix the command; I just thought it was important to mention it because many people run into this issue sooner or later. – slhck – 2012-12-01T16:04:27.740

All fine, but consider that Unix and Linux fundamentally don't handle spaces well period. For example, consider the bashism of for i in *; do a=$i; ls $a; done, which fails for any name that contains a space. The good news (as in the OP's case with tar as well) is that when you hit a name that does have a space and it fails, it would be pretty contrived for it to fail silently. So you can usually safely try it the simple way, assuming no spaces, knowing that if it works, you're done. – Nicole Hamilton – 2012-12-01T16:18:13.107

Should also point out there's a difference between what you do interactively at the command line versus what you write in a script. I generally try the simple solution first when I'm working interactively. – Nicole Hamilton – 2012-12-01T16:26:17.607

@NicoleHamilton: You stated none of the Unix shells have that feature. I do not completely agree: With zsh this ugly thing can handle white spaces in file names: for i (${(f)"$(ls -1)"}) echo $i as it splits on newlines; so things will break again if the filename contains a newline (but IMHO that's an even worse practice than blanks). And, last but not least, your solution with double backticks is of course much cleaner -- I also like things simple in an interactive session :) – mpy – 2013-05-09T19:10:10.377

Question 1

Your command fails because find runs tar once for each file found, and each run archives that single file into file.tar, overwriting the previously created file.tar.

If what you want is one archive with all the files, simply run tar directly; there is no need for find (and yes, this works for files with spaces in their names):

tar -vcf file.tar /var/log/anaconda*   

Question 2

The two commands are completely different:

  • find var/log will search a directory called var/log, which is a subdirectory of your current directory. It is equivalent to find ./var/log (notice the ./).

  • find /var/log will search a directory called /var/log, which is a subdirectory of the root, /.
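
The difference is easy to see by running both forms from the same current directory. In this sketch a scratch directory plays the role of the current directory, and "$work/var/log" stands in for an absolute path such as /var/log:

```shell
#!/bin/sh
# Sketch only: "$work" is a made-up scratch directory used as the
# current directory; "$work/var/log" stands in for /var/log.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir -p var/log
touch var/log/anaconda.log

rel=$(find var/log -name anaconda.log)          # relative start point
dot=$(find ./var/log -name anaconda.log)        # same directory, with ./
abs=$(find "$work/var/log" -name anaconda.log)  # absolute start point
printf 'relative: %s\nwith ./:  %s\nabsolute: %s\n' "$rel" "$dot" "$abs"
```

find prints names that start with whatever starting point you gave it, so an absolute starting point produces absolute names, and those are what tar then strips the leading / from.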

The leading / message comes from tar, not find. It means tar is removing the first / of your file names to turn absolute paths into relative ones. This means that the file /var/log/anaconda.error will be extracted to ./var/log/anaconda.error when you untar the archive.

terdon

There are two ways -exec can work. One runs the command many times, once for each file; the other runs it once, passing all the files as a list of arguments.

  • -exec tar -cvf file.tar {} ';' runs the tar command for each file, overwriting the archive each time.
  • -exec tar -cvf file.tar {} '+' runs the tar command once, creating an archive of all the files found. (If the list is too long for one command line, find splits it across several tar runs, and with -c each later run would overwrite the archive again.)
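
A minimal sketch of the '+' form, assuming GNU find and GNU tar, in a scratch directory with made-up file names:

```shell
#!/bin/sh
# Sketch only: assumes GNU find and GNU tar; the scratch directory and
# file names are made up for the demo.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
mkdir -p var/log
touch var/log/anaconda.log var/log/anaconda.yum.log var/log/anaconda.storage.log

# "{} +" appends all matching names to one tar command line.
# For a very long list find may run tar more than once, and -c would
# then recreate the archive on each run.
find var/log/ -iname "anaconda.*" -exec tar -cf file.tar {} +
tar -tf file.tar
```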

mwfearnley

I think running -exec once per file can make creating the tar archive very slow if you have a lot of files. I prefer this command:

find . -iname "*.jpg" | cpio -ov -H tar -F jpgs.tar

fabceolin

until it starts failing with /bin/cpio: xxx: Cannot open: Too many open files – SYN – 2019-07-23T12:05:49.323