Using find and tar with files with special characters in the name

I want to archive all .ctl files in a folder, recursively.

tar -cf ctlfiles.tar `find /home/db -name "*.ctl" -print`

The error message:

tar: Removing leading `/' from member names
tar: /home/db/dunn/j: Cannot stat: No such file or directory
tar: 74.ctl: Cannot stat: No such file or directory

I have these files: /home/db/dunn/j 74.ctl and j 75. Notice the space in the names. What if the files have other special characters? How do I archive these files recursively?
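To see why the backtick version fails, here is a minimal POSIX sh sketch that rebuilds the situation in a throwaway directory. The directory layout and the second file name are hypothetical, and mktemp -d is assumed to be available. An unquoted command substitution is split on whitespace, so a single name containing a space becomes two arguments, which matches the ".../j" and "74.ctl" that tar complains about.

```shell
# Hypothetical stand-in for /home/db, removed when the script exits.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/dunn"
touch "$demo/dunn/j 74.ctl" "$demo/dunn/plain.ctl"

# Deliberately unquoted, like the backticks in the question: the output of
# find is split on spaces and newlines before tar would ever see it.
set -- $(find "$demo" -name '*.ctl' -print)

# Show each resulting argument in brackets.
printf '[%s]\n' "$@"
echo "tar would receive $# arguments for 2 files"
```

The name with the space shows up as two separate bracketed entries, one of which is the bare "74.ctl".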

Costi

Posted 2010-06-01T23:07:47.373

Reputation: 153

Answers

5

Use tar's -T option to tell it to read the list of files from another file (tar treats each line as a separate file name).

You can then use <() notation (process substitution, available in bash, zsh, and ksh) to have your shell generate a pseudo-file from the output of a command:

tar cf ctlfiles.tar -T <(find /home/db -name "*.ctl")

If your shell does not support <() notation, you can use a temporary file:

find /home/db -name "*.ctl" > ctlfile-list
tar cf ctlfiles.tar -T ctlfile-list
rm ctlfile-list
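A slightly more defensive sketch of the temporary-file approach, assuming mktemp is available: the list gets a unique name instead of the fixed ctlfile-list, a trap removes it even if tar fails, and -type f keeps any directory that happens to match *.ctl out of the list (tar would otherwise archive its whole contents). The demo tree is hypothetical and stands in for /home/db.

```shell
# Throwaway working directory plus a uniquely named list file.
work=$(mktemp -d)
list=$(mktemp)
trap 'rm -rf "$work" "$list"' EXIT
cd "$work" || exit 1

# Hypothetical stand-in for /home/db.
mkdir -p db/dunn
touch "db/dunn/j 74.ctl" "db/dunn/j 75.ctl" db/dunn/notes.txt

# One path per line; tar -T takes each line as one name, spaces included.
find db -type f -name '*.ctl' > "$list"
tar -cf ctlfiles.tar -T "$list"

# List what ended up in the archive.
tar -tf ctlfiles.tar
```

This still shares the newline limitation pointed out in the comments, since the list is newline-delimited.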

R Samuel Klatchko

Posted 2010-06-01T23:07:47.373

Reputation: 416

This fails if a filename contains a newline. Rare but possible. – None – 2010-06-01T23:38:18.580

@davisre - good point – R Samuel Klatchko – 2010-06-01T23:42:56.453

This is the right answer: it tells a single instance of tar to read the names of the files to archive from a list instead of from the command line.

@davisre: I tried to create a filename in zsh with Ctrl-V Enter; this creates a \r, and this solution still works. Maybe it does not work with a \n, but I think such filenames are pretty rare; Unicode characters in filenames are more likely than a \n. – akira – 2010-06-02T12:06:05.737

You can fix the newline problem with --null; see David Bartlett’s answer. (This also works with arbitrary Unicode characters, as long as they use an ASCII-compatible encoding such as UTF-8 or ISO-8859-x, which never produce a null byte except for the null character itself.) – wchargin – 2019-07-12T20:14:41.047
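A small sketch, assuming GNU tar (for --null) and a find that supports -print0, that reproduces the newline problem and the --null fix side by side. The file names are hypothetical; the first one contains a literal newline, which is legal on Unix filesystems.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
mkdir db out-line out-null

# Hypothetical test files; the first name contains a literal newline.
odd='db/odd
name.ctl'
touch "$odd" db/normal.ctl

# Newline-delimited list: tar reads "db/odd" and "name.ctl" as two names,
# reports "Cannot stat" for both, and archives only db/normal.ctl.
find db -name '*.ctl' | tar -cf by-line.tar -T -

# NUL-delimited list: --null makes tar split the -T input on NULs only.
find db -name '*.ctl' -print0 | tar --null -T - -cf by-null.tar

# Extract both archives to see which names survived.
tar -xf by-line.tar -C out-line
tar -xf by-null.tar -C out-null
```

The "Cannot stat" messages on stderr are expected here; they come from the newline-delimited run.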

4

You can use the -print0 feature of find with the -0 feature of xargs, like this:

find /home/db -name '*.ctl' -print0 | xargs -0 tar -cf ctlfiles.tar

-print0 (that's hyphen-print-zero) tells find to end each path with a null character instead of a newline, and -0 (that's hyphen zero) tells xargs to split its input on null characters instead of on whitespace.
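A sketch, assuming a find and xargs that support -print0 and -0 (the GNU and BSD versions do), that makes the difference visible by handing the paths to printf instead of tar. The file names are hypothetical.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
mkdir dunn
touch "dunn/j 74.ctl" "dunn/j 75.ctl"

# Whitespace-delimited: plain xargs splits "dunn/j 74.ctl" into two words.
unsafe=$(find dunn -name '*.ctl' | xargs printf '[%s]\n')

# NUL-delimited: each path reaches printf as exactly one argument.
safe=$(find dunn -name '*.ctl' -print0 | xargs -0 printf '[%s]\n')

printf 'plain xargs:\n%s\n-print0 | xargs -0:\n%s\n' "$unsafe" "$safe"
```

The plain pipeline yields four bracketed fragments for two files, while the -print0 | xargs -0 pipeline yields one bracketed entry per file.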

Edited to add:

If you have a large number of files, xargs may invoke tar more than once, and each later tar -cf run overwrites the archive written by the previous one. See the comments for ways to deal with that, or have find invoke tar directly in append mode (-r), which works with any number of files, even if their names contain spaces or newlines:

rm -f ctlfiles.tar
find /home/db -name '*.ctl' -exec tar -rf ctlfiles.tar {} +
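A runnable sketch of the -exec ... {} + approach in a throwaway directory, with hypothetical file names. It assumes a tar such as GNU tar that creates the archive on the first -r, which the command above also relies on. Because find passes each path to tar as its own argument, no shell word splitting takes place, and -r makes every batch append instead of starting over.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

# Hypothetical stand-in for /home/db.
mkdir -p db/dunn
touch "db/dunn/j 74.ctl" "db/dunn/j 75.ctl" db/other.txt

# Start from scratch: -r appends, so a stale archive would keep old entries.
rm -f ctlfiles.tar

# find batches as many paths as fit on one command line per tar run.
find db -name '*.ctl' -exec tar -rf ctlfiles.tar {} +

tar -tf ctlfiles.tar
```

Keeping the archive outside the searched tree (here it lives next to db, not inside it) avoids tar being asked to append the archive to itself.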

Rob Davis

Posted 2010-06-01T23:07:47.373

Reputation: 149

That will fail if xargs decides that it has too many arguments and wants to run the command multiple times. – R Samuel Klatchko – 2010-06-01T23:18:40.927

True, but I believe adding the -r option to tar should fix that. – None – 2010-06-01T23:21:45.580

In fact, if you use -r, you could probably dispense with xargs and let find invoke tar directly, like this:

find /home/db -name '*.ctl' -exec tar -rf ctlfiles.tar {} \;

In practice, though, the original answer would likely work (if the number of files isn't in the thousands) and would invoke tar once instead of N times. – None – 2010-06-01T23:24:33.003
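The failure mode discussed in these comments can be simulated by forcing xargs to start one tar per file with -n 1, used here only to mimic a list too long for one command line. This is a sketch with hypothetical file names, assuming GNU tar, whose -r creates the archive if it does not exist yet.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
mkdir db
touch "db/a 1.ctl" "db/b 2.ctl" "db/c 3.ctl"

# With -c, every tar run re-creates the archive, so only the last file survives.
find db -name '*.ctl' -print0 | xargs -0 -n 1 tar -cf create.tar

# With -r, every tar run appends to the archive, so nothing is lost.
find db -name '*.ctl' -print0 | xargs -0 -n 1 tar -rf append.tar

echo "create.tar: $(tar -tf create.tar | grep -c '') entries"
echo "append.tar: $(tar -tf append.tar | grep -c '') entries"
```

Without the artificial -n 1, xargs would normally start a single tar here; the point is only what happens once it has to split.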

You could also set the maximum with xargs -0 -n 1000000000000, and then, as long as you have fewer than a trillion items, you'll be set. – Sophie Alpert – 2010-06-01T23:37:23.310
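One caveat worth checking: -n only sets a maximum number of arguments per command. xargs still starts a new command whenever the command line would exceed its size limit (-s, which is itself capped by the system's limit), so a huge -n does not by itself guarantee a single tar run. The sketch below assumes GNU xargs and uses made-up names and a deliberately tiny -s 200 to make the split visible even with -n 1000000.

```shell
# Emit 100 short, hypothetical file names, NUL-separated.
gen() {
  i=0
  while [ "$i" -lt 100 ]; do
    printf 'file-%03d.ctl\0' "$i"
    i=$((i + 1))
  done
}

# Each started sh prints one "run" line, so counting lines counts invocations.
runs=$(gen | xargs -0 -n 1000000 -s 200 sh -c 'echo run' sh | grep -c run)
echo "xargs started sh $runs times"
```

On a real system the limit is far larger than 200 bytes, but the principle is the same: the size limit, not -n, decides when xargs splits.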

The first part of the answer does not work because you pipe "names" to tar as content, not as "filenames which tar has to put into the tarball". The xargs approach and executing tar for each found file (as suggested in the second part) are unneeded. – akira – 2010-06-02T12:00:28.353

“since it’ll invoke tar once for each file”: just change {} \; to {} +. – wchargin – 2019-07-12T20:16:15.033

Thanks, wchargin. Updated. – Rob Davis – 2019-07-15T19:41:33.950

2

When the argument following "-T" is "-", the list of files is taken from stdin. Recent versions of tar typically support the "--null" option, which indicates that the file names read via "-T" are null-separated rather than newline-separated.

Hence the following works with an arbitrary number of files, even if their names contain newline characters:

find /home/db -name '*.ctl' -print0 | tar --null -T - -cf ctlfiles.tar
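To address the question's "other special characters", here is a round-trip sketch assuming GNU tar and a find with -print0. It archives hypothetical names containing a space, quotes, a tab, and a newline, extracts them into a separate directory, and compares the two trees.

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
mkdir db out
tab=$(printf '\t')

# Hypothetical names with a space, quotes, a tab, and a newline.
touch "db/j 74.ctl" "db/it's.ctl" 'db/say "hi".ctl' "db/tab${tab}here.ctl" \
      'db/new
line.ctl'

# The command from this answer, pointed at the demo tree.
find db -name '*.ctl' -print0 | tar --null -T - -cf ctlfiles.tar

# Round trip: extract elsewhere and compare the two file listings.
tar -xf ctlfiles.tar -C out
before=$(cd db && find . | sort)
after=$(cd out/db && find . | sort)
[ "$before" = "$after" ] && echo "all names survived the round trip"
```

Comparing the sorted listings is only a quick check; it works here because both sides contain exactly the same bytes, newline names included.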

David Bartlett

Posted 2010-06-01T23:07:47.373

Reputation: 21

+1; this is the only safe and portable answer. – wchargin – 2019-07-12T20:13:05.557