Why does this not work? "ls *.txt | xargs cat > all.txt" (all files into single txt document)

20

4

Why does this not work?

ls *.txt | xargs cat > all.txt

(I want to join the contents of all text files into a single 'all.txt' file.) find with -exec should also work, but I would really like to understand the xargs syntax.

Thanks

ajo

Posted 2010-09-28T11:04:48.910

Reputation: 785

1

Though don't use ls for this. If you really can't use cat *.txt >all.txt then try printf '%s\0' *.txt | xargs -r0 cat >all and then mv all all.txt to avoid having the file referencing itself.
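A minimal sketch of the approach in this comment, assuming GNU xargs (for -r) and a scratch directory with two hypothetical file names. The output goes to a name that *.txt does not match and is renamed only at the end:

```shell
# Work in a scratch directory so nothing real is touched.
dir=$(mktemp -d)
printf 'one\n' > "$dir/a b.txt"     # hypothetical name with a space
printf 'two\n' > "$dir/c'd.txt"     # hypothetical name with a single quote

(
  cd "$dir" || exit 1
  # printf emits each name followed by a NUL byte; xargs -0 splits on NUL only.
  printf '%s\0' *.txt | xargs -r0 cat > all
  # Rename only after cat has finished, so all.txt is never one of its own inputs.
  mv all all.txt
)
```

The intermediate name all is what keeps the output out of the *.txt glob while cat is still running.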

– tripleee – 2019-07-24T05:52:14.370

Answers

27

ls *.txt | xargs cat >> all.txt

might work a bit better, since it appends to all.txt instead of truncating it first. (The shell opens the redirection target once for the whole xargs command, not once per file.)

By the way, cat *.txt >all.txt would also work. :-)

Janne Pikkarainen

Posted 2010-09-28T11:04:48.910

Reputation: 6 717

This is a potentially very dangerous command. If "all.txt" already exists, cat reads it while appending to it, so the file can keep growing until it fills all available disk space. – Dan Loewenherz – 2014-08-28T15:43:20.853

The cat *.txt >all.txt is naturally better. Thanks – ajo – 2010-09-28T11:11:53.673

However, the ... | xargs cat >> all.txt or > all.txt always returns an error with xargs: unmatched single quote ... Is it because xargs takes everything after it as the command? – ajo – 2010-09-28T11:12:56.310

Do you have filenames with spaces? If so, then use something like "find /your/path -iname '*.txt' -print0 | xargs -0 cat >>all.txt" instead – Janne Pikkarainen – 2010-09-28T11:17:58.007

No, I replaced all the filename spaces with . But thinking of it, some filenames are likely to include single quotes, as in listing_O'Connor.txt. This might be the problem! – ajo – 2010-09-28T11:29:46.270

Yes, that's the problem then. :) The easiest and sanest way is to use find with -print0 combined with xargs -0 -- then the whole chain will use the NUL character as a separator, and whitespace and special characters will be taken care of automatically. – Janne Pikkarainen – 2010-09-28T11:37:05.377
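As a rough illustration of that find/xargs chain, here is a sketch under a few assumptions: the directory, file names and contents are made up, sort -z (available in GNU and BSD sort) is added only to make the order predictable, and the output file is kept outside the tree being searched:

```shell
src=$(mktemp -d)     # hypothetical directory tree to search
out=$(mktemp)        # output lives outside the tree being read
mkdir "$src/sub"
printf 'first\n'  > "$src/listing_O'Connor.txt"
printf 'second\n' > "$src/sub/two words.txt"

# -print0 ends each path with a NUL byte, sort -z orders the NUL-terminated
# records, and xargs -0 hands each path to cat as exactly one argument.
( cd "$src" && find . -name '*.txt' -print0 | sort -z | xargs -0 cat > "$out" )
```

Because find descends into sub/ on its own, the same command also covers the recursive case.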

Indeed: After removing the single-quotes in some filenames via "s/'/_/g" *.txt, the command works OK!! But could it be done from within xargs via some option??? – ajo – 2010-09-28T11:40:16.643

OK, find isn't bad either... I remember having similar problems when the file names contained spaces!! – ajo – 2010-09-28T11:42:27.543

find is much better than ls in case you need to recurse into subdirectories. – Janne Pikkarainen – 2010-09-28T11:44:26.080

for the latter: what about ls -R? (apart from the directory line?!) – ajo – 2010-09-28T11:51:33.563

ls -R may be fine for human-readable output, but if you need to process the result with xargs or other tools -- not so much. See, ls -R does not print the full path along with every filename, but find or tree will do it. That makes scripting a lot easier. When scripting or piping stuff, please get rid of ls and use more advanced tools :-) – Janne Pikkarainen – 2010-09-28T11:54:20.193

3

If some of your file names contain ', " or a space, xargs will fail because of the separator problem.

In general, never run xargs without -0, as it will come back and bite you some day.

Consider using GNU Parallel instead:

ls *.txt | parallel cat > tmp/all.txt

or if you prefer:

ls *.txt | parallel cat >> tmp/all.txt

Learn more about GNU Parallel http://www.youtube.com/watch?v=OpaiGYxkSuQ

Ole Tange

Posted 2010-09-28T11:04:48.910

Reputation: 416

1

all.txt is a file in the same directory and matches *.txt, so cat gets confused when it is asked to read from and write to the same file.

On the other hand:

ls *.txt | xargs cat > tmp/all.txt

This will read the text files in your current directory and write them to all.txt in the tmp subdirectory, which *.txt does not match.

Jeremy Smyth

Posted 2010-09-28T11:04:48.910

Reputation: 354

Still the following error: xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option – ajo – 2010-09-28T11:29:15.493

Do you have a .txt file with a single quote in its name? – Jeremy Smyth – 2010-09-28T12:53:26.283
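The error quoted in this exchange can be reproduced without touching any files. The sketch below assumes a hypothetical file name and an xargs that treats quotes specially by default (as GNU and BSD xargs do); it only records whether each mode succeeds:

```shell
name="listing_O'Connor.txt"   # hypothetical name containing a single quote

# Default mode: xargs parses quotes itself and rejects the unmatched '.
if printf '%s\n' "$name" | xargs printf '%s\n' >/dev/null 2>&1; then
  default_status=ok
else
  default_status=failed
fi

# NUL mode: the name is passed through untouched as a single argument.
nul_result=$(printf '%s\0' "$name" | xargs -0 printf '%s\n')
```

Here default_status ends up as failed, while nul_result holds the original name.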

0

You could also come across a command line length limitation. Part of the reason for using xargs is that it splits up the input into safe command-line-sized chunks. So, imagine a situation in which you have hundreds of thousands of .txt files in the directory. ls *.txt will fail, because the expanded list of names exceeds the maximum argument length. You would need to do

ls | grep .txt$ | xargs cat > /some/other/path/all.txt

.txt$ in this case is a regular expression matching everything that ends in .txt (so it's not exactly like *.txt, since if you have a file called atxt, then *.txt would not match it, but the regular expression would.)
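To make the difference between the glob and the regular expression concrete, here is a small sketch with two hypothetical file names. The escaped pattern '\.txt$' is an addition for comparison and is not part of the answer's command:

```shell
dir=$(mktemp -d)
touch "$dir/notes.txt" "$dir/atxt"   # hypothetical file names

# Shell glob: the dot is literal, so only notes.txt matches.
glob_matches=$(cd "$dir" && printf '%s\n' *.txt)

# Unescaped regex: . matches any character, so atxt matches too.
loose_matches=$(ls "$dir" | grep '.txt$')

# Escaped regex: \. matches only a literal dot, like the glob.
strict_matches=$(ls "$dir" | grep '\.txt$')
```

Quoting the pattern also stops the shell from trying to interpret it before grep sees it.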

The use of another path is because, as other answers have pointed out, all.txt is matched by the pattern *.txt so there would be a conflict between input and output.

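Note that if you have any files with ' in their names (and this may be the cause of the unmatched single quote error), you would want to do

ls | grep .txt$ | tr '\n' '\0' | xargs -0 cat > /some/other/path/all.txt

tr converts each newline in the listing to a \0 (aka NUL) character, and the -0 option to xargs tells it to expect its input in that format, so quotes and spaces in file names are passed through untouched. (grep's --null option does not help here: it only changes the byte printed after file names that grep itself reports, not the lines it reads from ls.) File names containing newlines would still break, because ls separates names with newlines; for those, use find with -print0 instead.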
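A small sketch of a NUL-separated pipeline over a plain ls listing. It assumes tr is used to turn the newlines from ls into NUL bytes (a choice made here for illustration), and the file names and contents are hypothetical:

```shell
dir=$(mktemp -d)
out=$(mktemp)        # output kept outside the directory being read
printf 'alpha\n' > "$dir/listing_O'Connor.txt"
printf 'beta\n'  > "$dir/two words.txt"

# ls prints one name per line when piped; tr swaps each newline for NUL,
# so xargs -0 sees every name as one argument, quotes and spaces included.
( cd "$dir" && ls | grep '\.txt$' | tr '\n' '\0' | xargs -0 cat > "$out" )
```

This still relies on newline-separated ls output, so a file name with an embedded newline would be split in two.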

Brian Minton

Posted 2010-09-28T11:04:48.910

Reputation: 410