How do I use GNU split's "filter" option with GNU parallel?

I am trying to split a number of huge gz files into N-line gzipped chunks.

To demonstrate, let us consider the following:

seq 100 | gzip > big_file0.gz

I can split this into multiple 10-line compressed files as follows:

zcat big_file0.gz | split -l 10 --filter='gzip > $FILE.gz' - big_file0.

Let us assume we have a number of big files big_file0.gz, big_file1.gz ...

I would now like to split each of these files using GNU parallel. Here's the command I came up with:

parallel "zcat {} | split -l 10 --filter='gzip > $FILE.gz' - {.}." ::: big_file0.gz big_file1.gz

However, the shell substitution for $FILE does not work as expected. $FILE is replaced with an empty string, so all the output is written to a file called .gz.

How can I get the $FILE substitution to work as expected in GNU parallel?

saffsd

Posted 2012-10-23T00:43:34.837

Reputation: 133

Answers

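The whole command is inside double quotes, so your shell expands $FILE before parallel ever sees it. The single quotes do not protect it, because they are nested inside the double quotes. Since FILE is not set in your shell, it expands to an empty string. Put a backslash in front of $FILE (write \$FILE) to stop the shell from expanding it, so that the literal $FILE reaches split.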
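Here is a minimal sketch of that fix, assuming GNU parallel and GNU split are installed. The scratch directory and the small seq-generated inputs are stand-ins for the real files. The two echo lines only show what string the calling shell hands over in each case.

```shell
# Work in a scratch directory with two small sample inputs (stand-ins for the real files).
cd "$(mktemp -d)"
seq 100 | gzip > big_file0.gz
seq 100 | gzip > big_file1.gz

# Make sure FILE is unset in the calling shell, as in the question.
unset FILE

# Inside double quotes, the inner single quotes do not stop expansion.
echo "--filter='gzip > $FILE.gz'"    # prints: --filter='gzip > .gz'
echo "--filter='gzip > \$FILE.gz'"   # prints: --filter='gzip > $FILE.gz'

# With the backslash, the literal $FILE reaches split, which sets it per chunk.
parallel "zcat {} | split -l 10 --filter='gzip > \$FILE.gz' - {.}." ::: big_file0.gz big_file1.gz

# List the resulting chunks (split's default suffixes are aa, ab, ...).
ls
```

Each input should end up as ten chunks named like big_file0.aa.gz, because split sets FILE to the prefix plus its default two-letter suffix before running the filter.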

Kyle Jones

Posted 2012-10-23T00:43:34.837

Reputation: 5 706

Today you would use GNU Parallel's --pipe option:

parallel --seqreplace // "zcat {} | parallel --pipe -N 10 gzip '>{.}.{#}.gz'" ::: big_file0.gz big_file1.gz

If you are OK with concatenating big_file0.gz and big_file1.gz into a single stream, it is even simpler:

zcat big_file0.gz big_file1.gz | parallel --pipe -N 10 gzip '>{#}.gz'

Ole Tange

Posted 2012-10-23T00:43:34.837

Reputation: 3 034