GNU Parallel: remove the escape before space characters in the command

I'm currently testing GNU Parallel to distribute a compare command across multiple servers using bash. In its most basic form, this compare command takes two inputs to compare (Oracle database accessions) and requires an output filename via -o. The program requires at least one action (load, save, or direct upload).

compare -o cmp.input1.input2.dat Input1 Input2

I have a few thousand of these input pairs, so I created a file with all the combinations, in which each line contains the -o flag, the output filename, and the database identifiers required by the program:

#test_parallel
-o cmp.input1.input2.dat Input1 Input2
-o cmp.input1.input3.dat Input1 Input3
-o cmp.input2.input3.dat Input2 Input3
[...]

I then execute the command using parallel, but compare fails:

parallel -a test_parallel "compare {}"
ERROR: No action specified for results (load, save or direct upload)
usage: compare [-u][-o <file>] query target

In --dryrun mode, this is what parallel executes:

compare -o\ cmp.input1.input2.dat\ Input1\ Input2

For a reason I don't understand, compare does not handle the escaped whitespace correctly. Executing this command directly in bash produces exactly the same error message. Removing the escape after the -o flag (I could move -o into the parallel command) results in a "too many arguments" error. Removing all escapes runs the command as expected.
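
The difference between the two command lines comes down to how the shell splits words. A minimal sketch, using a hypothetical show_args function in place of the real compare program, shows that the escaped version arrives as a single argument, which is presumably why compare never sees a separate -o option:

```shell
#!/usr/bin/env bash
# show_args is a hypothetical stand-in for compare: it only prints how many
# arguments it received, followed by each argument in brackets.
show_args() {
  printf 'argc=%d:' "$#"
  printf ' [%s]' "$@"
  printf '\n'
}

# The line from --dryrun: every space is backslash-escaped, so the shell
# hands the whole line to the program as ONE argument.
show_args -o\ cmp.input1.input2.dat\ Input1\ Input2
# -> argc=1: [-o cmp.input1.input2.dat Input1 Input2]

# Without the escapes the shell splits the line into four arguments.
show_args -o cmp.input1.input2.dat Input1 Input2
# -> argc=4: [-o] [cmp.input1.input2.dat] [Input1] [Input2]
```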

Is it possible to tell parallel not to add these escapes to the command? I can't find anything in the documentation, except that this is the expected default behavior, as parallel --shellquote shows.

Carambakaracho

Posted 2016-10-17T08:55:55.523

Reputation: 43

Answers

GNU Parallel treats each input line as a single argument and quotes it, so that you can safely use filenames like:

My brother's 12" records costs 30$ each.txt

In your case you want the argument to be parsed by the shell once more, so that the spaces are unquoted. Prefix the command with eval:

parallel -a test_parallel eval compare {}
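
A minimal sketch of why eval helps, again with a hypothetical show_args stand-in instead of compare. The line below is roughly what parallel generates for eval compare {} from the first input line:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for compare: prints its argument count and arguments.
show_args() {
  printf 'argc=%d:' "$#"
  printf ' [%s]' "$@"
  printf '\n'
}

# First parse: eval receives two words, "show_args" and the escaped line.
# eval joins its words with spaces and parses the result again, so the
# spaces now separate words and show_args receives four arguments.
eval show_args -o\ cmp.input1.input2.dat\ Input1\ Input2
# -> argc=4: [-o] [cmp.input1.input2.dat] [Input1] [Input2]
```

Because eval parses the line a second time, any shell metacharacters in the input file (such as $, ; or quotes) are interpreted as well, so this is only safe for trusted input like a generated list of accessions.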

Or you can split each line on spaces:

parallel --colsep ' ' -a test_parallel compare {1} {2} {3} {4}
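
The splitting that --colsep ' ' performs can be illustrated in plain shell: break the line on spaces and pass each field as its own argument. This is only a sketch of the idea, not how parallel implements it; show_args and split_and_show are hypothetical helpers, and fields are assumed to be separated by single spaces.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for compare: prints its argument count and arguments.
show_args() {
  printf 'argc=%d:' "$#"
  printf ' [%s]' "$@"
  printf '\n'
}

# Split one input line on spaces and pass each field as a separate argument.
# The body runs in a subshell ( ... ) so the IFS and set -f changes do not
# leak into the caller; set -f stops a field like "Input*" being globbed.
split_and_show() (
  set -f
  IFS=' '
  set -- $1
  show_args "$@"
)

split_and_show '-o cmp.input1.input2.dat Input1 Input2'
# -> argc=4: [-o] [cmp.input1.input2.dat] [Input1] [Input2]

# A line with an extra field (the -u flag from the usage message) also works,
# because "$@" expands to however many fields there are.
split_and_show '-u -o cmp.input1.input2.dat Input1 Input2'
# -> argc=5: [-u] [-o] [cmp.input1.input2.dat] [Input1] [Input2]
```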

But since you want to compare all vs. all, you can do it much more elegantly:

parallel cmp -o ../out/cmp.{1}.{2} {1} {2} ::: Input* ::: Input*

This compares every Input* with every Input*. With --results you can get the outputs neatly structured in a directory:

parallel --results out/ cmp {1} {2} ::: Input* ::: Input*

If you want to skip running cmp InputY InputX after you have already run cmp InputX InputY, you can do this:

parallel --results out/ cmp {=1' $arg[1] ge $arg[2] and $job->skip();' =} {2} ::: Input* ::: Input*
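
The skip expression keeps a pair only when the first input sorts strictly before the second, so each unordered pair is compared once and nothing is compared with itself. The same filter can be sketched in plain shell, for example to generate a test_parallel-style file like the one in the question; make_pairs is a hypothetical helper, and the cmp.X.Y.dat naming follows the question's file:

```shell
#!/usr/bin/env bash
# make_pairs (hypothetical helper) prints one "-o <outfile> <query> <target>"
# line for each unordered pair of its arguments.
make_pairs() {
  for a in "$@"; do
    for b in "$@"; do
      # Same rule as "$arg[1] ge $arg[2] and $job->skip()": skip unless a
      # sorts strictly before b. test's \< is a string comparison supported
      # by bash and dash (not strict POSIX).
      [ "$a" \< "$b" ] || continue
      printf '%s\n' "-o cmp.$a.$b.dat $a $b"
    done
  done
}

make_pairs Input1 Input2 Input3
# -> -o cmp.Input1.Input2.dat Input1 Input2
#    -o cmp.Input1.Input3.dat Input1 Input3
#    -o cmp.Input2.Input3.dat Input2 Input3
```

Something like make_pairs Input* > test_parallel would then produce the pair file without the mirrored duplicates.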

Ole Tange

Posted 2016-10-17T08:55:55.523

Reputation: 3 034

Thank you Ole, the combination with eval is just what I needed. Splitting on spaces probably won't work because the number of arguments varies depending on the input. The compare program I use compares biological sequences and reports the closest matches, a very special use case. – Carambakaracho – 2016-10-18T07:16:49.167