I have an awk "script" which, for each value in column 1, sums column 3 and column 4 over the rows where column 2 > 0:
awk 'BEGIN { FS = "\t"; print "Target covered_bases percentage_covered" } $2 > 0 { n[$1]++; covered_bases[$1] += $3; percentage_covered[$1] += $4 } END { for (i in n) { print i, covered_bases[i], percentage_covered[i] } }' "$1"
My infile would be like this:
S 0 20 0.2
S 1 300 0.7
S 2 10 0.1
D 0 10 0.3
D 1 20 0.6
D 2 2 0.02
D 3 5 0.034
And so on, to let's say Z. The output here would be:
Target covered_bases percentage_covered
S 310 0.8
D 27 0.654
So this is fine. However, the letters are output in the wrong order. I know from other questions here that awk sometimes outputs things out of order. My problem is that I cannot seem to correct this using previous answers given in this forum, as my understanding of awk is not great at all and my "script" is already quite complicated to my mind.
Could you let me know how I can correct it?
Many thanks!
For recent (v4) GNU awk only, you can set PROCINFO["sorted_in"]="@ind_str_asc"; before the for(i in n). For any other awk use external sort as answered by Alex. Or consider using perl in its awk-ish -lna mode instead. – dave_thompson_085 – 2017-01-13T17:50:48.220
Thanks for this! It works in a similar way as what was suggested by Alex. However, again, column 1 is not in alphabetical or numerical order. I edited my question. – Agathe – 2017-01-13T18:01:52.050
Maybe I am missing something, but mathematically the order of the values being summed doesn't affect the result. Sorting happens after the sums are calculated. Could you please clarify how the order of the first column may affect the calculation? – Alex – 2017-01-13T18:22:42.633
Are you saying you want the letters (column 1) in the SAME ORDER AS THE INPUT, not their natural order, which is A B C D E F etc? If so, are all lines for a letter consecutive? If not, let's say the input lines are A B A D C B A D C B A -- what is the correct order of the output and why? If they are consecutive, you don't need accumulate-array(s)-anywhere-in-file logic; you need accumulate-scalar(s)-while-group logic. – dave_thompson_085 – 2017-01-14T00:52:03.187
Sorry it is not clear yet. To answer dave, indeed, I would like the output to be in the same order as the input (not their natural order). Lines for a letter are consecutive. I am not sure what the "accumulate-scalar(s)-while_group" logic is, but I sure can try and find that. Thanks. – Agathe – 2017-01-18T17:25:05.720
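For reference, the "accumulate-scalar(s)-while-group" idea mentioned above could look roughly like the sketch below. It assumes a tab-separated input in which all lines for a given column 1 value are consecutive, as confirmed in the last comment. The function name sum_groups and the variable names key, bases, pct and seen are made up for illustration and do not come from the thread. Instead of arrays, it keeps running totals for the current group only and prints them as soon as column 1 changes, so the output follows the input order.

```shell
#!/bin/sh
# Sum columns 3 and 4 per consecutive group of column 1, counting only rows
# where column 2 > 0. Each group is printed when it ends, so output order
# matches input order. Assumes tab-separated input (hypothetical helper name).
sum_groups() {
    awk '
    BEGIN { FS = "\t"; print "Target covered_bases percentage_covered" }
    # Column 1 changed: print the finished group (if it had any qualifying
    # row) and reset the running totals.
    NR > 1 && $1 != key {
        if (seen) print key, bases, pct
        bases = 0; pct = 0; seen = 0
    }
    { key = $1 }                                   # remember the current group
    $2 > 0 { bases += $3; pct += $4; seen = 1 }    # accumulate qualifying rows
    END { if (seen) print key, bases, pct }        # print the last group
    ' "$@"
}

# Usage with the sample data from the question:
printf 'S\t0\t20\t0.2\nS\t1\t300\t0.7\nS\t2\t10\t0.1\nD\t0\t10\t0.3\nD\t1\t20\t0.6\nD\t2\t2\t0.02\nD\t3\t5\t0.034\n' | sum_groups
# Target covered_bases percentage_covered
# S 310 0.8
# D 27 0.654
```

Like the original script, this skips any group that has no row with column 2 > 0. If the lines for a letter were not consecutive, this approach would print that letter more than once; in that case the array version would still be needed, with the first-seen order of the keys recorded in a separate array.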