Given key-value content, how do I group values by key and sort by value?

1

1

I've the following sample input format (file or stdin):

key1: 1
key2: 2
key3: 3
key1: 4
key2: 5
key3: 6

Is there any few-liner which can group values based on the key and then sort?

Expected output would be either table:

key1 key2 key3
1    2    3
4    5    6

which can be afterwards sorted by column (e.g. sort -k2). Similar to this example.

Or in similar format, like:

key1: 1 4
key2: 2 5
key3: 3 6

What's the easiest way to achieve that transformation?

kenorb

Posted 2015-04-26T10:44:21.783

Reputation: 16 795

Assuming the initial key sort (sort -k1,1 -k2) which also would result in sorted values (so you could just sort without any -k option), then the problem is simplified to merging lines with the same prefix; see http://unix.stackexchange.com/questions/47786/concatenate-lines-by-first-column-awk-or-sed

– michael – 2015-05-03T08:58:09.827

Answers

2

It seems a work for awk. It allows somehow the use of associative multidimensional arrays.
The following bash script should do the work.

awk ' { Nb[$1]++;b[$1][Nb[$1]]=$2 }       
      END{ for (i in Nb) {                
        printf("%s ", i);               
        for (j = 1 ; j<=Nb[i]; j++) printf("%ld ", b[i][j]);
       printf(" \n") ; }   
      }' aaa.txt  | sort

Some notes:

  • In the first part {...} it scans all the external file aaa.txt and load the arrays b[][] and the number of occurrences for each key Nb[](maybe you have 4 instances of key3 and 12 of key1...).

  • In the END{...} part for each key found for (i in Nb) print the key with no newline (printf(...)) then for each entry with index 1..Nb[i], it will print the value. Finally print a newline.

  • The final pipe | will sort the output

    key1: 1 4  
    key2: 2 5  
    key3: 3 6
    
  • Of course if are needed different layouts it's possible to build them starting from the example above and changing the nesting order of the for cycles.

Hastur

Posted 2015-04-26T10:44:21.783

Reputation: 15 043

0

When key list is fixed, it can be transposed by the following command:

$ pr -mt <(grep key1 in.txt | cut -f2 -d:) <(grep key2 in.txt | cut -f2 -d:) <(grep key3 in.txt | cut -f2 -d:) | sort -k1
1            2           3
4            5           6

This can be improved further more to make it more flexible by using eval.

kenorb

Posted 2015-04-26T10:44:21.783

Reputation: 16 795