Sorting files by "line content" frequency

1

Imagine that there are 3 text files.

1.txt:

a
b
c

2.txt:

f
c
d

3.txt:

b
c
f

How do I sort the lines by how often each "line content" occurs across all the files, most frequent first? (Ties should be broken alphabetically.)

Result:

c
b
f
a
d

Samuel Shifterovich

Posted 2016-07-06T23:08:12.780

Reputation: 241

Answers

4

You can use sort and uniq to order the lines by frequency.

sort *.txt | uniq -c | sort -k1,1nr -k2 | sed 's/^ *[0-9]* //'

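The first sort groups identical lines so that uniq -c can count them. The second sort orders by that count, numerically descending (-k1,1nr), and uses the secondary key -k2 to sort lines of the same frequency alphabetically. The final sed just removes the counts.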
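To try the pipeline without touching real files, here is a minimal sketch that recreates the three sample files from the question in a scratch directory and runs the same stages. The variable names (tmpdir, counts, result) are only illustrative, and it assumes a POSIX-like shell with mktemp available.

```shell
# Recreate the three sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' a b c > "$tmpdir/1.txt"
printf '%s\n' f c d > "$tmpdir/2.txt"
printf '%s\n' b c f > "$tmpdir/3.txt"

# Group identical lines, then prefix each distinct line with its count.
counts=$(sort "$tmpdir"/*.txt | uniq -c)

# Order by count (descending), then by text (ascending); finally drop the count.
result=$(printf '%s\n' "$counts" | sort -k1,1nr -k2 | sed 's/^ *[0-9]* //')

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With the sample data this should print c, b, f, a, d, one per line, matching the result asked for in the question.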

choroba

Posted 2016-07-06T23:08:12.780

Reputation: 14 741

Didn't test yet, but gonna accept and upvote for that alphabetical part included. Thanks. – Samuel Shifterovich – 2016-07-06T23:26:08.847

No worries, I've tested it before posting :-) – choroba – 2016-07-06T23:32:18.043

1

You can sort in descending order of frequency using sort and uniq:

$ sort *.txt | uniq -c | sort -rn
      3 c
      2 f
      2 b
      1 d
      1 a

If you want to remove the count:

$ sort *.txt | uniq -c | sort -rn | sed 's/[[:space:]]*[[:digit:]]*[[:space:]]//'
c
f
b
d
a

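Note that two calls to sort are required. The first is because uniq -c only counts adjacent duplicates, so it needs sorted input. The second is needed to sort the lines into descending numerical order by count (frequency). Because -r reverses the whole comparison, lines with equal counts come out in reverse alphabetical order (f before b above), not alphabetically.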
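A small sketch of why the first sort matters, using made-up input (a, b, a) rather than the files from the question. The variable names are only illustrative, and the sed call just strips the leading padding that uniq -c adds.

```shell
# uniq -c only merges *adjacent* identical lines, so with unsorted input
# the two "a" lines are counted separately.
unsorted=$(printf '%s\n' a b a | uniq -c | sed 's/^ *//')

# Sorting first makes the duplicates adjacent, giving one count per distinct line.
sorted=$(printf '%s\n' a b a | sort | uniq -c | sed 's/^ *//')

printf 'without sort:\n%s\n' "$unsorted"
printf 'with sort:\n%s\n' "$sorted"
```

Without the sort, a shows up twice with a count of 1 each; with it, a is reported once with a count of 2.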

John1024

Posted 2016-07-06T23:08:12.780

Reputation: 13 893