GNU sort/uniq: sorting by number of occurrences

1

How can I use GNU sort and uniq to have the most common occurrences on top instead of numerical or alphanumerical sorting? Example list.txt:

1
2
2
2
3
3

Since '2' occurs 3 times, it should be on top, followed by '3' and '1', like this:

$ cat list.txt | "some sort/uniq magic combo"
2
3
1

719016

Posted 2012-01-24T15:58:11.757

Reputation: 2 899

Answers

4

Like this:

cat list.txt | sort | uniq -c | sort -rn

The -c option makes uniq prefix each unique line with the number of times it occurs; sort -rn then sorts numerically by that count, in descending order.
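To make the intermediate result visible, here is a small sketch that runs the counting pipeline on the question's example data. The function name count_desc is just an illustrative helper, not part of any tool.

```shell
# Hypothetical helper: reads lines on stdin and prints "count line",
# most frequent first.
count_desc() {
    # sort groups identical lines together so that uniq -c can count them;
    # sort -rn then orders the result by the leading count, largest first.
    sort | uniq -c | sort -rn
}

# The example data from the question.
printf '%s\n' 1 2 2 2 3 3 | count_desc
# With GNU uniq the counts are right-aligned, so this prints:
#       3 2
#       2 3
#       1 1
```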

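If you want to remove the count after sorting, print only the second field with awk (this works when the lines contain no whitespace, as in the example):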

cat list.txt | sort | uniq -c | sort -rn | awk '{ print $2; }'
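Note that awk's $2 is only the first whitespace-separated word after the count, so lines containing spaces would be cut short. One possible variant, sketched here on the assumption that uniq -c writes optional leading spaces, the count, and a single space before the line (as GNU uniq does), strips the count with sed instead. The name most_common is made up for this example.

```shell
# Hypothetical helper: prints each distinct input line once,
# most frequent first, keeping the full line intact.
most_common() {
    # The sed expression removes the leading spaces, the count, and the one
    # space that separates the count from the original line.
    sort | uniq -c | sort -rn | sed 's/^ *[0-9]* //'
}

printf '%s\n' 'GET /index' 'GET /index' 'POST /login' | most_common
# prints:
# GET /index
# POST /login
```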

Doug Harris

Posted 2012-01-24T15:58:11.757

Reputation: 23 578

I've been doing this for ages, and for moderate-size tasks it works well. However, every so often I find myself with gigabytes of log data to go through, and sorting that much data requires a lot of disk space, much of it for duplicate lines that get thrown away in the next step. There are better algorithms, but I don't know of good, simple command-line tools for solving this problem at a larger scale. – mc0e – 2015-02-18T15:48:51.193
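One way to avoid sorting the full input, offered only as a sketch and not necessarily what the comment has in mind, is to count lines in an awk associative array and sort just the resulting summary. This assumes the set of distinct lines fits in memory; the name count_desc_hashed is hypothetical.

```shell
# Hypothetical helper: counts lines with a hash table instead of a full sort.
count_desc_hashed() {
    # awk keeps one counter per distinct line, so memory grows with the
    # number of distinct lines rather than with the total input size.
    # Only the "count line" summary is passed to sort.
    awk '{ count[$0]++ } END { for (line in count) print count[line], line }' |
        sort -rn
}

printf '%s\n' 1 2 2 2 3 3 | count_desc_hashed
# prints:
# 3 2
# 2 3
# 1 1
```

Unlike uniq -c, this prints the count without leading padding, so a following sed 's/^[0-9]* //' would be enough to drop it.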