Getting all extensions for a directory: easy. Getting file counts for a particular extension: easy.

But getting all file extensions and their respective file counts is eluding me.

e.g.

+ dir
 + abc.txt
 + def.txt
 + abc.pdf
 + def.pov

should return something like:

.txt 2
.pdf 1
.pov 1

The aim of this exercise is to find out which file extensions are most common in a given directory.

Thanks in advance

denormalizer

2 Answers

/var/cache$ sudo find ./ -type f | grep -E ".*\.[a-zA-Z0-9]*$" | sed -e 's/.*\(\.[a-zA-Z0-9]*\)$/\1/' | sort | uniq -c | sort -n
      1 .6
      1 .cache
      1 .noconf
      1 .php
      1 .sl
      2 .bin
      2 .el
      2 .tdb
      4 .baseA
      4 .baseB
      4 .dat
      4 .DB
     27 .db
    221 .deb

Here is the explanation:

find ./ -type f

find only regular files, not directories

grep -E ".*\.[a-zA-Z0-9]*$"

keep only the paths that end in a dot followed by letters or digits, i.e. files with an extension

sed -e 's/.*\(\.[a-zA-Z0-9]*\)$/\1/'

strip the path and file name, keeping only the extension

sort | uniq -c | sort -n

sort the extensions so identical ones are adjacent, count each one with uniq -c, then sort numerically by that count
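
The stages above can be wrapped into a small shell function. This is a minimal sketch rather than the exact command from the answer: the name `count_ext` is made up for illustration, the `grep` and `sed` steps are folded into a single `sed -n ... p`, and the pattern requires at least one character after the dot, so a name ending in a bare `.` is skipped.

```shell
# count_ext DIR (hypothetical helper name): print "<count> <.ext>" for the
# regular files under DIR, least common extension first.
count_ext() {
    find "${1:-.}" -type f |
        # -n plus the p flag prints only lines where the substitution matched,
        # so paths without a ".<letters/digits>" suffix are dropped here
        sed -n 's/.*\(\.[a-zA-Z0-9][a-zA-Z0-9]*\)$/\1/p' |
        # group identical extensions, count them, order by count
        sort | uniq -c | sort -n
}
```

Calling `count_ext /var/cache` should give the same kind of listing as the output shown earlier. Note that hidden files such as `.bashrc` are counted as if the whole name were an extension, which is also how the original pipeline behaves.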

Marco
bindbn
  • You could make your regex allow more characters in the extension and eliminate `grep` by doing this: `sed -ne '/\.[^./]*$/s/.*\(\.[^.]*\)$/\1/p'` – Dennis Williamson Sep 22 '10 at 06:17
  • Dennis, replacing the grep and sed with your sed returns the following error: sed: -e expression #1, char 30: invalid reference \1 on `s' command's RHS – denormalizer Sep 23 '10 at 01:09

Since you're using Linux (GNU grep), this is a good time to use Perl-compatible regular expressions (`-P`) together with grep's `-o` option, which prints only the matched part of each line. Starting from @bindbn's answer:

find . -type f | grep -Po '\.([\w\d])*$' | sort | uniq -c | sort -n
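
If the output should look like the `.txt 2` layout from the question, with the most common extension first, the same idea can be sketched as a function. The name `ext_popularity` is hypothetical, and this version assumes a grep that supports `-o` with `-E`; it uses the POSIX class `[[:alnum:]_]` in place of `\w` so that `-P` is not needed.

```shell
# ext_popularity DIR (hypothetical helper name): print "<.ext> <count>",
# most common extension first, ties broken by extension name.
ext_popularity() {
    find "${1:-.}" -type f |
        # -o prints only the matched suffix; + requires at least one character
        grep -oE '\.[[:alnum:]_]+$' |
        sort | uniq -c |
        # count descending, then extension ascending
        sort -k1,1nr -k2,2 |
        # swap the columns so the extension comes first
        awk '{ print $2, $1 }'
}
```

Running `ext_popularity dir` on the example tree from the question should list `.txt 2` first, followed by the two extensions that occur once.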
Jim