Using group by in ls

1

I am using cygwin on windows.

I wish to count the number of jpeg, gif and png files from a root folder.

Now I can do:

ls . -1R | grep '.jpeg' |  wc -l
ls . -1R | grep '.gif' |  wc -l
ls . -1R | grep '.png' |  wc -l

But I wondered whether there is a "group by type" syntax that would let me do this with one command instead of three?

Any tips?

dublintech

Posted 2013-02-21T11:01:14.780

Reputation: 903

Answers

2

Try this:

ls | awk -F '.' '{print $2}' | sort | uniq -c | sort -n

explanation:

awk splits each file name at the '.' and prints the second part.

sort sorts the output so that identical extensions end up next to each other.

uniq -c groups identical lines and counts each group.

The second sort orders the groups by count.

This could probably be improved by making awk print the last part instead of the second part, but I forgot how to do it. I'm sure man awk can tell you.

replay

Posted 2013-02-21T11:01:14.780

Reputation: 474

1The OP only wants jpg, png, and gif. Also, you are assuming that there is only one . in the file name and no spaces. If you really want to use gawk , print $NF, not $2. – terdon – 2013-02-21T11:22:17.867

thx, $NF was what i was looking for – replay – 2013-02-21T11:23:35.543

@mauro.stettler not bad but that will include files with no extension. Also not sure what the -n is for at the end of the second sort. – dublintech – 2013-02-21T11:27:14.120

1the -n is to interpret numbers as numbers, and not as strings. otherwise a 12 is regarded lower than a 2. – replay – 2013-02-21T11:29:07.830

@mauro.stettler ah right I am using gawk on windows. sort - n doesn't work. – dublintech – 2013-02-21T11:30:03.040

about files with no extension, you could modify like this ls | grep '\.' | awk -F '.' '{print $NF}' | sort | uniq -c | sort -n – replay – 2013-02-21T11:32:01.947
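Pulling the comments above together, here is a minimal sketch of a recursive variant, since the question asks about a whole tree under a root folder. It assumes find and awk are available (they are on a default Cygwin install); count_by_ext is a hypothetical helper name, not something from the thread, and the sketch assumes file names contain no newlines.

```shell
# count_by_ext is a hypothetical helper; usage: count_by_ext DIR
count_by_ext() {
    # -type f skips directories; -name '*.*' keeps only file names containing a dot.
    find "$1" -type f -name '*.*' |
        awk -F. '{ print tolower($NF) }' |  # $NF is the last dot-separated field
        sort |                              # uniq -c only merges adjacent lines
        uniq -c |                           # prefix each distinct extension with its count
        sort -n                             # order by count, comparing numerically
}
```

Calling count_by_ext . prints one line per extension found under the current directory. Note that dot files such as .bashrc are counted too, with "bashrc" as their extension.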

1

These solutions work for bash. I am not sure if you want the number of each file type or the total.

If you want the total number of image files, try this:

ls  {*jpg,*png,*gif} | wc -l

This means ls anything ending in jpg, png or gif and pipe through wc.

If you want the number of each, do:

for n in jpg png gif; do echo -ne "$n\t"; ls *$n | wc -l; done

This is a for loop. It runs 3 times, once for each of jpg, png and gif. Each time the loop runs, the $n variable takes one of the extensions as its value. So, on the first run, ls *$n is expanded to ls *jpg. echo essentially just means "print". echo -ne means print without a trailing newline (-n) and interpret escape sequences (-e), which lets me use the tab character \t.

This works fine as long as you have at least one file of each type; otherwise ls prints an error message (the counts are still correct, it just complains). For a slightly more robust version, try this:

for n in jpg png gif; do echo -ne "$n\t"; echo `ls *$n 2>/dev/null | wc -l ` || echo 0; done

This loop is similar to the one above, but the 2>/dev/null causes any error messages from ls to be discarded. The || operator in bash means "do this or, if this did not work, do that", so echo 0 is a fallback that runs only if the first echo fails. In practice wc -l already prints 0 when ls finds no files with that extension, so the fallback is rarely needed.
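If you would rather not parse the output of ls at all, the same per-extension count can be sketched with plain globbing. This is only an illustration under my own assumptions: count_images is a hypothetical helper name, it looks at a single directory (not recursively), and it matches on ".jpg" etc. with the dot included.

```shell
# count_images is a hypothetical helper; usage: count_images DIR
# The body is wrapped in ( ) so it runs in a subshell and its variables do not leak.
count_images() (
    dir=$1
    for ext in jpg png gif; do
        n=0
        for f in "$dir"/*."$ext"; do
            # An unmatched glob is left as the literal pattern, so check the file exists.
            if [ -e "$f" ]; then
                n=$((n + 1))
            fi
        done
        # printf handles the tab without relying on echo -e.
        printf '%s\t%d\n' "$ext" "$n"
    done
)
```

Running count_images . prints one tab-separated line per extension, with 0 for extensions that have no files, and no error messages to suppress.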


You can also use awk (this is a slight modification of mauro stettler's answer so it will count only the files with the extensions you are interested in):

ls {*.jpg,*.png,*.gif} | awk -F'.' '{print $NF}' | sort | uniq -c

terdon

Posted 2013-02-21T11:01:14.780

Reputation: 45 216

Thanks. That works via a bash shell. Can you explain what the characters mean? Just so I learn something as opposed to copy the answer. – dublintech – 2013-02-21T11:16:20.133

works in bash. My mistake. I have updated comment – dublintech – 2013-02-21T11:19:57.167

I've added some more info @dublintech. Is it clear now? – terdon – 2013-02-21T11:27:35.070

what do -ne, $n\t, *$n 2/dev/null, ' || mean? – dublintech – 2013-02-21T11:28:59.860

@dublintech I explain them in my answer. 2> means redirect error messages, /dev/null is a special device in *nix used for discarding things. It is just a trick that discards error messages. – terdon – 2013-02-21T11:33:19.790

1

You could also use find:

find . -name \*.jpeg -o -name \*.gif -o -name \*.png | sed 's/.*\.\([^.]\+\)/\1/' | sort | uniq -c

This returns the number of files under . with extensions jpeg, gif and png nicely formatted with one result per line:

 123 gif
 110 jpeg
1832 png

Add other extensions as needed.

Explanation of the command:

  • find . -name \*.jpeg -o -name \*.gif -o -name \*.png

    searches for files whose names match '*.jpeg', '*.gif' or '*.png'.

  • sed 's/.*\.\([^.]\+\)/\1/'

    removes the filename and only leaves the extension, for example, file.gif becomes gif

  • sort

    sorts the extensions. After this command, output looks like this:

    gif
    gif
    gif
    (...)
    jpeg
    jpeg
    (...)
    png
    (...)
    
  • uniq -c

    reports the number of occurrences of each extension.
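As a possible variation on the pipeline explained above, the sketch below also counts upper-case extensions such as .PNG and ignores directories. It assumes a find that supports -iname (GNU find, as shipped with Cygwin, does; it is not required by POSIX). count_image_types is a hypothetical helper name.

```shell
# count_image_types is a hypothetical helper; usage: count_image_types DIR
count_image_types() {
    # The escaped parentheses group the -o alternatives so -type f applies to all of them.
    find "$1" -type f \( -iname '*.jpeg' -o -iname '*.gif' -o -iname '*.png' \) |
        sed 's/.*\.//' |               # greedy match strips everything up to the last dot
        tr '[:upper:]' '[:lower:]' |   # fold case so PNG and png land in the same group
        sort |
        uniq -c
}
```

Use it as count_image_types . and add further -o -iname '*.ext' tests inside the parentheses for other extensions.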

jaume

Posted 2013-02-21T11:01:14.780

Reputation: 4 947