locate similarly named files

I'm running OS X 10.7.3. I would like to locate all the files that have common strings in their filenames.

I have a folder that contains several files whose filenames are the same except for the prefix. For example:

003 - Solar Eclipse.mp3
008 - Beautiful Day.mp3
027 - Solar Eclipse.mp3
103 - Rolling Hills.mp3
244 - Rolling Hills.mp3

From that list I would like to filter out any files whose names are fully unique once the prefix is ignored, in this case: 008 - Beautiful Day.mp3

Is there a GUI client, an automator script or a terminal command that will do that?

Thanks a lot!

gooogalizer

Posted 2012-05-28T23:26:07.870

Reputation: 11

Have you tried locate? – David Schwartz – 2012-05-29T00:37:58.087

Have you tried duplicate-file-finding software? It usually compares file hashes to detect identical files with different names. (I am assuming that the similarly named files are actually duplicates.) – tumchaaditya – 2012-05-29T03:40:10.073
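Following the comment above, here is a minimal sketch that compares file contents rather than names. It uses the POSIX cksum utility, which prints a CRC checksum, the byte count and the filename for each file. The helper name dup_content is hypothetical. A CRC is not a cryptographic hash, so matching lines strongly suggest identical content but do not prove it. Use cmp to confirm a particular pair.

```shell
# Hypothetical helper: prints "checksum size name" for every file argument
# whose CRC checksum and byte count (as reported by cksum) match at least
# one other argument, sorted so that matching files are adjacent.
dup_content() {
    cksum -- "$@" | awk '
        {
            key = $1 " " $2        # CRC and size identify the content
            count[key]++
            line[NR] = $0          # keep the whole line, spaces included
            keyof[NR] = key
        }
        END {
            for (i = 1; i <= NR; i++)
                if (count[keyof[i]] > 1) print line[i]
        }' | sort -k1,1n -k2,2n
}

# Usage, inside the music folder:
# dup_content *.mp3
```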

Answers

ls | cut -b 6- | sort | uniq -c | sort -r

This strips the prefixes (everything before the sixth byte) and shows how many times each remaining name occurs, i.e. how many times the file is "duplicated".
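The cut above assumes a prefix of exactly five bytes (three digits plus " -"). The following minimal sketch strips any run of leading digits followed by " - " with sed instead. It then uses uniq -d so that only names occurring more than once are printed, which is the filtering the question asks for. The helper name dup_titles is hypothetical, and the sketch assumes filenames contain no newlines.

```shell
# Hypothetical helper: reads one filename per line on stdin, strips a
# leading "<digits> - " prefix of any width, and prints each remaining
# name that occurs more than once (uniq -d), one per line.
dup_titles() {
    sed 's/^[0-9][0-9]* - //' | sort | uniq -d
}

# Usage, inside the music folder:
# ls | dup_titles
```

If you replace uniq -d with uniq -c | sort -rn, you get the per-name counts like the original pipeline, but without the fixed-width assumption.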

jet

Posted 2012-05-28T23:26:07.870

Reputation: 2 675

This is okay, but it will only work if the track number has exactly 3 digits. – tumchaaditya – 2012-05-29T03:40:52.647

Another option is to sort by the size field, on the expectation that identical files have the same size. – jet – 2012-06-06T21:42:17.970
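Here is a minimal sketch of the size-based idea from the comment above. It lists each file with its size in bytes, sorted numerically so that equal-sized files sit next to each other. The helper name list_by_size is hypothetical. The sketch uses wc -c < file because that prints only the byte count on both OS X and GNU systems. OS X pads the count with leading blanks, and the arithmetic expansion removes them.

```shell
# Hypothetical helper: prints "<size in bytes><TAB><name>" for each regular
# file given as an argument, sorted numerically by size so that files of
# equal size are adjacent.
list_by_size() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        # $(( ... )) trims the leading blanks that BSD/OS X wc adds.
        printf '%s\t%s\n' "$(( $(wc -c < "$f") ))" "$f"
    done | sort -n -k1,1
}

# Usage, inside the music folder:
# list_by_size *.mp3
```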

This prints, without the prefix, each filename that occurs more than once:

awk -F ' - ' '{counts[$2]++} END {for (item in counts) {if (counts[item] > 1) {print item}}}' < <(printf '%s\n' *)

Example output:

Solar Eclipse.mp3
Rolling Hills.mp3

To print the full filename of each file:

awk -F ' - ' '{counts[$2]++; names[$0]} END {for (name in names) {split(name, parts, / - /); if (counts[parts[2]] > 1) {print name}}}' < <(printf '%s\n' *)

Example output:

027 - Solar Eclipse.mp3
003 - Solar Eclipse.mp3
244 - Rolling Hills.mp3
103 - Rolling Hills.mp3

The order of the files in the output isn't guaranteed to be grouped (even though it is in this simple example), because AWK's for (name in names) loop visits array elements in an unspecified order. If you have GNU AWK (gawk), you can group the output:

awk -F ' - ' '
    {
        counts[$2]++;
        names[++c] = $2 " - " $1
    }
    END {
        num = asort(names);
        for (i = 1; i <= num; i++) {
            split(names[i], indices, / - /)
            if (counts[indices[1]] > 1) {
                print indices[2] " - " indices[1]
            }
        }
    }
' < <(printf '%s\n' *)

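If you don't have gawk, you can pipe the output of the portable AWK command above through sort. Key it on everything after the prefix (-k3 rather than -k3,3, so that titles sharing only their first word don't interleave):

awk ... | sort -k3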
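Here is a minimal sketch that packages the non-gawk fallback as a reusable function. The portable awk program from this answer reads filenames on stdin. Then sort -k3 keys on everything from the third blank-separated field (the title) to the end of the line, rather than only the title's first word. The helper name group_dup_files is hypothetical, and filenames are assumed to contain no newlines.

```shell
# Hypothetical helper: the portable awk program from this answer, reading
# filenames from stdin, followed by sort -k3 so that files sharing the
# same title (everything after "NNN - ") end up on adjacent lines.
group_dup_files() {
    awk -F ' - ' '
        { counts[$2]++; names[$0] }
        END {
            for (name in names) {
                split(name, parts, / - /)
                if (counts[parts[2]] > 1) print name
            }
        }' | sort -k3
}

# Usage, inside the music folder (no gawk needed):
# printf '%s\n' * | group_dup_files
```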

Instead of feeding printf's output in through process substitution, you can pipe it into the AWK script. If you want to do this recursively, you can use find instead, either in a pipe or through process substitution. For a recursive run to compare filenames globally, you would need to strip the directory names that find outputs by default.
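Here is a minimal sketch of the recursive variant described above. find lists every file under a directory. awk then strips both the directory part and a "<digits> - " prefix before counting, so names are compared across the whole tree while the full paths are printed. The helper name find_dup_titles is hypothetical. The prefix pattern is an assumption based on the examples in the question, and paths are assumed not to contain newlines. Because the sketch reads from a pipe, it also works in plain sh without process substitution.

```shell
# Hypothetical helper: recursively finds files under the directory given
# as $1 (default: .) and prints the full path of every file whose base
# name, minus a leading "<digits> - " prefix, is shared with another file
# anywhere in the tree. Output follows find's order, not grouped.
find_dup_titles() {
    find "${1:-.}" -type f | awk '
        {
            path[NR] = $0
            base = $0
            sub(/.*\//, "", base)          # strip the directory part
            sub(/^[0-9]+ - /, "", base)    # strip the numeric prefix
            title[NR] = base
            counts[base]++
        }
        END {
            for (i = 1; i <= NR; i++)
                if (counts[title[i]] > 1) print path[i]
        }'
}

# Usage, from the top of the music library:
# find_dup_titles ~/Music
```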

Paused until further notice.

Posted 2012-05-28T23:26:07.870

Reputation: 86 075