This prints each filename that occurs more than once, without the prefix:
awk -F ' - ' '{counts[$2]++; names[$0]} END {for (item in counts) {if (counts[item] > 1) {print item}}}' < <(printf '%s\n' *)
Example output:
Solar Eclipse.mp3
Rolling Hills.mp3
To print the full filename of each file:
awk -F ' - ' '{counts[$2]++; names[$0]} END {for (name in names) {split(name, parts, / - /); if (counts[parts[2]] > 1) {print name}}}' < <(printf '%s\n' *)
Example output:
027 - Solar Eclipse.mp3
003 - Solar Eclipse.mp3
244 - Rolling Hills.mp3
103 - Rolling Hills.mp3
The output isn't guaranteed to be grouped (even though it is in this simple example), because AWK's for (name in names) loop visits array elements in an unspecified order. If you have GNU AWK (gawk), you can group the output:
awk -F ' - ' '
    {
        counts[$2]++;
        names[++c] = $2 " - " $1
    }
    END {
        num = asort(names);
        for (i = 1; i <= num; i++) {
            split(names[i], indices, / - /)
            if (counts[indices[1]] > 1) {
                print indices[2] " - " indices[1]
            }
        }
    }
' < <(printf '%s\n' *)
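If you don't have gawk, you can pipe the output through sort instead, keyed on everything from the third whitespace-separated field to the end of the line (the name after the prefix). Using -k3 rather than -k3,3 keeps names that merely share a first word from being interleaved:
awk ... | sort -k3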
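As a minimal sketch of the sort-based variant, the full-filename AWK script from above can be wrapped in a function that reads one filename per line on stdin. The function name dup_files_grouped is hypothetical, and the sketch assumes names of the form "prefix - name" with no embedded newlines. It uses sort -k3, so the key runs from the third field to the end of the line and names that share only their first word still end up in separate groups.

```shell
# Hypothetical helper (not part of the original answer).
# Reads one filename per line on stdin and prints the full names of
# files whose name part (the text after " - ") occurs more than once.
# Grouping is done by sort instead of gawk's asort().
dup_files_grouped() {
    awk -F ' - ' '
        { counts[$2]++; names[$0] }          # count each name, remember each full line
        END {
            for (name in names) {
                split(name, parts, / - /)
                if (counts[parts[2]] > 1) print name
            }
        }
    ' | sort -k3                             # key: third field through end of line
}

# Usage, from the directory that holds the files:
#   printf '%s\n' * | dup_files_grouped
```

Within a group, lines with an equal key fall back to sort's whole-line comparison, so they appear in prefix order.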
Instead of feeding printf to AWK through process substitution, you can pipe its output into the AWK script. You can also use find, either in a pipe or with process substitution, if you want to do this recursively. If you want a recursive run to compare filenames globally, you would need to strip the directory names that find outputs by default.
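One way the recursive case might look is sketched below. The function name dup_names_recursive is hypothetical, and the sketch assumes "prefix - name" filenames without embedded newlines. The directory part is stripped inside AWK with sub(), so names are compared globally across the whole tree, and files lacking the " - " separator are skipped, which is a small departure from the one-liners above.

```shell
# Hypothetical helper (not part of the original answer).
# Prints each name (prefix stripped) that occurs more than once
# anywhere under the given directory (default: current directory).
dup_names_recursive() {
    find "${1:-.}" -type f |
        awk '
            {
                name = $0
                sub(/.*\//, "", name)            # drop the directory part printed by find
                n = split(name, parts, / - /)
                if (n < 2) next                  # no " - " separator: ignore this file
                counts[parts[2]]++
            }
            END {
                for (item in counts)
                    if (counts[item] > 1) print item
            }
        ' | sort
}

# Usage:
#   dup_names_recursive ~/Music
```

The trailing sort is only there to make the output order predictable; it can be dropped if the order does not matter.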
Have you tried locate? – David Schwartz – 2012-05-29T00:37:58.087
Have you tried duplicate file finding software? Such tools usually use file hash comparison to detect identical files with different names. (I am assuming that the similarly named files are actually duplicates.) – tumchaaditya – 2012-05-29T03:40:10.073