Scan for duplicate files with different extensions

1

I am looking for duplicate file names with different file extensions.

Here is the command I run:

find . -maxdepth 2 -type f \( -name "*.avi" -or -name "*.mkv" -or -name "*.mp4" -or -name "*.mpg" -or -name "*.MP4" \) -not -name '*~' | sort > ~/sorted.txt

Here is a sample output (in sorted.txt):

./Avengers- Age of Ultron (2015)/Avengers- Age of Ultron (2015).mp4
./Beetle Juice (1988)/Beetle Juice (1988).avi
./Clerks II (2006)/Clerks II (2006).avi
./Death Race (2008)/Death Race (2008)-pt1.avi
./Death Race (2008)/Death Race (2008)-pt2.avi
./Death Race 2 (2010)/Death Race 2 (2010).mp4
./Into the Wild (2007)/Into the Wild (2007).avi
./Into the Woods (2014)/Into the Woods (2014).mkv
./Into the Woods (2014)/Into the Woods (2014).mp4
./Pink Floyd  The Wall (1982)/Pink Floyd  The Wall (1982).avi
./The Big Lebowski (1998)/The Big Lebowski (1998).avi
./The Gods Must Be Crazy (1980)/The Gods Must Be Crazy (1980).avi
./The NeverEnding Story (1984)/The NeverEnding Story (1984).avi
./The NeverEnding Story (1984)/The NeverEnding Story (1984).mpg
./Winnie the Pooh (2002)/Winnie the Pooh (2002).avi

I want to trim the output written to sorted.txt to this:

./Into the Woods (2014)/Into the Woods (2014).mkv
./Into the Woods (2014)/Into the Woods (2014).mp4
./The NeverEnding Story (1984)/The NeverEnding Story (1984).avi
./The NeverEnding Story (1984)/The NeverEnding Story (1984).mpg

That is, I only want the titles that I have more than one copy of. Even better would be output like this, but the above will do:

Into the Woods (2014)
The NeverEnding Story (1984)

Note: there will be spaces, dashes (-), apostrophes (') and parentheses (( and )) within file names (but no commas (,), double quotes ("), or underscores (_) in file names).  Also, the final output is for eyes to read, so it does not have to be pretty.  I just need to be able to manually identify duplicates quickly.

jasenmichael

Posted 2015-11-07T05:04:42.093

Reputation: 11

Wow; command |  cat  | command — that cat is spectacularly useless.

– Scott – 2015-11-07T09:47:52.583

Answers

2

Try this:

rev < sorted.txt | cut -d . -f 2- | cut -d / -f 1 | rev | uniq -d

Output:

Into the Woods (2014)
The NeverEnding Story (1984)
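
For anyone wondering how that one-liner works, here is the same pipeline spelled out step by step. This is only a sketch: the function name list_duplicate_titles is made up for illustration, and it assumes the input is the sorted list of paths from the question, with rev (from util-linux) available.

```shell
#!/bin/sh
# list_duplicate_titles is a hypothetical helper name, not a standard command.
# It reads sorted paths on stdin and prints each title that appears more than once.
list_duplicate_titles() {
    rev |               # reverse every line, so the extension is now at the front
    cut -d . -f 2- |    # drop the text before the first dot (the reversed extension)
    cut -d / -f 1 |     # keep the text before the first slash (the reversed base name)
    rev |               # reverse again to restore the normal character order
    uniq -d             # print one copy of each line that repeats on adjacent lines
}

# Usage with the file from the question:
#   list_duplicate_titles < ~/sorted.txt
```

Reversing the lines first is what lets cut remove only the last extension, so a dot inside a title is left alone. Note that uniq -d only compares neighbouring lines, so this relies on sorted.txt already being sorted. Names like "Death Race (2008)-pt1" and "Death Race (2008)-pt2" differ before the extension, so they are not reported.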

Cyrus

Posted 2015-11-07T05:04:42.093

Reputation: 4 356

As you should know by now, Super User prefers answers that include explanations of how and why the commands work. – Scott – 2015-11-07T09:23:51.013