Find sub-folders containing only duplicate files

I am looking for a method (not including paid software) for finding all folders that contain only files that are also present in at least one other sub-folder of the same parent directory. If used on a music library, this would list all compilation albums.

File structure:
Artist folder
- Album folder
- - songs with filename as title

DDriggs00

Posted 2017-11-17T01:49:02.520

Reputation: 51

Any number of duplicates? How many layers deep? – jdwolf – 2017-11-17T01:58:31.970

Updated OP @jdwolf – DDriggs00 – 2017-11-17T02:29:19.723

Answers

You can use PowerShell!

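# Group every file under the parent folder by name; keep names that occur more than once.
$dupes = gi $args[0] | gci -File -Recurse | group Name | ? {$_.Count -gt 1}
# Walk the immediate subfolders (albums) and let through only those that pass the filter.
gi $args[0] | gci -Directory | ? {
    $allDupes = $true   # no non-duplicated file seen yet
    $hasAny = $false    # becomes true once the album turns out to contain a file
    $_ | gci -File | % {
        $file = $_
        $hasAny = $true
        # If no duplicate group carries this file's name, the album fails the test.
        If (!($dupes | ? {$_.Name -eq $file.Name})) {$allDupes = $false}
    }
    # Keep the album only if it is non-empty and every file in it is duplicated.
    $allDupes -and $hasAny
}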

This script is a little tricky, so let's go through it carefully. First, it gets the folder specified as an argument, recursively finds the files it contains, groups them by file name, takes only the groups with more than one item (i.e. the groups that represent duplicated songs), and stashes that collection of groups in $dupes. Then it again gets the specified parent directory, but then lists only the immediate subfolders. It filters them (?), letting only those containing only duplicated entries exit the pipeline and be printed to the screen.
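To see the grouping step in isolation, here is a minimal sketch that runs on plain strings instead of real files. The file names are made up for illustration; in the script, the grouping key is each file's Name property.

```powershell
# Hypothetical file names standing in for the Name property of real files.
$names = 'Intro.mp3', 'Anthem.mp3', 'Intro.mp3', 'Ballad.mp3', 'Anthem.mp3'

# Group identical names, then keep only the groups with more than one member.
$dupeGroups = $names | Group-Object | Where-Object { $_.Count -gt 1 }

# Each surviving group's Name is a file name that occurs more than once.
$dupeNames = @($dupeGroups | ForEach-Object { $_.Name })
$dupeNames
```

Ballad.mp3 appears only once, so its group is filtered out and only the two repeated names remain.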

That big filter block takes up most of the script, so let's look at it in more detail. It starts with two variables: one to keep track of whether the current album folder contains only duplicates so far, and the other to note whether there actually are any songs in the folder. (I suspect it wouldn't really be helpful to count empty folders as compilation albums.) It lists the files in the album folder, then for each of them (%), checks that there is a duplicate group with the same file name. If there is not (i.e. nothing comes out of the short pipeline inside the If), it records the failure by setting $allDupes to false. If the for-each block didn't run at all, then $hasAny remains false. Finally, the big filter block evaluates whether all items in the album are duplicates and whether there are actually any items there. The result of that expression determines whether the album folder is included in the outer pipeline's output.
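The per-album decision can also be written as a small standalone function, which makes the two conditions easier to try out. This is only a sketch of the same logic, not part of the script above: the function name Test-AllDuplicates, its parameters, and the sample names are all hypothetical.

```powershell
# Hypothetical helper: returns $true only for a non-empty album whose
# file names all appear in the list of duplicated names.
function Test-AllDuplicates {
    param(
        [string[]] $AlbumFileNames,   # names of the files in one album folder
        [string[]] $DuplicateNames    # names that occur more than once in the library
    )
    # An empty album never qualifies (the role of $hasAny in the script).
    if (-not $AlbumFileNames -or $AlbumFileNames.Count -eq 0) { return $false }
    foreach ($name in $AlbumFileNames) {
        # One non-duplicated file is enough to fail (the role of $allDupes).
        if ($DuplicateNames -notcontains $name) { return $false }
    }
    return $true
}

$dupeNames = 'Intro.mp3', 'Anthem.mp3'
$compilation = Test-AllDuplicates -AlbumFileNames 'Intro.mp3', 'Anthem.mp3' -DuplicateNames $dupeNames
$mixed       = Test-AllDuplicates -AlbumFileNames 'Intro.mp3', 'Solo.mp3' -DuplicateNames $dupeNames
$empty       = Test-AllDuplicates -AlbumFileNames @() -DuplicateNames $dupeNames
$compilation, $mixed, $empty   # True, False, False
```

Like the -eq comparison in the script, -notcontains ignores case, so Intro.mp3 and intro.mp3 count as the same name.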

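Note that some of the PowerShell features used in this script are relatively recent: the -File and -Directory switches were introduced in PowerShell 3.0. Windows 7 ships with PowerShell 2.0, so the switches will not work there unless a newer version of PowerShell has been installed. This can be worked around if necessary.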
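One possible workaround, sketched here rather than tested on Windows 7, is to filter on the PSIsContainer property, which is true for folders and false for files. The temporary folder and the names in it are made up so that the snippet runs on its own; in the script you would apply the same filters to $args[0] and to each album folder.

```powershell
# Build a tiny throwaway tree so the sketch runs on its own (hypothetical names).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ('albumtest-' + [guid]::NewGuid())
$album = Join-Path $root 'Album A'
New-Item -ItemType Directory -Path $album | Out-Null
New-Item -ItemType File -Path (Join-Path $album 'Song.mp3') | Out-Null

# Files only: stands in for "gci -File -Recurse".
$files = @(Get-ChildItem $root -Recurse | Where-Object { -not $_.PSIsContainer })

# Folders only: stands in for "gci -Directory".
$albums = @(Get-ChildItem $root | Where-Object { $_.PSIsContainer })

$files | ForEach-Object { $_.Name }    # Song.mp3
$albums | ForEach-Object { $_.Name }   # Album A

# Clean up the throwaway tree.
Remove-Item -Recurse -Force $root
```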

To use the script, save it as a .ps1 file, e.g. albumdupes.ps1. If you haven't already, follow the instructions in the Enabling Scripts section of the PowerShell tag wiki. Then you can run it from a PowerShell prompt in the directory where you saved it, supplying the path to your artist folder:

.\albumdupes.ps1 'C:\Users\Ben\Test\albumtest'

You'll get output like this:

    Directory: C:\Users\Ben\Test\albumtest


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----       11/20/2017   2:00 PM                Album C
d-----       11/20/2017   2:01 PM                Album F

Ben N

Posted 2017-11-17T01:49:02.520

Reputation: 32 973