I'm trying to find all duplicate files (based on MD5 hash), ordered by file size. So far I have this:
find . -type f -print0 | xargs -0 -I "{}" sh -c 'md5sum "{}" | cut -f1 -d " " | tr "\n" " "; du -h "{}"' | sort -h -k2 -r | uniq -w32 --all-repeated=separate
The output of this is:
1832348bb0c3b0b8a637a3eaf13d9f22 4.0K ./picture.sh
1832348bb0c3b0b8a637a3eaf13d9f22 4.0K ./picture2.sh
1832348bb0c3b0b8a637a3eaf13d9f22 4.0K ./picture2.s
d41d8cd98f00b204e9800998ecf8427e 0 ./test(1).log
Is this the most efficient way?
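For readability, here is the same pipeline split one stage per line with comments on what each stage does. This is a readability-only rewrite, no behavioral change intended, and it assumes GNU coreutils/findutils (sort -h, uniq --all-repeated and md5sum are GNU-specific):

# 1. find:  list every regular file, NUL-terminated so unusual filenames survive.
# 2. xargs: for each file, print "<md5> <size><TAB><path>"
#           (md5sum supplies the hash, du -h the human-readable size).
# 3. sort:  order the lines by the size column (field 2), largest first.
# 4. uniq:  compare only the first 32 characters (the MD5) and print
#           groups of repeated hashes, separated by blank lines.
find . -type f -print0 \
    | xargs -0 -I "{}" sh -c 'md5sum "{}" | cut -f1 -d " " | tr "\n" " "; du -h "{}"' \
    | sort -h -k2 -r \
    | uniq -w32 --all-repeated=separate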
See also http://unix.stackexchange.com/a/71178/23542 – artfulrobot – 2015-05-20T14:39:28.727
Ok, that's a fair point. But looking at this as a learning exercise for the Linux command line, can this be improved? For instance, I originally started off with -exec 'md5sum.....', but research (using Google) suggested xargs was more efficient (see the sketch after these comments). – Jamie Curran – 2012-10-14T22:00:35.560
If you want to learn new techniques, I suggest looking at how these tools solve the problem; you will get a lot of clever ideas (the source, Luke, use the source). – Paulo Scardine – 2012-10-14T22:06:42.090
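On the -exec vs. xargs point raised above, a minimal sketch for comparison only, assuming a find that supports the POSIX "-exec ... +" form (GNU findutils does):

# -exec ... \;  starts one md5sum process per file.
find . -type f -exec md5sum '{}' \;

# -exec ... +   batches many files into each md5sum invocation,
#               much like plain xargs. Note that xargs -I, as used in
#               the question, also runs its command once per file.
find . -type f -exec md5sum '{}' +

The batching form avoids most of the per-file process start-up cost; whether that matters in practice depends on how many files are being hashed.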