Finding a file by md5sum

2

1

Given the md5sum of a file, I want to know whether another file with the same md5sum (but maybe under a different name) exists anywhere else in the directory tree. How can I do that in bash?

P.S.: To emphasize, this should work for the entire tree below a given directory, i.e. it must work recursively, not just in the current directory.

user50105

Posted 2013-10-02T05:20:01.343

Answers

1

Using find to recursively test all files:

find . -type f -exec \
bash -c 'md5sum "$0" | grep -q 2690d194b68463c5a6dd53d32ba573c7 && echo "$0"' {} \;

Here, md5sum prints the MD5 sum followed by the file name. It has no switch to print the sum alone, so you have to grep its output for the sum you are looking for.

You can check the MD5 sum more easily with md5 if you're on BSD or OS X:

find . -type f -exec \
bash -c '[ "$(md5 -q "$0")" = 2690d194b68463c5a6dd53d32ba573c7 ] && echo "$0"' {} \;
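
For what it's worth, the grep in the md5sum variant matches anywhere on the output line, so it could in principle also hit a file whose name contains the sum. Below is a minimal sketch of an exact comparison, assuming bash and GNU md5sum. The function name find_by_md5_exact and its argument order are made up for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answer above):
# print every file below DIR whose MD5 sum is exactly HASH.
# Usage: find_by_md5_exact HASH [DIR]
find_by_md5_exact() {
    local hash=$1 dir=${2:-.}
    # Inner shell: $1 is the wanted sum, the remaining arguments are files.
    find "$dir" -type f -exec bash -c '
        want=$1; shift
        for f in "$@"; do
            # Reading from stdin makes md5sum print "<sum>  -".
            sum=$(md5sum < "$f") || continue
            # Strip everything from the first space on, keeping only the sum.
            if [ "${sum%% *}" = "$want" ]; then
                printf "%s\n" "$f"
            fi
        done
    ' bash "$hash" {} +
}
```

It would be called as find_by_md5_exact 2690d194b68463c5a6dd53d32ba573c7 . and still runs one md5sum per file, so it is not faster than the commands above, only stricter about what counts as a match.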

slhck

Posted 2013-10-02T05:20:01.343

Reputation: 182 472

Thanks slhck looks very interesting, but apparently my md5 command has no -q option; neither has my md5sum command. I am using Xubuntu. Also what is the {} \ for? – None – 2013-10-02T06:06:22.987

Sorry, I had the wrong BSD md5 tool here. On Linux you need md5sum. I'll correct my post when I'm back on a computer. The {} is the file path for each file found. It gets passed to sh. The ; simply ends the exec call. – slhck – 2013-10-02T06:34:38.787

1

The other solutions are good, but I want to propose one that spawns fewer processes, which should be significantly faster for many small files. If your find supports -exec … + (it is specified by POSIX and available in GNU find):

find /path/to/tree -type f -exec md5sum \{\} + | sed -nre 's/^md5-to-search-for  //p'

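or, if your find lacks -exec … +, with xargs (note that -print0, -0 and -r are GNU extensions that many other implementations also support):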

find /path/to/tree -type f -print0 | xargs -r -0 -- md5sum | sed -nre 's/^md5-to-search-for  //p'
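
To avoid pasting the sum into the sed expression by hand, the same batching idea can be wrapped in a function. This is only a sketch, assuming bash, GNU md5sum and an awk with tolower; find_by_md5_batch is a hypothetical name, and it swaps sed for awk so that the sum can be passed in as a variable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: batch variant that runs md5sum on many files at once.
# Usage: find_by_md5_batch HASH [DIR]
find_by_md5_batch() {
    local hash=$1 dir=${2:-.}
    find "$dir" -type f -exec md5sum {} + |
        awk -v want="$hash" '
            # md5sum prints lowercase hex, so normalise the wanted sum.
            BEGIN { want = tolower(want) }
            # Force a string comparison on the first field, then print
            # what follows "<sum>" plus the two separator characters.
            ($1 "") == want { print substr($0, length(want) + 3) }
        '
}
```

One known limitation of this sketch: GNU md5sum prefixes the line with a backslash and escapes the name when a file name contains a backslash or newline, so such files will not be reported.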

David Foerster

Posted 2013-10-02T05:20:01.343

Reputation: 829

0

Borrowing some of the solution from slhck, I came up with:

find . -type f -print0 | while IFS= read -r -d '' f
do
  md5sum "$f" | grep -- "$1"
done

Here $1 is the first argument to the script, i.e. the MD5 sum to search for. To check for a missing argument, start the script with:

if [ -z "$1" ]
then
    echo "No argument supplied" >&2
    exit 1
fi
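
Beyond checking that an argument was supplied at all, the script could also check that it looks like an MD5 sum. A small sketch, assuming bash (for the [[ … =~ … ]] test); is_md5 is a hypothetical helper name, not part of any tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the argument is exactly
# 32 hexadecimal digits, which is the length of an MD5 sum in hex.
is_md5() {
    [[ $1 =~ ^[0-9a-fA-F]{32}$ ]]
}

# Possible use at the top of the script:
#   is_md5 "$1" || { echo "Usage: $0 MD5SUM" >&2; exit 1; }
```

This also rejects an empty argument, so it could stand in for the -z test.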

tbrixen

Posted 2013-10-02T05:20:01.343

Reputation: 121

This breaks if files contain whitespace in their path. To iterate over files you should use find with exec or globbing (e.g. **) – slhck – 2013-10-02T06:31:03.053

Another option would be to use -print0 in your find command, and xargs -0. So, in this case, find . -type f -print0 | xargs -0 md5 | grep (your MD5 code) – Kent – 2013-10-02T06:36:53.380

I like this solution a lot. However it gives me tons of messages on stderr: "md5sum: somefilename: No such file or directory". I wonder if there's a way to suppress that? – None – 2013-10-03T05:50:50.813

1

@gojira If you get a "no such file or directory" that is probably because the files contain whitespace, and brixenDK's command breaks on this (due to the for f in …). If one file was named foo bar, for example, it'd try to do an MD5 sum on foo and bar, which both don't exist. For a good explanation why this happens and how to avoid it, see: http://mywiki.wooledge.org/ParsingLs – slhck – 2013-10-03T06:54:26.540