32

I'm looking for a quick way to compare directory contents. Is it possible to do an md5sum (or equivalent checksum) of an entire directory?

Using Ubuntu Linux

Nicolas Kaiser
pufferfish
  • You may also want to look into using diff to compare directories which will actually show you where the directories differ. http://www.unixtutorial.org/2008/06/how-to-compare-directories-in-unix/ – Kibbee Aug 21 '11 at 02:38
  • @Kibbee To prevent that, you need to take into account something other than the data content of each file, and exactly how you checksum the files. Given the following directories (checksums in brackets):
      1. A (Directory): File1 [ABC], File2 [CBA]
      2. B (Directory): File1 [ABC], B1 (Directory) containing File2 [CBA]
      3. C (Directory): File4 [ABC], File5 [CBA]
      4. D (Directory): File1 copy [ABC], File2 copy [CBA]
    Directories A and B are not identical although they contain the same files (in B, File2 sits in the subdirectory B1). Under your example, A and C would be considered identical because – Jacob Lyles Oct 19 '11 at 11:06

10 Answers

39

Sure - md5sum directory/*

If you need something a little more flexible (say, for directory recursion or hash comparison), try md5deep.

apt-get install md5deep
md5deep -r directory

To compare a directory structure, you can give it a list of hashes to compare against:

md5deep -r -s /directory1 > dir1hashes
md5deep -r -X dir1hashes /directory2

This will output all of the files in directory2 whose hashes do not match any file in directory1.

This will not show files that have been removed from directory1 or files that have been added to directory2.
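
If you also need to see removed and added files, one option is to compare sorted checksum listings in both directions. The following is only a sketch using GNU find, md5sum, sort and diff instead of md5deep; the function names `list_hashes` and `compare_dirs` are made up for illustration:

```shell
# Hypothetical helpers (not part of md5deep); assumes GNU find, md5sum, sort, diff.

# Print "hash  ./relative/path" for every regular file under $1, sorted by path.
list_hashes() {
    ( cd "$1" && find . -type f -exec md5sum {} + | LC_ALL=C sort -k 2 )
}

# Compare two directories in both directions.
# "<" lines exist only in $1, ">" lines only in $2; a changed file shows up as both.
# Returns diff's exit status: 0 when the listings are identical, 1 otherwise.
compare_dirs() {
    left=$(mktemp) && right=$(mktemp) || return 2
    list_hashes "$1" > "$left"
    list_hashes "$2" > "$right"
    status=0
    diff "$left" "$right" || status=$?
    rm -f "$left" "$right"
    return "$status"
}
```

Usage would be something like `compare_dirs /directory1 /directory2`. Because paths are relative to each directory, the two trees can live in different places.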

Argyle
Shane Madden
  • 1
    Not what I meant, but what I wanted :) I did mean recursively, and getting ONE hash at the end, but I think this can be done with md5deep -l and hashing the output itself. – pufferfish Aug 20 '11 at 17:25
  • 1
    The order of the hashing is not consistent, so would have to sort the output before hashing – pufferfish Aug 24 '11 at 12:13
  • 2
    To get a deterministic order, use `-j0` which disables multithreading (see the man page). – Johann Mar 29 '14 at 19:02
  • 1
    @ShaneMadden♦ I installed `md5deep` with `sudo apt-get install md5deep` on `Ubuntu 16.04`, but when I tried to read the man page it tells me "No manual entry for md5deep" – Kasun Siyambalapitiya Jul 24 '17 at 09:24
29

If you'd like to see what's different (if anything) between two directories, rsync would be a good fit.

rsync --archive --dry-run --checksum --verbose /source/directory/ /destination/directory

This will list any files that are different.

JakePaulus
  • 2
    `diff -qr /source/directory/ /destination/directory/` would also show files that differ. – Konerak Aug 21 '11 at 14:36
  • 1
    Is there a way to perform a bitwise comparison instead of checksums? It might be faster on local drives. – Ali Aug 21 '11 at 15:03
  • Very nice. Works if source or destination are also remote folder e.g. `username@hostname:/destination/directory` – Thalis K. Jan 08 '17 at 09:41
13

I think I answered this one before with this answer:

find . -xtype f -print0 | xargs -0 sha1sum | cut -b-40 | sort | sha1sum

This gives: b1a5b654afee985d5daccd42d41e19b2877d66b1

The idea is to hash all the files, cut out the hashes (one per line), sort them, and hash the result, yielding a single hash. This doesn't depend on the names of the files.
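
To make the "doesn't depend on the names" part concrete, here is the same pipeline wrapped in a function so it can be run on two directories and compared. This is a sketch, not a polished tool: the name `content_hash` is hypothetical, and the `-r` flag for xargs is an addition of mine (a GNU option) so that an empty directory does not make sha1sum wait on standard input:

```shell
# Hypothetical helper; assumes GNU find, xargs, and coreutils.
# Prints one SHA-1 derived only from the contents of the files under $1.
content_hash() {
    ( cd "$1" &&
      find . -xtype f -print0 |
      xargs -0 -r sha1sum |   # -r: run nothing if no files were found
      cut -b-40 |             # keep only the 40-hex-digit hash, drop the file name
      LC_ALL=C sort |         # fixed order, independent of traversal order
      sha1sum | cut -b-40 )
}
```

Two directories holding the same file contents give the same value even if the files were renamed or moved into subdirectories, for example `[ "$(content_hash dir1)" = "$(content_hash dir2)" ] && echo same`. If renames matter to you, this is the wrong tool.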

Dan D.
5

The cfv application is quite useful: not only can it check and create MD5 checksums, it can also handle CRC32, sha1, torrent, par, and par2.

To create a CRC32 checksum file for all files in the current directory:

cfv -C

To create an MD5 checksum file for all files in the current directory:

cfv -C -t md5 -f "current directory.md5sums"

To create a separate checksum file for each sub directory:

cfv -C -r

To create a "super" checksum file containing files in all sub directories:

cfv -C -rr
Hubert Kario
4

I used hashdeep, as explained in the Ask Ubuntu answer "Check the correctness of copied files".

To calculate the checksums:

 $ cd <directory1>
 $ hashdeep -rlc md5 . > ~/hashOutput.txt

To verify and list the differences:

 $ cd <directory2>
 $ hashdeep -ravvl -k ~/hashOutput.txt .
 hashdeep: Audit passed
    Input files examined: 0
   Known files expecting: 0
           Files matched: 13770
 Files partially matched: 0
             Files moved: 0
         New files found: 0
   Known files not found: 0

This has an advantage over md5deep in that it will show renamed (moved), added, and removed files, as well as avoiding the problem with 0 length files pointed out at the bottom of http://www.meridiandiscovery.com/how-to/validating-copy-results-using-md5deep.

Paul Gear
Argyle
3

This worked for me (run it from inside the directory you are interested in):

md5deep -rl . | awk '{print $1}' | sort -n | md5sum
cat pants
1

You could create MD5 sums of every single file, sort these checksums alphabetically, and hash them (with or without newlines). Since MD5 is a cryptographic hash, it should work just fine with hashes of hashes.

The checksums must be in a fixed order; otherwise you will get different results for identical directories.

You should also consider that adding any file to one directory will completely change the result, even if it is just a .directory or .DS_Store file.

Martin Ueding
  • Technically one could get the same hash for different directories. If dir A had 2 files with contents 'ab' and 'c' and dir B had 2 files with contents 'a' and 'bc' then hashing only the data in the files would yield the same results, even though they have files with different contents. I'm not even sure how one would define the MD5Sum of a directory. – Kibbee Aug 21 '11 at 02:37
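
The 'ab' + 'c' versus 'a' + 'bc' collision from the comment above only happens if the file contents are concatenated before hashing. Hashing each file on its own and then hashing the sorted list of hashes keeps the file boundaries. A small sketch to show the difference; both function names are hypothetical and GNU tools are assumed:

```shell
# Hypothetical helpers; assume GNU find, sort, xargs, and md5sum.

# MD5 of all file contents concatenated in path order: file boundaries are lost.
concat_hash() {
    ( cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r cat | md5sum | cut -b-32 )
}

# MD5 of the sorted per-file MD5 sums: file boundaries are preserved.
per_file_hash() {
    ( cd "$1" && find . -type f -exec md5sum {} + | cut -b-32 | LC_ALL=C sort | md5sum | cut -b-32 )
}
```

For the two directories from the comment, `concat_hash` returns the same value for both, while `per_file_hash` returns different values.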
1

As a specific case, let's say you want to copy some files from directory1 to directory2 and then verify that the copy succeeded using an md5 comparison.

First, cd to directory1 and type:

find -type f -exec md5sum "{}" \; > ~/Desktop/md5sum.txt

which will create a reference file containing an md5 sum for each file in directory1. Once this is done, all you have to do is cd to directory2 and type:

md5sum -c ~/Desktop/md5sum.txt

The program md5sum fetches each path from the md5sum.txt file, computes the md5sum of that file in the destination folder and then compares it with the sum it has stored in the file.

After the process is complete, md5sum prints a warning if anything failed, such as 'md5sum: WARNING: 1 computed checksum did NOT match'.
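
For scripting, the two steps can be wrapped so that only failures are printed and the exit status tells you whether the copy is good. This is a sketch under the assumption that GNU md5sum is in use (`--quiet` and reading the list from standard input with `-c -` are GNU behaviour); `make_sums` and `verify_copy` are names made up for this example:

```shell
# Hypothetical wrappers; assume GNU md5sum and find.

# Write checksums for every file under directory $1 to the file $2.
make_sums() {
    ( cd "$1" && find . -type f -exec md5sum {} + ) > "$2"
}

# Check directory $1 against the checksum file $2.
# --quiet suppresses the per-file "OK" lines, so only problems are printed.
# Returns 0 only if every listed file exists and matches.
verify_copy() {
    ( cd "$1" && md5sum --quiet -c - ) < "$2"
}
```

For example: `make_sums directory1 ~/Desktop/md5sum.txt`, then `verify_copy directory2 ~/Desktop/md5sum.txt && echo "copy OK"`. Like the original commands, this does not notice extra files that exist only in directory2.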

Joel
  • 1
    Reference: http://ubuntugenius.wordpress.com/2009/11/17/data-verification-of-folders-discs-with-md5-checksums-in-ubuntu/ – Joel Feb 12 '12 at 12:05
1

I've had a need for verifying integrity of backups/mirrors which contain a large number of files and ended up writing a command-line program called MassHash. It's written in Python. A GTK+ Launcher is also available. You may want to check it out...

http://code.google.com/p/masshash/

Jonathan
0

One-liner:

find directory -exec md5sum {} \; 2>&1 | sort -k 2 | md5sum

This lists all files and directories and computes an md5sum for each, then computes a single md5sum over the whole listing.

The tricky bit is that md5sum cannot compute a sum for a directory, but it reports this with an error message such as md5sum: dir/sub_dir: Is a directory. Redirecting standard error to standard output (2>&1) includes those messages in the final hash.
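
One caveat: as written, find prints every path with the directory name in front, so two identical trees with different names produce different sums. A sketch that avoids this by changing into the directory first; the function name `dir_hash` is hypothetical, and setting LC_ALL=C is my own assumption to keep the sort order and the "Is a directory" message the same across locales:

```shell
# Hypothetical helper; assumes GNU find and md5sum.
# Prints one MD5 covering file contents, relative paths, and directory names under $1.
dir_hash() {
    ( cd "$1" && export LC_ALL=C &&
      find . -exec md5sum {} \; 2>&1 | sort -k 2 | md5sum | cut -b-32 )
}
```

With this, `dir_hash /path/one` and `dir_hash /other/two` match when both trees have the same layout and contents. Unlike the content-only approaches, a renamed file or an added empty directory changes the result.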

laimison