I am trying to back up my MongoDB database and detect changes with a SHA-1 checksum. The problem is that the checksum differs between two consecutive backups.

$ mongodump --quiet --db backup --out .
$ tar -cf backup1.tar backup
$ rm -r backup
$ sha1sum backup1.tar
d9519a5183fb797639af583738e292527c667420  backup1.tar

$ mongodump --quiet --db backup --out .
$ tar -cf backup2.tar backup
$ rm -r backup
$ sha1sum backup2.tar
f5c9e3e99e857a88052e9121a9eca61c40909c07  backup2.tar

I am sure the database was not updated:

$ mongodump --quiet --db backup --out b1
$ mongodump --quiet --db backup --out b2
$ diff -r b1 b2
Stennie
Dawei67

1 Answer

This issue isn't specific to MongoDB. Like most file archival formats, tar stores metadata about the archived files, including modification timestamps. mongodump writes fresh files on every run, so if you take a database backup at two different times, the content of the dump files will be identical but the metadata in the archive will not, and the archive checksums will differ.
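The effect can be reproduced without MongoDB at all. The following is a minimal sketch, assuming a POSIX shell with `tar` and `sha1sum` available; the directory `dump` and the file `data.bson` are hypothetical stand-ins for real mongodump output. It archives the same content twice with different modification times and compares the checksums.

```shell
set -e

# Scratch directory so nothing outside it is touched.
work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for a mongodump output directory.
mkdir dump
printf 'same content\n' > dump/data.bson

# First "backup": pretend the files were written on 2020-01-01.
touch -t 202001010000 dump/data.bson dump
tar -cf a.tar dump
content_a=$(sha1sum < dump/data.bson)

# Second "backup": identical bytes, but written a year later.
touch -t 202101010000 dump/data.bson dump
tar -cf b.tar dump
content_b=$(sha1sum < dump/data.bson)

# Checksums of the archives include the stored timestamps.
archive_a=$(sha1sum < a.tar)
archive_b=$(sha1sum < b.tar)

echo "content: $content_a / $content_b"
echo "archive: $archive_a / $archive_b"
```

The two content checksums should match while the two archive checksums should not, because only the timestamps in the tar headers changed.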

To check for changes in the actual data files, you should instead compute checksums before tarring and include them in the archive:

$ mongodump --quiet --db backup --out .
$ sha1sum backup/* > backup/sha1.txt
$ tar -cf backup.tar backup
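One detail worth noting: sha1sum records each path exactly as it was given on the command line, so checksum files generated from two differently named directories would differ on every line even when the data is identical. A possible workaround, sketched below, is to run sha1sum from inside each dump directory so the recorded paths are relative. The directories `backup` and `backup2` and the file `bacon.bson` are placeholders created by the script itself, not real mongodump output.

```shell
set -e

# Scratch directory so nothing outside it is touched.
work=$(mktemp -d)
cd "$work"

# Placeholder dump directories with identical content.
mkdir backup backup2
printf 'doc\n' > backup/bacon.bson
printf 'doc\n' > backup2/bacon.bson

# Compute checksums from inside each directory so the recorded
# paths are relative ("bacon.bson") rather than "backup/bacon.bson".
(cd backup  && sha1sum *.bson > sha1.txt)
(cd backup2 && sha1sum *.bson > sha1.txt)

# diff -q exits 0 when the two checksum lists are identical.
if diff -q backup/sha1.txt backup2/sha1.txt > /dev/null; then
    same=yes
else
    same=no
fi
echo "identical data: $same"
```

A checksum file written this way can also be verified later with `sha1sum -c sha1.txt`, run from inside the extracted directory.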

You can then diff checksum files to determine if two backups have identical data:

$ diff -q backup/sha1.txt backup2/sha1.txt
Files backup/sha1.txt and backup2/sha1.txt differ

With individual checksums you can also see exactly which files changed:

$ diff backup/sha1.txt backup2/sha1.txt
3,4c3,4
< b8e37a70f4dd7a8265a9e030edec1251224957dc  backup/bacon.bson
< 9fabdb53acb5d3261fa973325c52abdd5cade6ff  backup/bacon.metadata.json
---
> 96d6e9de8885e3f24a98148f8b8630b843882c4e  backup/bacon.bson
> a3cd2cfe5b088c2033eb5e292fcbf8b39be65727  backup/bacon.metadata.json
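If a single checksum over the whole archive is still preferred, another option is to make tar write the same metadata every time. This is only a sketch and assumes GNU tar (the `--sort` option needs version 1.28 or later); the fixed date, the directory `backup`, and the file `bacon.bson` are arbitrary placeholders. It has not been checked against real mongodump output.

```shell
set -e

# Scratch directory so nothing outside it is touched.
work=$(mktemp -d)
cd "$work"

# Placeholder dump directory.
mkdir backup
printf 'doc\n' > backup/bacon.bson

# Archive with normalized metadata: fixed timestamp, fixed owner,
# and a stable member order (GNU tar options, assumed available).
make_archive() {
    tar --sort=name --mtime='2020-01-01 00:00:00' \
        --owner=0 --group=0 --numeric-owner \
        -cf "$1" backup
}

# Simulate two dumps taken at different times with the same data.
touch -t 202001010000 backup/bacon.bson backup
make_archive one.tar
touch -t 202101010000 backup/bacon.bson backup
make_archive two.tar

first=$(sha1sum < one.tar)
second=$(sha1sum < two.tar)
echo "one.tar: $first"
echo "two.tar: $second"
```

With the metadata pinned, the two archives should hash identically as long as the file contents are the same. The per-file checksum approach still has the advantage of showing which files changed.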
Stennie