Is there a copy-and-verify command in Ubuntu/Linux?

24

5

I back up all my digital photos to a couple of places. I've been using the cp command, but, given their personal value, I've started to wonder whether there's a more reliable way. I'm no stranger to Linux, Bash, Perl, etc., so I could write something to copy and compare md5 hashes, but I was wondering if something already exists (reinvention, wheels and what-not).

Most of my googling for copy and (verify|valid|check|hash|confirm) turns up rsync. However, as far as I can tell, rsync only uses hashes to see if a file needs to be updated. It doesn't perform a hash comparison afterward.

For this use, specifically, the files are binary and typically 8-10MB. Any recommendations for utilities or guidance for DIY solutions would be greatly appreciated.
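
In case it helps frame the question, this is roughly the kind of DIY check I have in mind (a rough sketch only; paths are placeholders, and it assumes GNU coreutils' cp and md5sum):

    src=/path/to/photos        # placeholder source and destination
    dst=/backup/photos

    for f in "$src"/*; do
        cp -a "$f" "$dst"/ || { echo "copy failed: $f" >&2; continue; }
        # compare the md5 of the original and the copy
        if [ "$(md5sum < "$f")" != "$(md5sum < "$dst/${f##*/}")" ]; then
            echo "checksum mismatch: $f" >&2
        fi
    done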

nshew

Posted 2010-12-05T02:55:14.120

Reputation: 386

How about unison? It is used for two-way synchronisation, but it does check file checksums.

– taper – 2019-04-14T10:32:41.977

Answers

20

From man rsync, under -c option:

-c, --checksum: skip based on checksum, not mod-time & size

Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred, but that automatic after-the-transfer verification has nothing to do with this option's before-the-transfer "Does this file need to be updated?" check.
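
As a rough example (paths are placeholders), you could do the copy and then run an explicit checksum-only comparison pass that transfers nothing and just reports differences:

    # initial copy (archive mode, verbose)
    rsync -av /path/to/photos/ /backup/photos/

    # verification pass: -c re-reads and checksums every file,
    # -n (dry-run) transfers nothing, -i itemizes anything that differs
    rsync -avcni /path/to/photos/ /backup/photos/

If the second run itemizes no file changes, the destination matched the source by checksum.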

icyrock.com

Posted 2010-12-05T02:55:14.120

Reputation: 4 623

8

Some people have figured out that the rsync manual is misleading about the default post-copy check: http://unix.stackexchange.com/a/66702/148560 There seems to be no such check; to verify all copies, they say, you have to run another rsync with the --checksum option.

– Rotareti – 2016-01-28T19:26:29.033

5

Several years ago I had the same requirements as you. The solution I chose was to use ZFS via the ZFS-FUSE driver on my storage server. My thinking was that my personal photos, scanned documents, and other similar files were things that I might access only occasionally, so it might be a very long time, say a year or more, before I noticed that a file had been corrupted due to a drive error or the like.

By that time, all of the backup copies I have might themselves contain the bit-rotted version of the file(s).

ZFS has a benefit over RAID-5 in that it can detect and repair errors in the data stored on the individual discs, even if the drives do not report a read error while reading the data. It will detect, via checksums, that one of the discs returned corrupted information and will use the redundancy data to repair that disc.

Because of the way the checksumming in ZFS is designed, I felt that I could rely on it to store infrequently used data for long periods of time. Every week I run a "zpool scrub" which goes through and re-reads all the data and verifies checksums.
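
For reference, that weekly check is just the standard scrub commands (the pool name "tank" is a placeholder):

    # re-read all data in the pool and verify (and, given redundancy, repair) checksums
    zpool scrub tank

    # later: see scrub progress and whether any errors were found or repaired
    zpool status tank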

ZFS-FUSE has performed quite well for me over the last few years.

In the distant past, for a client, I implemented a database system that stored checksum information on all files stored under a particular directory. I then had another script that would run periodically and check the files against the checksums stored in the database. With that we could quickly detect a corrupted file and restore from backups. We were basically implementing the same sorts of checks that ZFS does internally.
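
A rough sketch of that same idea with plain coreutils, using an md5sum manifest instead of a database (paths are placeholders):

    # build a manifest of checksums for every file under the photo tree
    cd /path/to/photos && find . -type f -exec md5sum {} + > ~/photos.md5

    # periodically: re-check every file against the stored manifest;
    # --quiet prints only the files that fail verification
    cd /path/to/photos && md5sum -c --quiet ~/photos.md5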

Sean Reifschneider

Posted 2010-12-05T02:55:14.120

Reputation: 1 387

I think the OP's concern was data corruption in transit. You copy a file and the copy ends up being different to the original. – Jon Bentley – 2017-04-28T17:30:13.770

btrfs? that has checksums and is native... – Dmitry Kudriavtsev – 2017-09-22T02:31:10.223

Why the down-vote? Since no comment was left I'll assume it's a "-1, disagree". :-) – Sean Reifschneider – 2010-12-05T09:43:42.273

...but then: what part is disagreed on? Though maybe a bit off-topic for the question, this sounds solid to me. So I hope the downvote was for "not answer to the question" rather than leaving us oblivious about some real flaw in the above... – Arjan – 2010-12-05T14:27:19.167

I realized this morning that I was assuming that icyrock was asking because of worries about bit-rot, which is what my concern was. But maybe it is somehow different. Though I can't imagine what the use case would be that would change the file contents legitimately without changing the file times. – Sean Reifschneider – 2010-12-05T20:28:36.017

2

https://sourceforge.net/projects/crcsum/ extends the Linux cp and mv commands with checksum verification.

Hans

Posted 2010-12-05T02:55:14.120

Reputation: 21

Please answer the question with more than just one sentence. – Kevin Panko – 2014-10-07T18:55:06.097

1

If you are copying the files locally (as is implied by your reference to cp rather than scp, etc.), then just cmp the source and destination files... but realistically, if cp isn't emitting some sort of error (either on the command line or via its exit status), there isn't any reason to believe it isn't working.
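
For more than one file, a small loop does it; this rough sketch assumes a single flat directory (placeholder paths) and does not recurse into subfolders:

    src=/path/to/photos        # placeholder source and destination
    dst=/backup/photos

    for f in "$src"/*; do
        cp -a "$f" "$dst"/ && cmp -s "$f" "$dst/${f##*/}" \
            || echo "copy/verify problem: $f" >&2
    done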

If you indeed want a genuinely redundant backup, consider a remote solution like Dropbox.

Brad Clawsie

Posted 2010-12-05T02:55:14.120

Reputation: 270

+1. It's not clear why this answer was downvoted, as it provides what appears to me to be a perfectly valid solution to the problem, albeit one that requires two commands rather than one. – Jon Bentley – 2017-04-28T17:32:40.170

You'll really need to write a script with a loop to use this answer, as it doesn't take multiple files and folders into account. – Gringo Suave – 2019-03-22T19:59:13.600

1

I found this utility (Linux and Windows) that does just what you want (hashed copy+hashed verification with log): http://sourceforge.net/projects/quickhash/

The only downside is that it exists only as a GUI (no command-line access).

Since v1.5.0, a selected source folder can be hashed, then copied & reconstructed in a destination folder, where the content is hashed again for verification. Since v1.5.5, selected file masks can be used, too (*.doc, *.xls, etc.).

2072

Posted 2010-12-05T02:55:14.120

Reputation: 456