Maintain a hashcode- (checksum-) file differentially

-1

Assume you have a file hierarchy with around a million files (for example, a backup).

As described in https://askubuntu.com/questions/318530/generate-md5-checksum-for-all-files-in-a-directory one could use either of the commands below to create a "checklist.chk" file with a hash code and a file name on each row:

md5sum * > checklist.chk        # Does not descend into subdirectories
# or
find . -type f -exec md5sum "{}" + > checklist.chk   # Descends into subdirectories

Then to check the files you can use:

md5sum -c checklist.chk

Now assume you have changed only a few of those million files (perhaps because you used rsync). Then it seems unnecessary to recalculate all the hash codes.

I am looking for something (a program, script or whatever) that uses a "checklist.chk" file with four columns on each row: hash code, modification date, size and name. Much like rsync, it should skip files whose size and modification date haven't changed.
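To make the idea concrete, here is a minimal sketch of such an incremental update in bash. It is only an illustration under stated assumptions, not an existing tool: the function name update_checklist and the tab-separated column layout are made up for this example, it needs bash 4+ and GNU coreutils (md5sum, stat -c), it assumes file names contain no tabs or newlines, and it assumes checklist.chk is stored outside the tree being scanned.

```bash
#!/usr/bin/env bash
# Sketch: incrementally maintain a four-column checklist
# (hash <TAB> mtime <TAB> size <TAB> name).
# Assumptions: bash 4+, GNU md5sum/stat/find/sort, no tabs or newlines in
# file names, checklist stored outside the scanned tree.

update_checklist() {        # hypothetical helper: update_checklist ROOT LIST
    local root=$1 list=$2 tmp hash mtime size name
    declare -A old_hash old_meta    # local to this function
    tmp=$(mktemp)

    # Load the previous checklist, if there is one.
    if [[ -f $list ]]; then
        while IFS=$'\t' read -r hash mtime size name; do
            old_hash[$name]=$hash
            old_meta[$name]="$mtime $size"
        done < "$list"
    fi

    # Walk the tree; rehash only files whose mtime or size differs.
    while IFS= read -r -d '' name; do
        read -r mtime size < <(stat -c '%Y %s' -- "$name")
        if [[ ${old_meta[$name]:-} == "$mtime $size" ]]; then
            hash=${old_hash[$name]}     # unchanged: reuse the stored hash
        else
            hash=$(md5sum -- "$name" | cut -d' ' -f1)
        fi
        printf '%s\t%s\t%s\t%s\n' "$hash" "$mtime" "$size" "$name"
    done < <(find "$root" -type f -print0 | sort -z) > "$tmp"

    # Files that no longer exist are dropped, since only found files are written.
    mv -- "$tmp" "$list"
}
```

Calling update_checklist . ../checklist.chk from the top of the tree would refresh the list. Note that, just like rsync without --checksum, this cannot notice a file whose contents changed while its size and modification time stayed the same; that is what the later full check is for.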

Then at a later time you should of course actually check the integrity of the files by calling something corresponding to "md5sum -c checklist.chk".
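One way this full check could look, as a sketch only: assuming the four columns are tab-separated in the order hash, modification date, size, name (an assumption of this example, as is the function name verify_checklist), the first and last columns can be rewritten into the two-column format that GNU md5sum -c reads from standard input.

```bash
#!/usr/bin/env bash
# Sketch: verify every file listed in a four-column, tab-separated checklist
# (hash <TAB> mtime <TAB> size <TAB> name) by feeding "hash  name" lines to
# md5sum -c. Assumes GNU md5sum and file names without tabs or newlines.

verify_checklist() {        # hypothetical helper: verify_checklist LIST
    local list=$1
    # --quiet suppresses the "OK" line for each file that verifies;
    # the function's exit status is that of md5sum (non-zero on any mismatch).
    awk -F'\t' '{ print $1 "  " $4 }' "$list" | md5sum -c --quiet -
}
```

The names in the list are resolved relative to the current directory, so the check has to be run from the same place the list was generated from (or the list has to contain absolute paths).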

Or are there better ways to solve this whole problem?

Magnus Andersson

Posted 2016-09-27T09:04:38.630

Reputation: 1

You are asking an off-topic question (software shopping). Questions seeking product, service, or learning material recommendations are off-topic. See On Topic. Try https://softwarerecs.stackexchange.com/ but please first read What is required for a question to contain "enough information".

– DavidPostill – 2016-09-27T09:10:44.157

It's hard to understand why you don't just use rsync. – David Schwartz – 2016-09-27T09:19:32.097

@DavidSchwartz If you use --checksum in rsync, it calculates a checksum for every file, which in my experience makes it about 50 times slower. – Magnus Andersson – 2016-09-27T12:17:59.483

@DavidSchwartz Also, the sender side may become corrupt too. What I want is to be able to check the integrity of both the sender and the receiver side. – Magnus Andersson – 2016-09-27T12:38:08.750

Answers

0

So I wrote my own program: https://github.com/emandersson/hashcodefilesync that does the above (speeds up the updating of the hashfile).

Magnus Andersson

Posted 2016-09-27T09:04:38.630

Reputation: 1