
I just set up a FreeNAS server with a terabyte drive. I want to have only one hard drive in each machine, so I have been moving as much data as possible over the LAN to FreeNAS. I have noticed that at least one file didn't copy properly and is now corrupt. (I am also noticing some strange permission issues, but that is another question.) Now that most of the data is on the FreeNAS server, is there an automated way of verifying that nothing else is corrupt?

I am not exactly sure how to describe the way the file was corrupt. Basically, it appeared to be a 178-megabyte video file, but when I tried to play or even move it, the Windows machine accessing it gave a generic "could not access" error message. I used FreeNAS's web copy interface to move the file; once it was moved, the file was 76 megabytes and could not be played.

Bob

2 Answers


Always run:

cd /filesystem && \
find . -type f -exec md5sum {} + > /filesystem-md5.log

and then

cd /filesystem-new && \
md5sum -c /filesystem-md5.log

before and after copying a large amount of data.
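For the "automated" part of the question, the two commands above can be wrapped in one shell function. This is a minimal sketch, not an existing tool: `verify_copy` is a hypothetical name, and it assumes bash with GNU coreutils (the `--quiet` flag of `md5sum -c` is a GNU option).

```bash
# Sketch only: verify_copy is a hypothetical helper, not an existing tool.
# Assumes bash plus GNU coreutils (md5sum) and find.
verify_copy() {
    local src=$1 dst=$2 sums status
    sums=$(mktemp) || return 2
    # Checksum every regular file under SRC, using paths relative to SRC
    # ("./dir/file") so the same list can be checked from inside DST.
    if ! (cd "$src" && find . -type f -exec md5sum {} +) > "$sums"; then
        rm -f "$sums"
        return 2
    fi
    # Re-hash the same relative paths under DST; --quiet (a GNU option)
    # suppresses the "OK" lines so only FAILED or unreadable files print.
    (cd "$dst" && md5sum -c --quiet "$sums")
    status=$?
    rm -f "$sums"
    return "$status"
}
```

Usage would be something like `verify_copy /filesystem /filesystem-new && echo "all files match"`; a non-zero exit means at least one file failed or could not be read. Like the original commands, it only checks files that exist in the source and says nothing about extra files in the destination.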

You'll be surprised how much random data corruption you experience in the real world.

When you find a corrupt file, run cmp -l badfile goodfile to try to understand the nature of the corruption.
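A small wrapper around `cmp -l` can summarize a bad copy, for example whether it was simply truncated (like the 178 MB file that ended up at 76 MB) or has scattered bad bytes. `describe_corruption` is a hypothetical helper; it assumes bash and the standard `cmp -l` output of one line per differing byte: the 1-based byte offset in decimal, then the two byte values in octal.

```bash
# Sketch only: describe_corruption is a hypothetical helper.
# Assumes bash (for process substitution) and a POSIX cmp.
describe_corruption() {
    local good=$1 bad=$2 good_size bad_size diffs first
    # Byte counts; tr strips the padding some wc implementations add.
    good_size=$(wc -c < "$good" | tr -d ' ')
    bad_size=$(wc -c < "$bad" | tr -d ' ')
    echo "sizes: good=$good_size bad=$bad_size"
    # cmp -l prints "offset byte1 byte2" for every differing byte within the
    # common length; a size mismatch is reported on stderr (discarded here).
    read -r diffs first < <(cmp -l "$good" "$bad" 2>/dev/null |
        awk 'NR == 1 { f = $1 } END { print NR, f }')
    echo "differing bytes within common length: $diffs"
    if [ -n "$first" ]; then
        echo "first differing byte (1-based offset): $first"
    fi
}
```

A truncated copy shows up as a smaller size with zero differing bytes in the shared part; random bit flips show up as equal sizes with a handful of differences.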

This is why I beg for end-to-end integrity checking in all cases. Unfortunately filesystem and OS vendors do not take this seriously.

chwarr
carlito
If Server Fault allows it, I think this raises the question: what tool will do this for me automatically? – Bob May 26 '09 at 21:55
+1 for mentioning md5sum. I just found this (rather old) question and would like to encourage everyone to follow carlito's advice and verify the checksums of copied files. I even wrote myself a little tool to do just that, so I don't have to worry about forgetting a file (md5sum -c MD5SUMS obviously won't complain about an additional file that you forgot to put in the MD5SUMS file). It takes some time to calculate and verify the hashes, but it's worth it, because it basically shows you a list of all the files that have been corrupted. – basic6 Jun 27 '13 at 09:13
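The gap basic6 describes (files that never made it into the checksum list are silently skipped by `md5sum -c`) can be caught by comparing the list against what is actually on disk. A sketch under stated assumptions: `unlisted_files` is a hypothetical helper, run from the top of the tree, that expects a list produced with `find . -type f -exec md5sum {} +` (so paths start with `./`), needs bash for process substitution, and assumes file names without newlines or backslashes, which md5sum escapes.

```bash
# Sketch only: unlisted_files is a hypothetical helper.
# Prints files under the current directory that have no line in the
# checksum file given as $1 (the checksum file itself is skipped).
unlisted_files() {
    local sums=$1
    # In md5sum output, characters 35 onward are the path: 32 hex digits,
    # then a two-character separator ("  " or " *").
    LC_ALL=C comm -13 \
        <(cut -c35- "$sums" | LC_ALL=C sort) \
        <(find . -type f ! -path "./$sums" | LC_ALL=C sort)
}
```

Both inputs are sorted in the C locale because `comm` requires identically sorted input.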

You could check out AIDE (Advanced Intrusion Detection Environment). There are probably other integrity tools out there as well.

It creates a database from the regular expression rules that it finds in the config file. Once this database is initialized, it can be used to verify the integrity of the files. It has several message digest algorithms (md5, sha1, rmd160, tiger, haval, etc.) that are used to check the integrity of the file. More algorithms can be added with relative ease. All of the usual file attributes can also be checked for inconsistencies. It can read databases from older or newer versions. See the manual pages within the distribution for further info. There is also the beginning of a manual.
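The "build a database once, check against it later" workflow can be imitated by hand to see what such a tool reports. This sketch is not AIDE and does not use its database or config format: `make_baseline` and `check_baseline` are hypothetical names, it records only a SHA-256 digest and the size of each file, and it assumes bash, GNU coreutils/findutils, and file names without tabs or newlines.

```bash
# Sketch only: NOT AIDE and not its database or config format.
# make_baseline / check_baseline are hypothetical helper names.

# Print one line per regular file: sha256 <TAB> size in bytes <TAB> path.
make_baseline() {
    find . -type f -print0 | LC_ALL=C sort -z |
    while IFS= read -r -d '' f; do
        printf '%s\t%s\t%s\n' \
            "$(sha256sum < "$f" | cut -d' ' -f1)" \
            "$(wc -c < "$f" | tr -d ' ')" \
            "$f"
    done
}

# Compare the current tree against a saved baseline ($1) and report
# NEW, CHANGED (digest or size differs) and MISSING files.
# Exits non-zero if anything was reported.
check_baseline() {
    make_baseline | awk -F'\t' '
        FILENAME == ARGV[1] { old[$3] = $1 FS $2; next }  # saved baseline
        {                                                 # current tree
            seen[$3] = 1
            if (!($3 in old))             { print "NEW     " $3; bad = 1 }
            else if (old[$3] != $1 FS $2) { print "CHANGED " $3; bad = 1 }
        }
        END {
            for (p in old) if (!(p in seen)) { print "MISSING " p; bad = 1 }
            exit bad
        }
    ' "$1" -
}
```

The idea is to run `make_baseline` from the top of the source tree, saving its output somewhere outside that tree, and later run `check_baseline` with that file from the top of the copied tree.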

bbigras