CRC checksum mismatch transferring large files to NAS

Question

I'm having a problem transferring large (50-200GB) VHD files from a Windows Server 2008 R2 machine to a QNAP NAS. After the transfer completes, I verify the data using a CRC32 checksum to ensure it wasn't corrupted (these are important backups). I am getting constant CRC mismatches for large files. I have tried using Samba and FTP, both with the same problem. I have tried several different tools to calculate the CRC checksum to rule out a faulty checksum tool.
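For reference, a CRC32 over a file of this size can be computed without loading it into memory. The following is only a minimal sketch using Python's standard `zlib.crc32`; the function name `file_crc32` and the 1 MiB chunk size are illustrative assumptions, not the tools actually used above.

```python
import zlib

def file_crc32(path, chunk_size=1024 * 1024):
    """Return the CRC32 of a file as an 8-digit hex string, reading in chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Feed the running CRC back in so the result covers the whole file.
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08x")
```

Running this against the source file and against the copy on the NAS share should give the same value if the data arrived and was stored intact.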

This is what I've tried so far:

Different Windows server (no fix).
Different NIC/Cables/Switches (no fix).
Updating NAS firmware (no fix).
Different destination server (I built a Samba server on Ubuntu, which worked fine).
The same process works in another office with an identical model NAS using SMB.

IMO, this has been narrowed down to the QNAP device. I've ruled out the Windows server, the networking gear, and the Samba protocol/version. Is there anything else I could try?

The QNAP share is configured as a RAID5 with the EXT3 filesystem.

So now that the CRC shows a difference, go find out what the difference is -- do a byte-by-byte comparison of the files to help identify the source of the problem. – womble, Sep 05 '11 at 03:32
Yes, I shut down the VM, 'exported' it to another directory, and that exported copy is what I'm working with. I have not done a byte-by-byte comparison of the 200GB files as I don't see what that could tell me apart from which bytes were accidentally corrupted during transmission. – PSA4444, Sep 05 '11 at 04:34
A byte-by-byte comparison could show you one of two things. 1) Your checksumming plan is incorrect (the files are identical but have different 'checksums' for some reason). 2) Additional information on the corruption. E.g. you find that only the last byte is corrupted. Or you find that every occurrence of the letter 'a' has become 'A'. Whatever the exact difference is, it may point directly at the source of the problem. – Slartibartfast, Sep 08 '11 at 05:11
We're seeing a similar problem, which comes and goes on an irregular basis. I confess though that I don't understand how the byte-by-byte comparison could help pinpoint the source. Can someone give me some examples? – mahnsc, Sep 08 '11 at 15:01
We seem to have figured out the problem. One of the HDDs in the RAID5 seems to have failed. Yet another WD Green. I took out Drive1 and replaced it with a new one. While rebuilding the array, we had a failure due to Drive3. QNAP should probably have a better way of checking whether or not the drives are working properly. The S.M.A.R.T. data indicates that this particular drive is fine. This is the 3rd WD Green to fail in the space of 9 months. @mahnsc: Are you also having a problem with CRC checks failing? – PSA4444, Sep 13 '11 at 03:24
Ours is looking more like a Sun TCP stack reassembly problem on the target FTP servers. – mahnsc, Sep 13 '11 at 10:48
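
The byte-by-byte comparison suggested in the comments above could be sketched roughly as follows. This is an illustrative Python example, not something anyone in the thread ran; the function name `find_differences` and the chunk size are assumptions. It reads both files in chunks and returns the byte ranges where they differ, so the offsets and lengths of the damaged regions can be inspected.

```python
def find_differences(path_a, path_b, chunk_size=1024 * 1024):
    """Return a list of (start, end) byte ranges (end exclusive) where two files differ.

    If one file is longer, its extra tail is reported as part of the last range.
    """
    ranges = []
    offset = 0      # file offset of the current chunk
    start = None    # start of the differing range currently open, if any
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a == b:
                # Identical chunk: close any range left open by the previous chunk.
                if start is not None:
                    ranges.append((start, offset))
                    start = None
            else:
                common = min(len(a), len(b))
                for i in range(common):
                    if a[i] != b[i]:
                        if start is None:
                            start = offset + i
                    elif start is not None:
                        ranges.append((start, offset + i))
                        start = None
                # One file ended early: everything past the shorter one differs.
                if len(a) != len(b) and start is None:
                    start = offset + common
            offset += max(len(a), len(b))
    if start is not None:
        ranges.append((start, offset))
    return ranges
```

The shape of the result may hint at the cause: for example, a few isolated single-byte ranges look different from large damaged runs that all start at regular offsets. Which pattern maps to which fault is something to judge from the actual output rather than assume in advance.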

score 2 · Accepted Answer · answered Mar 05 '12 at 05:31

Problem solved.

It turned out to be caused by a faulty Western Digital SATA hard drive. We ended up replacing it and the problem went away (even though the drive had passed all the drive tests).

WD drives have been nothing but trouble here. Thanks for the responses.

score 0 · Answer 2 · answered Sep 05 '11 at 04:38

Maybe a Jumbo Frame issue? Here's some good info.

answered by Cooter

I believe that SMB and FTP are both TCP-based protocols. With jumbo frame issues, TCP connections should either succeed without corruption or fail outright (which also means no corrupted data is delivered). – Slartibartfast Sep 08 '11 at 05:08
It turns out to have been caused by a faulty Western Digital SATA hard drive. We ended up replacing it and the problem went away (even though it passed all the drive tests). – PSA4444 Mar 05 '12 at 05:30
