I have an Ubuntu 10.04.1 LTS Linux server which is experiencing some weird issues. I just tried to download a 440 MB tgz archive over HTTP using wget, and when expanding it with tar -xzf filename.tgz I received:
gzip: stdin: invalid compressed data--crc error
Finding this odd, I renamed the file to filename-bad.tgz and downloaded it again. I received the same error on the second download. The site listed an MD5 checksum for the file, so I checksummed both download attempts to see whether the file itself was just corrupt...
The two files had different checksums!
So I downloaded this file to my local workstation and ran md5sum on it there. This time, the MD5 checksum was correct, and the file extracted properly. So I copied the file from my workstation to the server and ran md5sum on that copy. It produced a new checksum, different from the correct one and different from the two other attempts!
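The checksum comparison described above can be sketched in Python. This is only an illustration using the standard hashlib module; the helper names md5_of and matches_published are hypothetical and were not part of the original troubleshooting, which used md5sum directly.

```python
import hashlib


def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large archive is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_md5):
    """Compare a file's digest with the checksum string listed by the download site."""
    return md5_of(path) == published_md5.strip().lower()
```

Calling matches_published on each download attempt with the checksum string from the site would be expected to return False for the corrupted copies and True for the good one.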
Here are the details of the server:
- Intel(R) Core(TM) i5 CPU (Dual Core)
- 8GB RAM
- Software RAID5 array using linux md devices and 3 1TB SATA drives
- 2 ethernet cards, connected to two different networks in our office (the wired and the wireless network)
I suspected the RAID array might be degraded or malfunctioning, so I ran mdadm --detail and it reported that the state was clean and all drives were in active sync. To further test, I copied a 1GB file from an SD card to the RAID array, and the md5sum of that file verified.
What could be going on?
EDIT: Output of cmp -l as requested:
324268145 115 105
324268657 274 264
324269297 332 322
324270577 345 344
324270833 155 154
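As a rough way to read this output: cmp -l prints the byte offset in decimal followed by the two differing byte values in octal. The sketch below (hypothetical helper names, not part of the original post) XORs each pair to show which bits differ.

```python
# Output of `cmp -l` as posted: byte offset (decimal), then the two byte values (octal).
CMP_OUTPUT = """\
324268145 115 105
324268657 274 264
324269297 332 322
324270577 345 344
324270833 155 154
"""


def parse_cmp_l(text):
    """Parse `cmp -l` lines into (offset, byte_a, byte_b) tuples of integers."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        offset, a, b = line.split()
        rows.append((int(offset), int(a, 8), int(b, 8)))  # byte values are octal
    return rows


def flipped_bits(a, b):
    """Count how many bit positions differ between two byte values."""
    return bin(a ^ b).count("1")


rows = parse_cmp_l(CMP_OUTPUT)
for offset, a, b in rows:
    print(f"offset {offset}: {a:08b} vs {b:08b}, "
          f"xor {a ^ b:#04x}, {flipped_bits(a, b)} bit(s) differ")
```

For the five lines posted, every byte pair differs in exactly one bit (XOR values of 0x08 or 0x01). Single-bit differences like these are commonly associated with faulty memory, although this sketch alone cannot prove the cause.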
EDIT2: I just realized one of the copies I have actually does have the correct MD5 checksum, so I copied the file from my local machine two more times and both times the checksum was correct! So a few more tests are in order here...
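Since the corruption appears to be intermittent, one way to run "a few more tests" is to copy the file repeatedly and checksum every copy. The following is a minimal sketch under stated assumptions: the function names and the copy-N.bin file naming are hypothetical, and the reference digest is computed on the same machine, so a fault could in principle affect it as well.

```python
import hashlib
import os
import shutil


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, dest_dir, attempts=5):
    """Copy `source` into `dest_dir` several times.

    Returns a list of (attempt_number, digest) for every copy whose
    digest differs from the digest of the source file.
    """
    expected = file_md5(source)
    mismatches = []
    for attempt in range(1, attempts + 1):
        target = os.path.join(dest_dir, f"copy-{attempt}.bin")
        shutil.copyfile(source, target)
        actual = file_md5(target)
        if actual != expected:
            mismatches.append((attempt, actual))
    return mismatches
```

An empty result means every copy matched on that run. Reads may be served from the page cache rather than the disks, so a clean run does not rule out a hardware fault.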
EDIT3: I am now unable to reproduce this issue, which sounds like bad RAM to me. I will run memtest tonight; any other ideas are welcome!
EDIT4: OK. Now this is weird. The issue is 100% reproducible when copying the file to a specific VMware virtual machine running on the server. If I copy the file to that virtual machine and then immediately copy the file to the host, the problem is sometimes reproducible. scp also sometimes says this when copying to the virtual machine:
Received disconnect from 10.1.0.73: 2: Packet corrupt
These all seem to me to point to bad RAM. Does everyone concur? Any other possible explanations?
EDIT5: Solved. Gee, what on earth could have been causing this problem? I just don't understand.... :-)
(I did test the RAM on this system right after I bought it, which was two or three months ago... oh well. Looks like it's time to call Dell...)
I posted this on SuperUser as opposed to ServerFault because it's consumer-grade hardware, and it's a small office server as opposed to a serious production server. And it uses software RAID. But maybe SF is a better place for it, not sure! – Josh – 2010-08-02T21:25:01.200
Whatever's going on, it seems to have something to do with the server, since the file seems to be getting corrupted whenever you transfer it to the server from either your desktop or the download source. – David Z – 2010-08-02T21:58:50.110
My first guess would be the RAM. Did you run memtest? Could you swap the RAM for testing? – matthias krull – 2010-08-02T22:12:36.867