Gradual Disk Errors

3

2

I am having an intermittent problem with my computer where programs start freezing up. What generally happens is that certain programs stop responding entirely, and are impossible to kill via the task manager (almost always iTunes and my backup program). Eventually, I'm forced to reboot. Inevitably, when I reboot, my RAID 1 array goes into a verification scan, finding and repairing errors along the way.

Because the programs that lock up are the ones that read large parts of the disk, and because the RAID array needs repair after each reboot, I'm inclined to think that one of the drives in the array has errors that develop slowly over time.

Any ideas as to how I might diagnose which drive and whether I need to just replace the drive entirely? Could it be the RAID card instead? Has anyone seen similar problems with a RAID array and iTunes locking up?

EDIT: The RAID controller is an Intel ICH8R/ICH9R/ICH10R/DO SATA RAID Controller. I don't think that's the product name, but it's all the info I can glean from the Device Manager.

Update: Since I asked this question, I stopped using the RAID 1 array and upgraded to a new, single drive. I still see the same sort of degradation after a couple of weeks of uptime, but now when I reboot, instead of rebuilding the array, the OS forces a check-disk, where it often finds a couple of errors, fixes them with no problem, and then continues booting.

Any help would be greatly appreciated.

tghw

Posted 2009-10-29T16:29:28.820

Reputation: 854

Run a short SMART self-test in the BIOS configuration. – Hello71 – 2011-04-26T21:01:28.110

@Hello71 I'll give that a try when I get home tonight. – tghw – 2011-04-26T21:22:26.983

@tghw - When you are 'forced to reboot', how are you rebooting? Does the disk check happen when you do a normal, planned reboot? When things start locking up, is the CPU showing busy in Task Manager? Does your mouse pointer lock up? – Ƭᴇcʜιᴇ007 – 2011-04-26T23:56:47.740

@techie007 I reboot from the start menu, but it never finishes "Logging off" windows, and I have to hard reboot. CPU(s) are not pegged, mouse is active, it's just that a few applications get locked up, and working with the file system gets a lot slower. – tghw – 2011-04-27T12:20:31.433

@tghw - Do you have this shutdown issue during a normal, planned reboot as well? If so, does it reboot properly out of safe mode? – Ƭᴇcʜιᴇ007 – 2011-04-27T12:42:05.553

@techie007 Planned reboots, before it starts acting up, go fine. – tghw – 2011-04-27T15:10:30.803

What RAID controller are you using? – Brian Knoblauch – 2009-10-29T19:43:41.190

Added to the question. – tghw – 2009-11-03T19:35:50.877

Answers

3

The errors you are experiencing are likely filesystem-level corruption and not physical issues with the drive. If there are CRC errors or other such things with the drives, you'll get disk errors in the Event Viewer.
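To make the filesystem-versus-hardware distinction concrete, here is a minimal Python sketch that sorts Event Viewer entries into the two groups. The event sources and IDs in the tables are assumptions based on commonly cited Windows values (Ntfs 55 for a corrupt file system structure; Disk 7, 11 and 51 for bad blocks, controller errors and paging errors), and the function names are hypothetical. Check the IDs against what your own System log actually shows.

```python
# Assumed mapping of (event source, event ID) pairs to a rough category.
# These IDs are commonly cited values, not an exhaustive or official list.
FILESYSTEM_EVENTS = {("ntfs", 55)}
PHYSICAL_EVENTS = {("disk", 7), ("disk", 11), ("disk", 51)}


def classify_event(source, event_id):
    """Return 'filesystem', 'physical', or 'other' for one log entry."""
    key = (source.lower(), event_id)
    if key in FILESYSTEM_EVENTS:
        return "filesystem"
    if key in PHYSICAL_EVENTS:
        return "physical"
    return "other"


def summarize(events):
    """Count entries per category; events is an iterable of (source, id)."""
    counts = {"filesystem": 0, "physical": 0, "other": 0}
    for source, event_id in events:
        counts[classify_event(source, event_id)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical entries copied by hand from the System log.
    sample = [("Ntfs", 55), ("Ntfs", 55), ("Service Control Manager", 7036)]
    print(summarize(sample))
```

A log that contains only the filesystem group and nothing from the physical group would be consistent with corruption caused by interrupted writes rather than a failing drive, though it does not prove it.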

Programs that are impossible to kill via the Task Manager are likely stuck somewhere in the kernel. Usually this means a device driver is at fault. I don't know what types of drivers iTunes installs, but one of them could be the problem. Try updating iTunes to the latest version if possible. I could imagine that software which monitors your disk for changes could cause an issue as well.

Also, try to see if you have the latest drivers for your chipset and try updating your BIOS to the most recent version.

EDIT: Windows also supports "filter" device drivers that intercept reads and writes going to physical device drivers. If there is an issue with a filter driver, e.g. it's stuck waiting for something else, then it might cause the system to freeze. Nero's PxHelper.sys (or something like that) is an example of such a driver, commonly attached to the CD-ROM device driver.

Possible software that would do this for a hard drive includes antivirus software, encryption software, possibly some types of backup software, Windows AIK, and malware.

LawrenceC

Posted 2009-10-29T16:29:28.820

Reputation: 63 487

Good call. I have the following in the event viewer: "The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume OS." (Event ID 55) Still not sure why it's happening, though. – tghw – 2011-04-27T12:24:47.887

If Windows is interrupted while it's updating things on the disk that it uses to look up files and find free space, like MFT records, then you might get things about chkdsk on reboot. See my edit above for more information as well. – LawrenceC – 2011-04-27T15:18:59.507

Taking into account the 'couple weeks of uptime', I guess some I/O or filter driver might be failing and getting unloaded; maybe the DMA HDD access is reverted to one of the PIO modes, which could explain the very slow shutdown. I'm not sure Windows 7 was initially designed for a 'couple weeks of uptime'. – chronos – 2011-04-30T23:20:06.043

1

I heard about someone having a similar issue before. It turned out that their motherboard only supported SATA version 1 and the hard drive was trying to run at SATA version 2 speeds. They ended up fixing the issue by throttling the hard drive down to 150 MB/s using the OPT1 pin (works on Western Digital drives). If this were the case, you'd notice some abnormal-looking graphs in the HD Tune benchmark (such as the transfer rate reaching a peak and repeatedly dropping to 0). The benchmark gives better results when the hard drive is not being used for anything else. The average transfer rate should be around 100 MB/s for relatively new desktop computers.

On a healthy hard drive, the transfer rate in the HD Tune benchmark should gradually decrease as the test progresses to 100%. If it is a straight line all the way across at 200 MB/s, you probably have one of those awesome new solid state drives. If it is straight all the way across at around 10 MB/s, your drive could be stuck in PIO mode and going super-slow (which can make large applications appear to hang). Separately, Windows knows when it has not been shut down properly, so a forced shutdown could be what triggers the chkdsk on start-up. I would imagine that forcing a shutdown in the middle of a write operation could cause file system errors.
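The benchmark shapes described above can be turned into a rough rule of thumb. The Python sketch below is only an illustration: the function name, the labels and every threshold (what counts as "flat", "slow", "fast" or a "drop to zero") are assumptions picked to match the figures in this answer, and HD Tune does not provide this function. It expects transfer-rate samples in MB/s that you have read off the graph yourself.

```python
# All thresholds are illustrative assumptions, not values from HD Tune.
FLAT_TOLERANCE = 0.10    # spread within 10% of the average counts as flat
SLOW_MB_S = 20.0         # flat at or below this suggests PIO mode
FAST_MB_S = 150.0        # flat at or above this suggests an SSD
DROPOUT_MB_S = 1.0       # samples below this count as a drop to zero
DROPOUT_FRACTION = 0.10  # share of dropouts that makes the run suspicious


def classify_benchmark(rates_mb_s):
    """Label a sequence of transfer-rate samples taken across the disk."""
    if not rates_mb_s:
        raise ValueError("need at least one sample")

    count = len(rates_mb_s)
    average = sum(rates_mb_s) / count
    dropouts = sum(1 for rate in rates_mb_s if rate < DROPOUT_MB_S)

    # Repeated drops to zero: the link-speed mismatch pattern.
    if dropouts / count >= DROPOUT_FRACTION:
        return "dropouts"

    spread = max(rates_mb_s) - min(rates_mb_s)
    if spread <= FLAT_TOLERANCE * average:
        if average <= SLOW_MB_S:
            return "flat-slow"   # possibly stuck in PIO mode
        if average >= FAST_MB_S:
            return "flat-fast"   # probably a solid state drive
        return "inconclusive"

    # Compare the first and last thirds to detect a gradual decline.
    third = max(1, count // 3)
    head = sum(rates_mb_s[:third]) / third
    tail = sum(rates_mb_s[-third:]) / third
    if head > tail:
        return "declining"       # typical of a healthy hard drive
    return "inconclusive"


if __name__ == "__main__":
    print(classify_benchmark([120, 115, 110, 100, 90, 80, 70, 60]))
```

A label is only a hint about where to look next. A run taken while the drive is busy with other work can produce dips that have nothing to do with the hardware.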

Screenshots of a completed benchmark, the info tab, and the health tab from HD Tune might help narrow down the issue (you can use the free version).

James T

Posted 2009-10-29T16:29:28.820

Reputation: 8 515

1

If your RAID controller is one of those lower-cost models that doesn't do its own processing but relies on the driver to do all the hard work (many lower-end motherboards with integrated RAID controllers have these types of controllers), and Windows is crashing while the disk is being updated (or all the updates haven't been written to the disks prior to the reboot operation), then this could be the cause of your problem.

One big clue here is that your computer is slowing down a lot, especially when dealing with large amounts of data. Does your RAID array seem to start rebuilding before the OS boots, or after? If after, then it's very likely that you have one of the software RAID controllers I mentioned in the first paragraph.

Randolf Richardson

Posted 2009-10-29T16:29:28.820

Reputation: 14 002

0

I had a client who had similar problems. We all thought it was the RAID card, so we replaced it with another. Two more cards later, we came to the conclusion that it was the computer. He replaced the computer and reused one of the RAID cards we had thought was faulty, and the problem never resurfaced.

In the end, we never really narrowed down exactly where the problem was. It could have been the RAM, motherboard or CPU, who knows. But since he replaced the whole box and was able to reuse the RAID controller and hard drive, we isolated enough to know it wasn't the software, the hard disk or the RAID controller.

If you have multiple RAM sticks, how about removing some of them and moving them around to see if you get the same problem? For example, with 4 GB of RAM as 2x 2 GB sticks: remove the top one and run the machine for a while on 2 GB to see if the problem is still there. If it is, swap in the other stick and leave the first one out. Same problem? It might not be the RAM. Problem goes away? Interesting: it could be a RAM issue.

Matt H

Posted 2009-10-29T16:29:28.820

Reputation: 3 823

I think I can rule out the RAM; I just verified that I was having this problem before I replaced every stick in the box, and the problem continued. It's possible the new RAM has similar issues, but chances are it's the motherboard or RAID card. – tghw – 2011-04-26T23:47:21.367