Win7 x64 unresponsive for a minute or so. HD failing?

On a fully updated Win7 x64, every so often the system stalls for a minute or so. This has been going on for a couple months now. By stalling I mean the mouse responds and I can move windows around, but any window, any program, that is open becomes whiteish when I select it AND any new programs will not open. It doesn't matter what kind of program it is. When the stall stops all clicks I made (open new programs for example) take effect.

Nothing shows up consistently (as in every time this happens) in the event log. Today though I was able to find something, but it doesn't reveal much other than the "system was unresponsive". It's a 7009 for "A timeout was reached (30000 milliseconds) while waiting for the Windows Error Reporting Service service to connect."

It doesn't matter if I have any USB devices plugged in or not. I've run Microsoft Security Essentials and Malwarebytes.

While the machine is unresponsive, I've noticed that Drive D (the other partition on the single internal HD in this laptop) is displayed like this in Explorer. This never occurs with Drive C or any other drive on the machine.

[Screenshot: how drive D shows up in Explorer]

SMART report for the physical drive:

Read benchmark by HD Tune 5 Pro, probably the most telling piece of the puzzle. Isn't this alone enough to see there is a problem with the drive, regardless of whether the unresponsiveness is caused by such purported problem?

Here is a short hardware report:

Computer:      LENOVO ThinkPad T520
CPU:           Intel Core i5-2520M (Sandy Bridge-MB SV, J1)
               2500 MHz (25.00x100.0) @ 797 MHz (8.00x99.7)
Motherboard:   LENOVO 423946U
Chipset:       Intel QM67 (Cougar Point) [B3]
Memory:        8192 MBytes @ 664 MHz, 9.0-9-9-24
               - 4096 MB PC10600 DDR3 SDRAM - Samsung M471B5273CH0-CH9
               - 4096 MB PC10600 DDR3 SDRAM - Patriot Memory (PDP Systems) PSD34G13332S
Graphics:      Intel Sandy Bridge-MB GT2+ - Integrated Graphics Controller [D2/J1/Q0] [Lenovo]
               Intel HD Graphics 3000 (Sandy Bridge GT2+), 3937912 KB 
Drive:         ST320LT007, 312.6 GB, Serial ATA 3Gb/s
Sound:         Intel Cougar Point PCH - High Definition Audio Controller [B2]
Network:       Intel 82579LM (Lewisville) Gigabit Ethernet Controller
Network:       Intel Centrino Advanced-N 6205 AGN 2x2 HMC
OS:            Microsoft Windows 7 Professional (x64) Build 7601

The drive is less than 1 year old. Do I have a defective drive? Seagate Tools diag says there is nothing wrong with the drive...

UPDATE: I noticed that the Windows Error Reporting service entered the running state and then the stopped state, and the gap between the two events was exactly 2 minutes. Which error it was trying to report, I don't know. I checked the "Reliability Monitor" and it shows no errors to be reported. I've disabled the Windows Error Reporting service to see if the problem stops.

Gaia

Posted 2012-10-31T02:09:39.657

Reputation: 4 549

I doubt it is the HDD. I've never seen, or heard of, anything like this, but I seriously doubt your HDD is defective. However, it is possible that it is on its way out, especially since it's a hard drive, and it's a year old. Even so, it is unlikely to be the cause of this problem. Again, your HDD should be fine – Sylvester the Cat – 2012-10-31T02:20:04.347

That read benchmark graph does NOT look normal, @Sylvester – Gaia – 2012-10-31T02:24:56.113

That graph wouldn't load, so I had to base the comment on what I read, which is why I didn't bother posting it as answer. When I get home (home internet has no filters), I will look at graph and repost EDIT: I'd have thought it was just something (i.e. application) causing it to hang. I had a hard time imagining that you would be able to do all that based only on the code currently loaded in RAM without loading more – Sylvester the Cat – 2012-10-31T02:30:36.310

SMART data indicates it is reallocating sectors, a sign of a failing drive. – Moab – 2012-10-31T02:55:30.480

It's pretty clear based on the S.M.A.R.T. data that the HDD is reallocating sectors, which is what is causing your problem. I would replace the HDD. I had a similar problem, although the drive was much older: similar performance problems, which resulted in near total failure and unreadable sectors (which cause the HDD to shut down if read). – Ramhound – 2012-11-05T17:45:22.237

@SylvestertheCat you are right. It is not the HDD - I replaced it, and the problem persists. Besides, C: was never affected, only D:, and they both reside on the same physical drive. – Gaia – 2013-08-22T16:58:32.210

@Moab It is not the HDD - I replaced it, and the problem persists. Besides, C: was never affected, only D:, and they both reside on the same physical drive. – Gaia – 2013-08-22T16:59:11.997

@Ramhound It is not the HDD - I replaced it, and the problem persists. Besides, C: was never affected, only D:, and they both reside on the same physical drive. – Gaia – 2013-08-22T16:59:27.327

Answers

Based on the new information you have provided, I can say that there is in fact no problem at all. Then why does it “go offline” for a few seconds for up to three minutes after suspending the guest OS? Because as you said, the HDD LED light stays lit while the drive remains unresponsive because it is being heavily used.

What is happening is that when you finish using VMWare and want to sleep the guest OS, you use the standby or hibernation feature instead of shutting down. This causes VMWare to copy the contents of the VM’s RAM to disk so that it can resume where it left off without having to boot up all over again. Depending on how much memory you have assigned to the VM and how much was being used, this can mean that VMWare has to write quite a lot of data (gigabytes) to disk.

When VMWare copies the memory to disk, the drive becomes more or less unresponsive to new disk operations until the current disk operations (writing the RAM to a file) have finished. As a result, when you open My Computer, Windows tries to refresh the data but it cannot read the drive to fetch the needed data because there’s all those write commands already in line waiting to happen. Therefore it leaves it empty and looking like it’s offline until it can manage to slip in those read requests (between VMWare’s write operations).

If you open the drive in Explorer, you will see that either it will not open it at all for a while, or it will open it and flash the address bar with a green progress bar like it does whenever there is a lengthy file operation (like searching for thousands of files).

In summary, there is nothing surprising or mysterious about this situation. If instead of putting a VMWare guest OS into standby, you had just manually copied a giant file to the drive, the results would be exactly the same.

So what can you do to fix it? Aside from changing to a faster drive (or using an internal one if D: is external), your best bet is to defragment the drive. If D: is very fragmented, then when VMWare tries to flush the RAM to disk, it will cause it to thrash around a lot while writing chunks of the giant file to different areas (of course this is assuming it’s not an SSD, which if D: is still a partition on the same ST320LT007 drive as C:, then it’s not).

If you defragment the drive (assuming that there is sufficient free space), then the system can write the RAM file with only a few file operations in large swaths (e.g., write 1GB of data at cluster X) instead of many, many little operations (write 1MB here, write 245.18MB there, 4KB here, another 18.1MB somewhere else…) Then sleeping the VM will finish much faster and the drive will be more responsive.
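As a rough back-of-the-envelope sketch of the point above, the model below charges one seek per fragment plus the sequential transfer time. The seek time and throughput figures are illustrative assumptions, not measurements of this drive, and the function name is made up for the example.

```python
def estimated_write_seconds(total_mb, fragments, seek_ms=12.0, throughput_mb_s=80.0):
    """Very rough model: one seek per fragment plus sequential transfer time.

    seek_ms and throughput_mb_s are assumed, illustrative values for a
    laptop hard drive; they are not measurements of any particular drive.
    """
    seek_seconds = fragments * (seek_ms / 1000.0)
    transfer_seconds = total_mb / throughput_mb_s
    return seek_seconds + transfer_seconds


# 1 GB of VM RAM written as one contiguous run versus 10,000 scattered pieces.
contiguous = estimated_write_seconds(1024, fragments=1)
fragmented = estimated_write_seconds(1024, fragments=10_000)
print(f"contiguous: {contiguous:.1f} s, fragmented: {fragmented:.1f} s")
```

With these assumed numbers the seek overhead dominates once the file is split into thousands of pieces, which is the effect described above. Real drives reorder and cache writes, so treat the result as an order-of-magnitude illustration only.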

To find out exactly what the access is that is causing the drive to be active and busy, you can use a tool like Process Monitor. Run it and click the class-filters to select only the file-class filter as seen below.

Now you can see what files and folders are being accessed. Make sure to memorize the hotkey to start and stop activity capturing (Ctrl+E) so that you can stop it once it starts flooding with what is likely to be the disk operations from VMWare.

Screenshot of Process Monitor with only file class filter active
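If the capture is large, it can be easier to save it from Process Monitor as CSV and summarize it afterwards. The following is a minimal sketch, assuming the export contains the "Process Name", "Operation" and "Path" columns; the sample rows and the paths in them are made up for illustration.

```python
import csv
import io
from collections import Counter


def busiest_writers(csv_text, top=5):
    """Count WriteFile operations per (process, path) in a Process Monitor CSV export.

    Assumes the export has "Process Name", "Operation" and "Path" columns.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("Operation") == "WriteFile":
            counts[(row.get("Process Name"), row.get("Path"))] += 1
    return counts.most_common(top)


# Hypothetical sample rows; a real export would be read from a file instead.
sample = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"10:00:01","vmware-vmx.exe","1234","WriteFile","D:\VMs\guest.vmem","SUCCESS",""
"10:00:01","explorer.exe","2345","ReadFile","D:\Data\notes.txt","SUCCESS",""
"10:00:02","vmware-vmx.exe","1234","WriteFile","D:\VMs\guest.vmem","SUCCESS",""
"10:00:03","OUTLOOK.EXE","3456","WriteFile","D:\Mail\archive.pst","SUCCESS",""
'''

for (process, path), count in busiest_writers(sample):
    print(f"{count:5d}  {process}  {path}")
```

When reading a real export from disk, opening the file with `encoding="utf-8-sig"` may be needed so that a leading byte-order mark does not end up in the first column name.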

Synetech

Posted 2012-10-31T02:09:39.657

Reputation: 63 242

D: is a SSHD hybrid, new, fast and internal. It is fully defragged because I copied the contents from the old drive to the new one in a way that it doesn't clone the fragments. Also, it happens for shorter periods of time regularly, regardless of VMware usage (which BTW is a 2gb virtual HD with 1GB RAM).

I have noticed that when the hiccup is a short one (non-VMware related) and my external is plugged in, D only comes back online after the external drive makes a noise (sounds like the head is parking).

I really wish there was something I could run to log what is going on when these failures happen – Gaia – 2013-08-28T07:22:45.777

Like I said, if the HDD LED is on, then the drive is active. If it is being heavily used, then it will be unresponsive, that’s just how it works. I’ll add some info on figuring out what the access is. – Synetech – 2013-08-29T14:47:40.457

Did you find out what the file accesses were? – Synetech – 2013-08-29T19:47:41.510

The described symptoms are indeed indicative of a bad drive. When a disk is unresponsive, the system waits for a seemingly interminable amount of time before timing out and throwing an error.

That said, it is curious that it only seems to happen to the D: volume (which you implied was a partition on the same physical drive as C:). If it were a software issue (e.g., corrupt file-system on D:), then it should not be happening intermittently, while a hardware issue could indeed happen intermittently if for example there are only a couple of bad sectors towards the inside of the platter and the system only occasionally happens to touch them. Of course you already said that HD Tune reported none. However, as you thought, modern drives do indeed hide bad sectors. They usually have a bunch of spare sectors that they can remap bad sectors to and yes, they do this transparently so that the OS does not know about them (other than generic information via SMART).

If the Data column is reporting raw data, then yes, 2,465 relocated sectors is a lot. If it only happens with D:, then the bad sectors are likely grouped towards the center of the platter where the head goes to park, so maybe the drive got jostled while the drive was shutting down/spinning up.

What is that volume being used for? If it is being used for things like storing the temp directory and such where the OS or programs make occasional access to it, then it could be a corrupt file-system (of course you said you ran chkdsk, so it should not be).

You can check/confirm if it is a physical problem with your drive by opening the Event Viewer (eventvwr.exe) and checking the System log for events with a Source of Disk. You can cross-reference the indicated disk number in the Disk Management MMC snap-in (diskmgmt.msc).

Bad Disk event in Event Viewer

Corresponding disk number in Disk Management snap-in
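When there are thousands of Disk events, tallying them per disk number makes the cross-reference quicker. Here is a minimal sketch, assuming the event messages have already been copied or exported as plain text; the function name is hypothetical, and the two sample messages use the wording quoted in the comments on this answer.

```python
import re
from collections import Counter

# Matches the disk number N in device paths of the form \Device\HarddiskN\...
DISK_RE = re.compile(r"\\Device\\Harddisk(\d+)\\", re.IGNORECASE)


def events_per_disk(messages):
    """Count event messages per physical disk number (the N in HarddiskN)."""
    counts = Counter()
    for message in messages:
        match = DISK_RE.search(message)
        if match:
            counts[int(match.group(1))] += 1
    return counts


messages = [
    r"The driver detected a controller error on \Device\Harddisk1\DRXX.",
    r"An error was detected on device \Device\Harddisk1\DRXX during a paging operation.",
]
print(events_per_disk(messages))
```

A disk number that never appears in the tally has no matching events in the input, which is worth checking against the disk numbers shown in Disk Management.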

Synetech

Posted 2012-10-31T02:09:39.657

Reputation: 63 242

(Yes, that’s XP’s Event Viewer, but Windows 7 shows the same info for drive-access errors; it’s just more annoying to access because the 7 Event Viewer is slower and more cluttered.) – Synetech – 2012-10-31T04:54:37.783

Yes, D: is a partition of the same physical drive as C:. I have programs in C and data in D. Filtering the event log does bring up nearly 4000 events for disk, but they are all "The driver detected a controller error on \Device\Harddisk1\DRXX" or "An error was detected on device \Device\Harddisk1\DRXX during a paging operation.". None of them include the string Harddisk0, which would be the ID for the primary drive. Wouldn't the drive firmware make the reallocated sectors invisible to the OS anyways? – Gaia – 2012-10-31T11:22:22.267

CHKDSK found only some unused index entries and unused security descriptors to clean up. No bad sectors – Gaia – 2012-10-31T11:26:04.420

If it is mentioning the controller, then what could be happening is that the cable is loose (common with SATA cables). Of course it doesn’t explain why it only happens with D:, but it is a simple check and fix to rule it out. Do you have any external/flash/USB drives or memory cards connected? I find I get problems like that whenever I have a flash drive connected to a specific USB port through an obviously flaky extension cable. – Synetech – 2012-10-31T17:32:46.443

I will check that. As for the other point, the problem occurs even when no USB drives are connected, so that wouldn't explain why drive D goes offline... – Gaia – 2012-10-31T19:35:30.483

Update: I replaced the drive, the problem still occurs. – Gaia – 2013-08-22T17:09:50.330

If it’s still doing it with a new drive, it’s likely a problem with the controller or cable. I’ve had flaky drive behavior like this due to the cable being incompletely inserted into the drive/motherboard connectors. (I’ve never had this happen with IDE, only SATA because the design is incredibly poor.) Make sure the cables are fully inserted at both ends and are not being pushed on by something else. If it still doesn’t work, then the motherboard’s SATA controller is probably faulty, but try each connector (when IDE channel 2 of an old motherboard failed, I was able to keep using it with channel 1). – Synetech – 2013-08-22T19:01:52.897

A cable problem when the other partition on the same drive never has a problem? – Gaia – 2013-08-23T00:12:40.047

Did you check the SMART data of the new drive? – Synetech – 2013-08-23T22:02:53.383

Yes, @Synetech. All OK. – Gaia – 2013-08-24T02:20:57.987

The problem has been traced down to VMWare Player. It happens immediately after, or some time after, the VMWare guest OS is shut down. More info here.

The solution in my case was disabling the VMware Authorization Service. This service is only needed when the virtual machine needs to be run by non administrators.

Update: Disabling the VMware Authorization Service AND re-enabling the Application Experience Service (which I had disabled because I deemed it unnecessary) solved the problem.

The D: drive still goes "offline" for a few seconds, even after I have replaced the HD. This doesn't render the entire machine unresponsive, only specific applications that depend on data stored on D: (like outlook, in my config). I'm going to consider the D: offline drive issue as a separate issue.

Gaia

Posted 2012-10-31T02:09:39.657

Reputation: 4 549

Ah, so you had disabled the Application Experience service? You should have said that in the first place! (I’m kidding; you had no way of knowing that it was related.) But seriously, disabling that service causes Windows to deny access to some files for some time. It seems to be related to Security Essentials in some way, presumably that SE depends on it and when it is disabled, it can’t scan an (executable) file, so it keeps the file locked until some lengthy timeout period elapses. That does fit your symptoms. For the record, you probably don’t need to disable the VMWare Auth service. – Synetech – 2013-08-24T14:57:44.663

@Synetech i confirmed the VMWare Auth service is unnecessary. i already have a lot of necessary stuff running. And yes, Application Experience did leave some EXEs locked for a period of time. – Gaia – 2013-08-24T18:33:14.910

Yes, it’s unnecessary when you’re not running VMWare, but you don’t have to disable it; you can set it to manual (it’s really annoying that you have to manually start it since VMWare won’t start it automatically ಠ_ಠ). Either way, what I meant was that it isn’t causing the problem, so simply running Application Experience would fix it even if VMWare Auth is still running. – Synetech – 2013-08-24T19:38:30.467

the stalling still occurs, though less frequent and only when i shut down vmware. could it be related to a very large file (2gb+) being accessed? – Gaia – 2013-08-25T00:40:32.287

Do you have an LED that shows disk activity? If so, look at it. I suspect that when you shut down VMWare, it does a bunch of clean up which includes a lot of disk activity (writing RAM to disk, etc.) In fact, this would be a lot worse after shutting down than while the guest OS is actually running. Does it still happen if you wait a long time after VMWare has shut down (e.g., go for lunch and come back)? – Synetech – 2013-08-25T03:15:24.963

@Synetech It lasts for up to 3 minutes after VMware machine is suspended (I rarely shut the guest OS down) – Gaia – 2013-08-25T16:40:49.613

Have you looked at and watched the HDD LED to monitor disk activity? – Synetech – 2013-08-25T17:48:15.323

Yes, it stays on pretty solid during the unresponsive period. – Gaia – 2013-08-27T16:49:22.393

Then there’s your answer; the disk is simply being heavily used. No surprise that it gets unresponsive. You can use a program like Process Monitor to find out what program is accessing what file(s), but it’s pretty clear that it’s VMWare saving the state of your VM. Shutting the guest OS down only flushes caches and stuff and takes no longer than shutting down a real OS does. Saving the state (standby/hibernation) however means that VMWare has to save the entire RAM to disk, and of course writing a GB or two takes time.

– Synetech – 2013-08-27T18:57:59.690

@Synetech But the D drive should be offline as seen from explorer. – Gaia – 2013-08-27T23:36:00.077

I don’t know what you are saying. In your question, you said it is displayed offline as though that’s a bad thing, which makes sense because if VMWare is intensely writing to C: while shutting down, then it could be saturating the drive controller (Explorer takes a few moments to read my flash-drive and show its information when I have something else occupying the same bus). Your comment above says it should be offline (as though it is expected/desired), so I’m no longer clear on what the problem is. – Synetech – 2013-08-28T00:06:55.167

@Synetech Sorry. I meant "But should drive D be offline (as seen by explorer) just because the disk is being heavily used?". Also, the VMware virtual disk is on drive D. – Gaia – 2013-08-28T01:57:20.610

Ah, then it makes even more sense. The drive isn’t offline, it is just overloaded. VMWare is writing the entire contents of the VM’s RAM to disk (how much RAM have you assigned to it 512MB? 1GB? 2GB?) This is a lot of writing to do, particularly if the drive is fragmented. It takes a while for the drive to seek around and write it all. As a result, the drive becomes essentially unresponsive to new disk operations (like refreshing the My Computer icon) until the current queue of disk operations has been completed. No mysteries or surprises here. – Synetech – 2013-08-28T02:26:39.530

This is a hard problem to diagnose from the information you provided (which was a lot of info, don't get me wrong). One way to diagnose this as a hardware problem is to try to recreate the problem with an install of Linux, such as through wubi.

I have seen similar things happen when there are bad sectors on the HD. But I have also seen similar problems due to faulty drivers.

Have you tried CHKDSK and scanned for bad sectors?

Mikhail

Posted 2012-10-31T02:09:39.657

Reputation: 3 782

HD Tune reports no bad sectors. I haven't run CHKDSK recently, but I have run it since the problem started. – Gaia – 2012-10-31T02:34:45.473

I ran CHKDSK and it gave me no errors at all. I believe that the disk subsystem would hide errors from CHKDSK anyways, no? – Gaia – 2012-11-05T17:04:37.170

Not for bad sectors, there is no way around them because the hardware can't write or read and hence no hiding is possible. – Mikhail – 2012-11-06T06:02:50.890