2

Every 10 seconds or so, both of our web servers (Windows Server 2003, running IIS 6) report the same error in the event log.

> Event Type: Error
> Event Source: Application Popup
> Event Category: None
> Event ID: 333
> Date: 2009-08-18
> Time: 22:04:06
> User: N/A
> Computer: DFS273
> Description: An I/O operation initiated by the Registry failed unrecoverably. The Registry could not read in, or write out, or flush, one of the files that contain the system's image of the Registry.
>
> For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
>
> Data:
> 0000: 00 00 00 00 01 00 6c 00   ......l.
> 0008: 00 00 00 00 4d 01 00 c0   ....M..À
> 0010: 00 00 00 00 4d 01 00 c0   ....M..À
> 0018: 00 00 00 00 00 00 00 00   ........
> 0020: 00 00 00 00 00 00 00 00   ........

I can't find any information as to what could cause these kinds of errors. The CPU is quite busy at 90-100% but there is almost 1 GB of unused RAM.

windyjonas
  • 143
  • 1
  • 1
  • 7

4 Answers

4

Below is a real case which I encountered last week.

The symptom was the same: several "An I/O operation initiated by the Registry failed unrecoverably" events were logged in the System event log. In addition, one application reported a "create process failure" in the Application event log. Since CreateProcess() seldom fails, this event is a good indication of system resource depletion.

In fact, I also found a "Previous Shutdown was Unexpected" event, which effectively means Windows failed to clear a timestamp in the registry at shutdown (http://support.microsoft.com/kb/950323). The operating system did not even get a chance to update a value in the registry. How could that happen? It is not hard to guess that Windows was leaking non-paged or paged pool memory.

So I added two counters, Non-Paged Pool Bytes and Paged Pool Bytes, as well as the kernel object counters in case of a handle leak. Unsurprisingly, the system crashed two days later. As the figure below shows, the paged pool size kept increasing from 2009-10-24 09:28 until 2009-10-26 23:26, when the system crashed with a paged pool size of nearly 360 MB. I used Procexp to obtain the paged pool limit, which was indeed 360 MB.

The last step was to find out which driver was leaking. Poolmon (http://technet.microsoft.com/en-us/library/cc737099(WS.10).aspx) can be used to monitor detailed paged pool and non-paged pool usage per pool tag.
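For reference, a minimal sketch of how those two pool counters could be logged from the command line with typeperf, which ships with Windows Server 2003 (the counter paths assume the default English counter names; the output path is just an example):

    rem Sample the kernel pool counters every 15 seconds and write them to a CSV file.
    rem A value that climbs steadily and never comes back down suggests a pool leak.
    typeperf "\Memory\Pool Paged Bytes" "\Memory\Pool Nonpaged Bytes" -si 15 -f CSV -o C:\perflogs\pool.csv

Once Poolmon points to a suspicious pool tag, searching %systemroot%\system32\drivers for that tag (for example with findstr) usually reveals which driver allocated it.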

(screenshots: the Paged Pool Bytes counter climbing until the crash, and the Poolmon output)

yanglei
  • 168
  • 1
  • 9
1

Disk/controller/RAID hardware? Take the machine down and run chkdsk c: /v /f (and likewise on any other partitions you have). I know you said the problem happens on two machines, but perhaps they both have disks from a bad batch.

Or your disk is fine and a one-time glitch corrupted the registry. The 10-second interval is probably the heartbeat that Windows writes periodically (and which sometimes results in the "the system shutdown at ... was unexpected" message in the event logs after a crash).

dmoisan
  • 447
  • 2
  • 6
0

We had exactly the same problem: the same Event ID 333 errors in the Event Viewer. Every few days the server (Windows Server 2003 x64) became unresponsive. It was impossible to log on to the machine locally or remotely, so we had to restart it every time. We upgraded the RAID/disk/Fibre Channel firmware and drivers and uninstalled an online backup app (IDrive or IStore or something like that), and the problem was gone. So I'm still not sure whether the firmware upgrade solved the problem or the faulty app was causing it.

PsYx
  • 1
0

I added the "handle count" column to the process view. One process (SNMP) permanently keeps creating handles. The Performance Wizard shows that SNMP had over 2 million handles before our last server crash.

It's definitely a handle leak. The event log entry is merely the result of system resource depletion. The question is which process is leaking handles. I recommend using perfmon to trace various system-wide resource counters, so that when the system crashes again you have enough data to find the root cause.

The following counters may be helpful: Object, Memory, Process\Snmp.

BTW: in your case, the culprit is obviously the snmp process.
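As a concrete sketch of that approach (the counter paths assume the default English counter names, and that the SNMP service runs as snmp.exe), per-process handle counts can be sampled with typeperf and logged until the next hang:

    rem Log the handle count of every process every 30 seconds; the process whose
    rem count grows without bound is the leaker (here, presumably the SNMP service).
    typeperf "\Process(*)\Handle Count" -si 30 -f CSV -o C:\perflogs\handles.csv

    rem Or watch just the suspected process on the console:
    typeperf "\Process(snmp)\Handle Count" -si 30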

yanglei
  • 168
  • 1
  • 9