How to fix MEMORY_MANAGEMENT and ATTEMPTED_WRITE_TO_READONLY_MEMORY BSOD on Windows 10

4

0

So after 3 years of using a fresh install of Windows 7 x64, I had to upgrade to Windows 10 for work reasons.

I downloaded the latest Windows 10 Pro build image (1803) from Microsoft and used Rufus 3.1 to create a bootable flash disk.

I did a fresh install on my SSD (formatting it previously) and after a few days of using the computer I started to get random BSODs. Around 1 or 2 per day.

The BSOD errors are either MEMORY_MANAGEMENT or ATTEMPTED_WRITE_TO_READONLY_MEMORY.

Things I've tried:

  • Ran 2 passes of Windows Memory Diagnostic. No errors.
  • Ran sfc /scannow. No errors.
  • Updated the GPU drivers to the latest (Nvidia GeForce GTX 970)
  • Updated BIOS to latest version (from 0802 to 0803 on ASUS Z97-E)

I might have had just a couple of BSODs over the 3 years of Windows 7 so I don't think it's a hardware related problem. The BSODs started just after fresh installing Windows 10.

How do I open the MEMORY.DMP generated by Windows and what do I look for in there to see what's causing the BSODs?

Anything else I can try?

UPDATE I've opened C:\MEMORY.DMP with WinDbg x64 and here's the result: https://pastebin.com/B2pS9VZt

UPDATE 2 I've just had another BSOD. This time it was SPECIAL_POOL_DETECTED_MEMORY_CORRUPTION.
Dump here: https://pastebin.com/0hckXpqP

UPDATE 3 Minidumps files

UPDATE 4
I've run memtest all night and I got a lot of errors. I guess I do have faulty RAM. Are we sure this is faulty RAM? Will it be fixed once I replace it? Is there no way to know which stick(s) are faulty, other than removing them and running memtest again?
Here are the results: http://ancient-name.surge.sh/

emzero

Posted 2018-09-01T22:02:44.840

Reputation: 573

Spurious random crashes with no apparent reason smell a lot like a hardware issue or malware. You can open the dump file with BlueScreenView and similar tools. As a kernel mode dev you'd use WinDbg straight, though. If the alleged cause (some driver or the kernel itself pointed out in the dump) varies across crashes, you're likely dealing with either of the above causes. Oh and some anti-malware that isn't yet prepared for the anti-Spectre/Meltdown kernel changes might also cause this, I reckon.

– 0xC0000022L – 2018-09-01T22:54:41.443

@0xC0000022L I've opened the memory dump with WinDbg and added the results in a pastebin. Can you see anything useful there? – emzero – 2018-09-01T23:27:42.020

I would first try running Verifier and see what you get in the dump then. I.e. Run verifier.exe, click Next (standard settings are fine), Select Driver Names from a list, Choose all non Microsoft drivers, click finish and reboot. Next time you get a bugcheck, can you provide the results of !analyze -v again? – HelpingHand – 2018-09-02T08:08:26.177

@HelpingHand I'm relatively certain that the majority of drivers will have been certified (not just attestation-signed), so they went through the HLK-tests and so on, most of which are extending on what Verifier provides. I am not sure, but likely Verifier is enabled on the slaves. But it's still worth a shot, I guess. Since at this point a bunch of possibilities exist. I had something similar five years back which could be traced to my graphics card. Without my background in KM development I guess I would have had to trust the conclusion of the HP team inspecting the laptop: "all is fine". – 0xC0000022L – 2018-09-02T20:41:34.923

@HelpingHand I've run verifier.exe, selected all non-Microsoft drivers and restarted the computer. I'm not sure what you mean by bugcheck. You mean a BSOD? Is it really that hard to know which driver/device caused a BSOD? =/ – emzero – 2018-09-03T14:51:33.670

@HelpingHand I've just had another MEMORY_MANAGEMENT BSOD. Here's the dump opened with WinDbg with the !analyze -v option: https://pastebin.com/jjNDKgCQ

– emzero – 2018-09-03T19:53:42.830

@Moab As I mentioned, it was a clean install using the latest image downloaded from microsoft. – emzero – 2018-09-03T20:32:47.730

Can you check that all the cables are tight (have you moved/had to open the computer up since the upgrade for any reason?), also make sure that the RAM is set at the right speed in the BIOS. Probably also worth running https://www.memtest86.com/ even though you have run the MS one. I'm afraid that's all I have but worth adding the results to the question.

– HelpingHand – 2018-09-03T22:31:11.087

I haven't checked any hardware related. Haven't moved or opened the case since the upgrade. RAM is set at stock speed (I have 4 sticks, 2 are 1600 and the other 2 are 1333. They have been running at 1333 for years with no issue). I'll replug everything tonight and also leave memtest running. – emzero – 2018-09-03T22:55:25.547

Posting text output from WinDbg is almost always nearly useless. Please zip up your minidump files, post the zip file on a file sharing site like Dropbox, and post a link. After looking at those we may ask you to share one of the larger dump files. P.S. "Is it really that hard to know which driver/device caused a BSOD?" Very often, yes, it is that hard. It's not unusual to spend more than a week analyzing a single dump. But minidumps are often dead ends: not enough memory is preserved in them. – Jamie Hanrahan – 2018-09-03T23:03:32.253

@JamieHanrahan I've added a link to the 3 minidumps I have. – emzero – 2018-09-03T23:17:01.990

Are you using Asus's automatic overclock? I had a very similar experience going from Win7 to Win10 on an Asus M5A97 r2 motherboard. In Win 7 it ran perfectly, but with no changes in BIOS after installing Windows 10 I would get random BSODs indicating memory issues. On the advice of some forums (that I can't find now) I disabled the automatic overclocking on the Asus BIOS main setup page and the problem went away. – acejavelin – 2018-09-03T23:29:55.247

@acejavelin I'm using the default "normal" profile on Asus UEFI. I don't think it's overclocking anything. – emzero – 2018-09-04T00:30:51.690

4

Possible duplicate of Determining Bad RAM With Memtest86

– harrymc – 2018-09-12T20:33:17.300

Answers

10

The most likely cause of this type of crash is defective memory. As suggested by harrymc, the first thing to try is generally to run a memory testing program, such as Windows Memory Diagnostic (included in Windows), the original MemTest86 (maintained by PassMark Software), or the open-source Memtest86+. (I've added this section for the benefit of other readers here who may be experiencing similar problems but have not tried memory testing.)


The question author can skip this section. It is being retained for reference by other readers.

If the memory test passes, you may have a faulty processor.

The processor's integrated memory controller (IMC) can sometimes cause memory problems. Simple memory operations like reading data from particular memory locations may work normally, but the processor's ability to perform essential memory management operations, including virtual memory, isn't tested by memory testing programs.

Another possibility is a faulty cache. Caches are small amounts of memory inside the processor used to accelerate memory accesses. Although your processor should be able to detect cache errors (and generate a machine-check exception when that happens, causing a WHEA_UNCORRECTABLE_ERROR BSOD), it is not impossible for data in cache to get corrupted and cause memory corruption without the processor itself noticing. That, too, would not be detected by memory testing software.

To check the IMC, download Intel's processor diagnostic program and run an IMC test. To check the processor caches, download Prime95 and run the small FFTs torture test (your processor may get very hot or the fans may run loud; this is normal). If either test fails, you'll probably need to replace the processor. (I'm assuming the processor and memory are not overclocked or otherwise being operated outside of specifications.)


Since you've run a memory test and have found errors, it should be pretty obvious that one or more memory modules need to be replaced. I can glean more information from the report you've posted.

The errors occur at one particular region of the memory, around 0x19BDD79F0, which would limit the problem to one module. The address suggests, but does not confirm, that the problem is in one of the Patriot Memory modules.
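To put that address in perspective, here is a minimal Python sketch (not part of the original report) that converts the failing address into an offset in GiB. It only does the arithmetic: how a physical address maps to a particular module depends on channel interleaving and memory-hole remapping on the board, so the result is a hint rather than proof. The `installed_gib` default of 16 is taken from the system described here; the function name is made up for illustration.

```python
GIB = 1024 ** 3  # bytes per GiB


def address_offset_gib(addr: int) -> float:
    """Return how far into physical memory an address lies, in GiB."""
    return addr / GIB


def describe_address(addr: int, installed_gib: int = 16) -> str:
    """Summarise where a failing address sits relative to installed RAM."""
    offset = address_offset_gib(addr)
    share = 100 * offset / installed_gib
    return (f"0x{addr:X} lies {offset:.2f} GiB into physical memory "
            f"({share:.0f}% of {installed_gib} GiB)")


failing_address = 0x19BDD79F0  # address region quoted from the MemTest86 report
offset_gib = address_offset_gib(failing_address)
print(describe_address(failing_address))
```

If several failing addresses cluster within a few MiB of each other, as they do here, that points to a single bad region on one module rather than a system-wide fault.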

Because memory on most systems, including yours, works best in pairs, try removing both modules of either brand and rerun the test. If that doesn't work, reinstall the modules you removed and remove the other pair. If the problem clears up, you can use the system normally until you get replacement memory modules.


I should note that Windows 10 uses more advanced memory management techniques, including virtual memory compression to maximize performance on systems with limited memory. Although your system has 16 GB of memory, Windows will still compress the data in memory by default (my desktop has 32 GB and is no different here). Memory errors are detected readily during compression and decompression of data and will immediately cause the operation to fail, causing the system to crash. Memory errors affecting uncompressed application or other data would "simply" result in application crashes or corrupted files (though it can still cause OS crashes). For this reason, Windows 10 is more sensitive to memory errors than previous versions of Windows.
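The general effect is easy to reproduce outside Windows. The sketch below uses Python's `zlib` purely as a stand-in: it is not the compressor Windows uses, and whether Windows detects corruption in the same way is an assumption here. It flips every bit of a compressed 4 KiB buffer in turn and records whether decompression fails, silently returns wrong data, or is unaffected, and then compares that with flipping one bit in the uncompressed buffer.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with a single bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def classify_flip(original: bytes, compressed: bytes, bit_index: int) -> str:
    """Decompress after one bit flip and report what happened."""
    try:
        restored = zlib.decompress(flip_bit(compressed, bit_index))
    except zlib.error:
        return "detected"  # decompression refused the damaged stream
    return "harmless" if restored == original else "silent"


page = bytes(range(256)) * 16  # 4 KiB of sample data (stand-in for a memory page)
compressed = zlib.compress(page)

# Try every possible single-bit error in the compressed form.
counts = {"detected": 0, "harmless": 0, "silent": 0}
for bit in range(len(compressed) * 8):
    counts[classify_flip(page, compressed, bit)] += 1
print(counts)

# The same kind of error in uncompressed data damages exactly one byte.
raw_flipped = flip_bit(page, 12345)
changed_bytes = sum(a != b for a, b in zip(page, raw_flipped))
print(changed_bytes)
```

A flip in the uncompressed buffer corrupts one byte and nothing notices, while a flip in the compressed stream usually makes the whole buffer unrecoverable. That is the sense in which compressed memory is less forgiving of bad RAM.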

bwDraco

Posted 2018-09-01T22:02:44.840

Reputation: 41 701

Thanks. I'll run those two and get back to you tomorrow. – emzero – 2018-09-04T00:32:22.700

Well, see my latest update. I guess I do have faulty RAM. But why did it work fine with Win7? – emzero – 2018-09-04T11:56:43.343

@emzero: Updated my answer. Long story short, Windows 10 has a memory compression feature which makes it more sensitive to memory errors than previous versions of Windows. – bwDraco – 2018-09-04T16:21:24.513

Ignoring kernel-/user-mode for a minute (even if compression were entirely user-mode), a failure to decompress would still trigger a crash (there is no safe way for the OS to recover, apart from maybe killing the process - but there's known faulty memory now). – Bob – 2018-09-04T17:27:58.230

Memory compression has no relation to crashes inside the kernel, which is what we are seeing here, as the compression is done in user mode according to the documentation you yourself supplied. – harrymc – 2018-09-04T17:40:42.840

@bwDraco I've removed both Patriot modules and Memtest86 passed twice with 0 errors. Also, no BSOD in 12 hours. So I guess we can say for sure at least one of them is faulty. I'll just dump both and get 2 of the same Corsair model I already have. Thank you, the bounty is yours =) – emzero – 2018-09-04T18:10:04.313

@emzero: To award the bounty, click on the blue "+500" below the voting controls for the answer you want to award the bounty for. I'm pleased I can help :) – bwDraco – 2018-09-04T18:16:52.387

@bwDraco Unfortunately, I've just had another BSOD, this time code was SYSTEM_THREAD_EXCEPTION_NOT_HANDLED. I cannot update my question because it has been locked. But here's what BlueScreenView shows: https://i.imgur.com/tpAXJxi.png

– emzero – 2018-09-05T16:40:31.573

@bwDraco And here's the WinDbg output: https://pastebin.com/462wTxLS - this BSOD happened a few minutes after I installed the driver/software for my Logitech G203 mouse. Could it be related to that?

– emzero – 2018-09-05T16:42:06.850

@emzero: That's probably unrelated to the memory issues. It's in hidclass.sys, which handles USB input devices like keyboards and mice, and given that it happened after installing a mouse driver, it's probably safe to say that it has nothing to do with the memory. I'd still suggest running another memory test to be sure, but I highly doubt that's the problem at this point. – bwDraco – 2018-09-05T16:44:52.363

Ok, definitely something related to that because I tried to open the Logitech software and had that BSOD again. Stupid Logitech. – emzero – 2018-09-05T16:45:04.810

@emzero Reset all your CMOS/BIOS/UEFI settings to the factory defaults if you can and have not done so already. – Pimp Juice IT – 2018-09-05T17:59:35.720

3

According to the dumps, the error comes from two sources: an IRQ and an int 3 instruction. The IRQ means that some driver called by the interrupt is faulty, which means that you have to check all the drivers installed on this system to find the faulty one.

Int 3 is a debugger interrupt, which means that some software (it can also be a driver) is executing a breakpoint (int 3) where it shouldn't. This can happen with debug builds of software.

Such BSODs mostly come from poorly written drivers, so that is where I would look for the problem. Uninstall all drivers (or do a fresh install of the operating system) and check them one by one. After each driver, put the system under heavier load (run the 7-Zip benchmark, for example) and you will find the faulty one.

The other possible source of the problem is an overclocked CPU or RAM. To check that, configure the BIOS to use only the nominal values for the installed hardware, nothing more.

pbies

Posted 2018-09-01T22:02:44.840

Reputation: 1 633

1

I can see in the minidumps that you also had a crash condition of SPECIAL_POOL_DETECTED_MEMORY_CORRUPTION. I also noted that all your crashes occur in the kernel or inside the HAL, but never inside any device driver, so the problem is not with a specific malfunctioning device.

It is therefore very likely that your memory is defective. Windows 10 might be using a defective part of the RAM that was not used before.

First have a look at the Event Viewer to see if it contains any useful information.

Then I suggest running MemTest86:

MemTest86 is the original, free, stand alone memory testing software for x86 computers. MemTest86 boots from a USB flash drive or CD and tests the RAM in your computer for faults using a series of comprehensive algorithms and test patterns.

You might also try to boot with a subset of your RAM sticks, respecting the arrangements specified for your motherboard, in order to find the bad stick.


Notes on your MemTest86 results: You have thousands of errors. The memory addressing tests passed, so the problem is not with the memory controller. The errors are in the RAM itself, where stored data is incorrect when read back. This suggests that at least one of your memory sticks is bad, and that the problem is not with the CPU or the motherboard.

What you can do

You could take out sticks and run MemTest86 on the subset. Your motherboard is the Asus Z97-E that needs at least two sticks in DIMM_A1 and DIMM_B1. The following diagram is from page 1-7 of the manual:

[image: DIMM slot diagram from page 1-7 of the Asus Z97-E manual]

Putting in any two sticks of the same make, and testing, will narrow the field to either the Corsair or the Patriot sticks. Once you know which pair contains the (hopefully single) bad stick, you may try mixing sticks from different manufacturers. Their specifications seem identical, so this might work.

It might also be that putting only one stick in DIMM_A1 will work well enough for MemTest86. The diagram from the manual is unclear and may indicate that a single stick can work in either DIMM_A1 or DIMM_B1. Even if that's not the case, non-recommended configurations sometimes still work to some degree, depending on the motherboard.
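The elimination order described above can be written down as a short Python sketch. Everything here is illustrative: the stick labels are made up, and `passes_memtest` is a stand-in for physically installing the listed sticks in DIMM_A1/DIMM_B1 and running MemTest86, which no script can do for you. It assumes mixed-brand pairs actually boot on your board.

```python
from typing import Callable, Dict, List, Sequence


def find_suspect_sticks(
    pairs: Dict[str, Sequence[str]],
    passes_memtest: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Narrow a fault down to single sticks by testing pairs first."""
    # Step 1: test each same-brand pair on its own.
    good_pairs = [name for name, sticks in pairs.items() if passes_memtest(sticks)]
    bad_pairs = [name for name in pairs if name not in good_pairs]
    if not bad_pairs or not good_pairs:
        # Nothing fails, or no known-good stick exists to pair with:
        # the best we can report is every stick from the failing pairs.
        return [stick for name in bad_pairs for stick in pairs[name]]

    # Step 2: pair each stick from a failing pair with a known-good stick.
    known_good = pairs[good_pairs[0]][0]
    suspects = []
    for name in bad_pairs:
        for stick in pairs[name]:
            if not passes_memtest([known_good, stick]):
                suspects.append(stick)
    return suspects


# Simulated run: pretend the second Patriot stick is the faulty one.
faulty = {"Patriot-2"}
pairs = {
    "Corsair": ["Corsair-1", "Corsair-2"],
    "Patriot": ["Patriot-1", "Patriot-2"],
}
suspects = find_suspect_sticks(pairs, lambda sticks: not (set(sticks) & faulty))
print(suspects)
```

With four sticks this takes at most four test runs (two same-brand pairs, then two mixed pairs), which matters when each MemTest86 run takes hours.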

harrymc

Posted 2018-09-01T22:02:44.840

Reputation: 306 093