SOLUTION: It was the RAM settings all along :-| It never occurred to me that the stock settings on a stock board with stock RAM would be so far off that it'd cause system instability. I've never done any overclocking, so I never looked very closely at those settings. Once I chose the DOCP profile that matched my RAM, everything cleared up, and it's even a little faster. Thanks to Twisty Impersonator for the process guide and to magicandre1981 for the suggestion that prompted me to check the settings. Hopefully, this will save someone else 2 years of frustration.
EDIT: Well, I think the cause has become clear. After replacing ALL the hardware, and STILL seeing a problem, I decided to go back to the hardware idea. In short: if I run with two sticks of RAM, everything is fine. It doesn't matter which two sticks. If I put in all four, I start having problems. This seems like a pretty clear indication of a bad motherboard.
The Symptoms:
For the last several years my machine has been generally unstable, off and on. It typically manifests as BSODs with varying stop codes.
- Upgrading the RAM improved the stability for a while.
- Upgrading the motherboard improved the stability for a while.
- Replacing the C: drive improved the stability for a while.
- Refreshing or reinstalling the OS has occasionally been necessary, and usually improves stability for a while.
I have replaced literally every functional component in the system, except the CPU and Blu-ray drive. I have not ruled out the CPU, but there is still a vast swath of software-"things" that might also be at fault.
Each time, the problem has returned after a few months.
Most recently, the symptoms have changed slightly. I am open to the possibility that this is a completely unrelated problem, but it seems too similar to the problems I have been battling the whole time, to be mere coincidence.
A few weeks ago I rebooted my computer to update, and it would not POST. I fussed with it for a while (checking connections, MemOK! button, disconnecting power, TPU on/off, EPU on/off, etc.) and got it to POST, but the OS would not load. I forget the exact presentation of symptoms, but IIRC it would just sit and spin.
Reinstalled the OS and things were quiet for a week or so, until apps began crashing. At first, it seemed like all the apps that were crashing were installed on the same SSD. Without room to move things around and test, I upgraded to the new Samsung drives. But apps are still crashing.
- Flashed latest BIOS update. No change.
- Turns out, you have to reset the CMOS when you flash the BIOS. Potential symptoms are much like mine. I reset the CMOS. No change.
- It was generally high-demand applications that would crash (Dishonored 2, Diablo III, ESO, etc.). But the crashes happen with the CPU and GPU between 35°C and 45°C, so it is probably not temperature.
- It is not running out of RAM.
- MemTest has never shown any problems. I have run it dozens of times.
- No CPU test has ever shown any issues, except at high temperatures.
- No GPU test has ever shown any issues, except at high temperatures.
- I've reinstalled my video drivers a few dozen times.
- I had Task Manager crash while I was watching yesterday.
- Tried to install a Windows Store App. Some background process crashed. Had to try again. Worked fine.
- Event Viewer has just AppCrash events.
AppCrash events are being produced by a wide range of applications. Varying sizes, locations, demands, etc. It is typically once a day, maybe less. But high-resource applications crash pretty reliably within 30 minutes or so.
I should clarify that these are not "Windows is looking for a solution" AppHang events. The application just vanishes, like I closed it, and Windows has nothing to say about it except the AppCrash event in the Event Viewer. Less often, there is a BSOD. Lately, I have seen IRQL_NOT_LESS_OR_EQUAL, and others that I cannot remember... (I don't have any memory dumps anymore? That's weird...).
System specs:
- OS: Windows 10 Pro (upgraded from Win7 during free upgrade period)
- CPU: AMD Phenom II 1090 (no overclocking)
- Cooling: CoolerMaster 150mm CPU fans, several case fans
- Mainboard: ASUS M4A99X EVO R2.0
- RAM: G.Skill 16GB(4x4) DDR3-1333
- GPU: MSI GTX 970 (no overclocking)
- PSU: Corsair CX750M
- System drive: Samsung 850 EVO 500GB
- Other drives: Samsung 850 EVO 500GB, other conventional drives, optical drive
- A/V: Windows Defender, no other AV
Crash dump:
Prompted by this post: https://superuser.com/questions/1281659/possible-to-determine-which-core-a-faulting-application-was-on-when-it-crashed
Hit a new BSOD while it was idling last night. Details from WhoCrashed below:
Crash dump directory: C:\WINDOWS\Minidump
Crash dumps are enabled on your computer.
On Wed 1/3/2018 9:00:13 AM GMT your computer crashed
crash dump file: C:\WINDOWS\Minidump\010318-12546-01.dmp
This was probably caused by the following module: ntoskrnl.exe (nt+0x1640E0)
Bugcheck code: 0x1E (0xFFFFFFFFC0000005, 0xFFFFF8019CED183E, 0xFFFF968442FBEB68, 0xFFFF968442FBE3B0)
Error: KMODE_EXCEPTION_NOT_HANDLED
file path: C:\WINDOWS\system32\ntoskrnl.exe
product: Microsoft® Windows®
Operating System company: Microsoft Corporation
description: NT Kernel & System
Bug check description: This indicates that a kernel-mode program generated an exception
which the error handler did not catch. This appears to be a typical software driver bug
and is not likely to be caused by a hardware problem. The crash took place in the Windows
kernel. Possibly this problem is caused by another driver that cannot be identified at this time.
On Wed 1/3/2018 9:00:13 AM GMT your computer crashed
crash dump file: C:\WINDOWS\memory.dmp
This was probably caused by the following module: ntdll.sys (ntdll!ZwFlushBuffersFile+0x14)
Bugcheck code: 0x1E (0xFFFFFFFFC0000005, 0xFFFFF8019CED183E, 0xFFFF968442FBEB68, 0xFFFF968442FBE3B0)
Error: KMODE_EXCEPTION_NOT_HANDLED
Bug check description: This indicates that a kernel-mode program generated an exception
which the error handler did not catch. This appears to be a typical software driver bug
and is not likely to be caused by a hardware problem. A third party driver was identified
as the probable root cause of this system error. It is suggested you look for an update for
the following driver: ntdll.sys.
Google query: ntdll.sys KMODE_EXCEPTION_NOT_HANDLED
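For reference, the four bugcheck parameters in the WhoCrashed report can be unpacked by hand. Below is a minimal Python sketch, assuming the parameter layout Microsoft documents for bug check 0x1E (exception code, faulting address, then two exception-information values). The function name and the small status table are illustrative only and are not part of WhoCrashed or any other tool.

```python
# Decode the parameters of bug check 0x1E (KMODE_EXCEPTION_NOT_HANDLED).
# Assumed layout: P1 = exception code, P2 = address where the exception
# occurred, P3/P4 = exception information parameters 0 and 1.

# Illustrative subset of NTSTATUS codes; not exhaustive.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0x80000003: "STATUS_BREAKPOINT",
}


def decode_bugcheck_1e(p1, p2, p3, p4):
    """Return a readable view of the four 0x1E parameters."""
    # P1 is shown sign-extended to 64 bits; the NTSTATUS is the low 32 bits.
    code = p1 & 0xFFFFFFFF
    return {
        "exception_code": f"0x{code:08X}",
        "exception_name": NTSTATUS_NAMES.get(code, "unknown"),
        "faulting_address": f"0x{p2:016X}",
        "exception_info": [f"0x{p3:016X}", f"0x{p4:016X}"],
    }


# Values copied from the "Bugcheck code" line in the report above.
report = decode_bugcheck_1e(
    0xFFFFFFFFC0000005,
    0xFFFFF8019CED183E,
    0xFFFF968442FBEB68,
    0xFFFF968442FBE3B0,
)
for key, value in report.items():
    print(f"{key}: {value}")
```

The exception code here is 0xC0000005, an access violation in kernel mode. On its own that does not say whether a driver or faulty memory is responsible.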
Memory dumps (full and mini) will be here, as they are available: https://1drv.ms/f/s!AhSzRvnavkrXhPpNy8Qjhaj6LbbTwQ
@magicandre1981 recommended chkdsk /f based on the results of my memory dump. C: is the only drive for which a pagefile is enabled (it's system managed), so that's the one I ran it on. Here are the results:
Checking file system on C: The type of the file system is NTFS.
A disk check has been scheduled.
Windows will now check the disk.
Stage 1: Examining basic file system structure ...
605184 file records processed. File verification completed.
Deleting orphan file record segment 699DD.
10717 large file records processed. 0 bad file records processed.
Stage 2: Examining file name linkage ...
14846 reparse records processed. 704776 index entries processed. Index verification completed.
0 unindexed files scanned. 0 unindexed files recovered to lost and found. 14846 reparse records processed.
Stage 3: Examining security descriptors ...
Cleaning up 1426 unused index entries from index $SII of file 0x9.
Cleaning up 1426 unused index entries from index $SDH of file 0x9.
Cleaning up 1426 unused security descriptors.
Security descriptor verification completed.
49797 data files processed. CHKDSK is verifying Usn Journal...
37651904 USN bytes processed. Usn Journal verification completed.
CHKDSK discovered free space marked as allocated in the
master file table (MFT) bitmap.
CHKDSK discovered free space marked as allocated in the volume bitmap.
Windows has made corrections to the file system.
No further action is required.
487284001 KB total disk space.
209659436 KB in 259738 files.
162276 KB in 49798 indexes.
0 KB in bad sectors.
729085 KB in use by the system.
65536 KB occupied by the log file.
276733204 KB available on disk.
4096 bytes in each allocation unit.
121821000 total allocation units on disk.
69183301 allocation units available on disk.
Internal Info:
00 3c 09 00 f0 b8 04 00 7e 93 08 00 00 00 00 00 .<......~.......
98 05 00 00 66 34 00 00 00 00 00 00 00 00 00 00 ....f4..........
Windows has finished checking your disk.
Please wait while your computer restarts.
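As a quick sanity check on the summary figures, the size categories can be added up and compared against the reported total. This is only a rough Python sketch with the numbers copied from the chkdsk output above; the variable names are made up for illustration.

```python
# Figures copied from the chkdsk summary above (KB unless noted).
total_kb = 487284001
in_files_kb = 209659436
in_indexes_kb = 162276
bad_sectors_kb = 0
system_kb = 729085
available_kb = 276733204

bytes_per_unit = 4096
available_units = 69183301

# The categories should add up to the reported total. The log file line
# is left out of the sum; the total matches without it.
accounted_kb = (
    in_files_kb + in_indexes_kb + bad_sectors_kb + system_kb + available_kb
)

# Free space derived from allocation units should match the reported figure.
available_from_units_kb = available_units * bytes_per_unit // 1024

print("categories add up to total:", accounted_kb == total_kb)
print("free space matches units:  ", available_from_units_kb == available_kb)
print(f"free space: {available_kb / total_kb:.1%} of the volume")
```

Both checks pass for these figures, and the summary reports 0 KB in bad sectors.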
No luck. Even after chkdsk fixed these issues, I'm still having the same crashes, though no new BSODs yet.
Another BSOD as I was opening the browser to update this question. Memdumps available once they finish uploading.
But the original reason I came to update is that I found a whole crapton (51 to be precise) of events that look exactly the same. It looks like they happened about every half-hour, starting right after I left for work (7:30am) until about 8:30pm. They might still be happening. They all look like exactly this:
Fault bucket 0x1E_c0000005_fltmgr!FltpPreFsFilterOperation, type 0
Event Name: BlueScreen
Response: Not available
Cab Id: 0
Problem signature:
P1: 1e
P2: ffffffffc0000005
P3: fffff8019ced183e
P4: ffff968442fbeb68
P5: ffff968442fbe3b0
P6: 10_0_16299
P7: 0_0
P8: 256_1
P9:
P10:
Attached files:
\\?\C:\WINDOWS\Minidump\010318-12546-01.dmp
\\?\C:\WINDOWS\TEMP\WER-18531-0.sysdata.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER5795.tmp.WERInternalMetadata.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER57A5.tmp.csv
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER57B6.tmp.txt
\\?\C:\Windows\Temp\WER8F12.tmp.WERDataCollectionStatus.txt
These files may be available here:
C:\ProgramData\Microsoft\Windows\WER\ReportQueue\Kernel_1e_b49232881f44bde28acca17f0ad8bac3b4fbb67_00000000_cab_031c57c4
Analysis symbol:
Rechecking for solution: 0
Report Id: 3c2abe43-d7d6-4561-9b0d-2adf1f40c745
Report Status: 388
Hashed bucket:
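P1 through P5 in this event look like the bugcheck code and its four parameters from the 1/3/2018 WhoCrashed report, and the attached minidump is the same file. Here is a rough Python sketch for checking that mechanically. The helper names are hypothetical, and the regular expressions assume the text layout shown in this post.

```python
import re


def parse_problem_signature(text):
    """Return {'P1': '1e', ...} from 'Pn: value' lines; empty values are skipped."""
    signature = {}
    for line in text.splitlines():
        match = re.match(r"\s*(P\d+):\s*(\S*)\s*$", line)
        if match and match.group(2):
            signature[match.group(1)] = match.group(2)
    return signature


def parse_bugcheck_line(line):
    """Return (code, [params]) as ints from 'Bugcheck code: 0x1E (a, b, c, d)'."""
    match = re.search(r"Bugcheck code:\s*(0x[0-9A-Fa-f]+)\s*\(([^)]*)\)", line)
    if match is None:
        raise ValueError("not a bugcheck line")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


def same_crash(signature, bugcheck_line):
    """True if P1..P5 of the event equal the bugcheck code and its parameters."""
    code, params = parse_bugcheck_line(bugcheck_line)
    event_values = [int(signature[f"P{i}"], 16) for i in range(1, 6)]
    return event_values == [code] + params


EVENT = """\
P1: 1e
P2: ffffffffc0000005
P3: fffff8019ced183e
P4: ffff968442fbeb68
P5: ffff968442fbe3b0
P6: 10_0_16299
P9:
"""
BUGCHECK = ("Bugcheck code: 0x1E (0xFFFFFFFFC0000005, 0xFFFFF8019CED183E, "
            "0xFFFF968442FBEB68, 0xFFFF968442FBE3B0)")

print(same_crash(parse_problem_signature(EVENT), BUGCHECK))
```

If all 51 events match in the same way, they may be repeated reports of that one crash rather than 51 separate crashes.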
I have a hard time believing that the CPU would have this issue for so long, and the computer still be functional. I haven't had much success exploring software/configuration issues.
Any ideas?
Almost 3 weeks later.... After MUCH shenanigans, I finally acquired a new CPU (upgraded from Phenom II to FX-8350). Replacement was easy enough. I then probed the common problem areas, and apps are still crashing.
As soon as I posted "sad-face," Windows told me something about a "Device Health Report." It reports trouble with a driver. Unfortunately, but unsurprisingly, the Troubleshooter was unable to detect any kind of problem. I uninstalled the two "USB Root Hub" devices in error state from the Device Manager.
Does this provide any additional clues? I'm really at a loss, now...
Here is a list of driver information...? https://docs.google.com/spreadsheets/d/1xAliAOt1s8rQ_ePX5OwTRVFPB3kFYgc3-1HRUznMpR0/edit?usp=sharing
share the dmp files so that we can debug them – magicandre1981 – 2018-01-03T17:09:12.797
Will do! I'll add links to the main question as soon as they're available. – mHurley – 2018-01-03T18:02:48.710
Thanks for the extensive edit, @flolilolilo That's much easier to read, now. – mHurley – 2018-01-03T18:13:05.537
analyzing the dump shows it crashes while doing volume shadow operation (CVssQueuedVolume::OnOpenVolumeHandle). so run chkdsk /f to check HDD file system for errors. – magicandre1981 – 2018-01-03T18:58:10.010
Excellent information! I'm still mystified that people can get that kind of information out of that file. Running chkdsk will certainly help with that problem, but is there good evidence to show that this is the cause of ALL the AppCrash events I've been seeing recently? – mHurley – 2018-01-03T23:58:31.337
Added chkdsk results to my question. – mHurley – 2018-01-05T03:26:30.920
ok, chkdsk fixed NTFS issues. now wait if you get new crashes (BSOD or app crashes) – magicandre1981 – 2018-01-05T11:40:19.097
No luck :-( Still crashing. – mHurley – 2018-01-07T22:20:50.100
what crashes? BSOD or app crash? Which process? – magicandre1981 – 2018-01-08T16:35:17.497
AppCrash - Diablo III. I suspect there were others (browser acting weird, apps seemed not to load when I launched them), but I haven't had a chance to track them down, yet. – mHurley – 2018-01-08T21:20:48.837
Looks like there are many other Application Error events, but it's hard to tell. I literally opened the browser to update this question, when I had another BSOD. New MemDumps coming soon. Also, I have a lot of "info" events about the BlueScreen. I'll post an example in the question. – mHurley – 2018-01-09T03:12:54.490
could be HW issue, last dump shows (IP_MISALIGNED, MODULE_NAME: hardware). so yes, it could be the AMD Phenom(tm) II X6 1090T that fails. – magicandre1981 – 2018-01-09T17:34:44.690
:-( Sadface-making – mHurley – 2018-01-09T18:21:59.633
Alright, just BSOD. Last set of memdumps, just to see if there's anything interesting. Available at the usual link, once they've uploaded. – mHurley – 2018-01-11T04:41:37.267
last dump shows this : " *** Memory manager detected 1 instance(s) of page corruption, target is likely to have memory corruption." so still HW issue – magicandre1981 – 2018-01-11T16:21:36.857
Well... I guess that does it. If it's hardware, really the most likely candidate is the CPU. It's not really the answer I was hoping for, but I'm more confident now that it's the real answer. Thanks for all your help, guys. – mHurley – 2018-01-12T04:17:29.340
if you already changed motherboard and RAM the CPU could be the issue. look on ebay if you can find a x6 replacement CPU for small amount of money – magicandre1981 – 2018-01-12T16:32:15.420
:-( New CPU. Still crashing. No BSOD, yet, so IDK what kind of information I can get. – mHurley – 2018-01-26T02:11:13.957
...and suddenly, there's new information. See edit. – mHurley – 2018-01-26T02:17:29.877
share the new dumps – magicandre1981 – 2018-01-26T16:21:07.380
No BSOD, so no new dumps. Just AppCrash. I've been running verifier for the last 36 hours. Standard settings, no BSODs from that either. – mHurley – 2018-01-26T23:19:52.257
Here's some driver info? https://docs.google.com/spreadsheets/d/1xAliAOt1s8rQ_ePX5OwTRVFPB3kFYgc3-1HRUznMpR0/edit?usp=sharing – mHurley – 2018-01-26T23:23:57.910
if you have no BSOD this is good. app crashes can be caused by a lot of other things. look in eventlog / Reliability Monitor for details about which applications crash: https://lifehacker.com/how-to-troubleshoot-windows-10-with-reliability-monitor-1745624446 – magicandre1981 – 2018-01-27T07:25:05.980
New info... see edit. – mHurley – 2018-02-02T04:15:33.390
ok, so your board has stability issues when using all RAM modules. try to increase voltage of RAM a bit (only a small amount otherwise you kill the RAM) – magicandre1981 – 2018-02-02T05:20:05.273
Interesting... it never occurred to me that stock RAM on a stock board might need a voltage tweak. Could this be true even if I'm not overclocking anything? I've always been intimidated by these settings, before. What do you mean by "small?" – mHurley – 2018-02-03T14:16:34.480
only a small voltage increasement – magicandre1981 – 2018-02-03T16:34:31.837