Random multiple BSODs

3

A few days ago, out of the blue (quite literally), I got a Blue Screen of Death. Since then it has been happening quite often, around 10 times a day (damn, it even happened while I was writing this). I can't see any pattern, so it looks pretty random to me.

Every single time the cause is CRITICAL_STRUCTURE_CORRUPTION. And no, I don't have Intel HAXM installed:

HAXM

Here is a screenshot from BlueScreenView. BlueScreenView

So as you can see it's always caused by ntoskrnl.exe, always at the same address.

What I have already tried:

  • updated as many drivers as I could find,
  • checked both partitions of the disk (ST1000lM024 HN-M101MBB) for errors with chkdsk; some were found on the non-system partition:

    Chkdsk was executed in scan mode on a volume snapshot.  
    
    Checking file system on E: Volume label is Dane.
    
    Stage 1: Examining basic file system structure ...
    795136 file records processed.
    File verification completed.
    83 large file records processed.
    0 bad file records processed.
    
    Stage 2: Examining file name linkage ...
    956638 index entries processed.
    Index verification completed.
    
    Found 3 missing entries (\Gry\Neverwinter Nights 2 Complete\UI\default\images\generic\tint_frame_BL.tga <0x1,0x8f0c0>, ...) in index "$I30" of directory "\Gry\Neverwinter Nights 2 Complete\UI\default\images\generic <0x1,0x8ef0b>"
    ... repaired online.
    
    Stage 3: Examining security descriptors ...
    Security descriptor verification completed.
    80752 data files processed.
    CHKDSK is verifying Usn Journal...
    Usn Journal verification completed.
    Windows has found problems and they were all fixed online. No further action is required.
    
     847746047 KB total disk space.  381611132 KB in 707401 files.
        185180 KB in 80753 indexes.
        887123 KB in use by the system.
         65536 KB occupied by the log file.  465062612 KB available on disk.
    
    4096 bytes in each allocation unit.
    211936511 total allocation units on disk.
    116265653 allocation units available on disk.
    
    ----------------------------------------------------------------------
    
    
    Stage 1: Examining basic file system structure ...
    
    Stage 2: Examining file name linkage ...
    CHKDSK is scanning unindexed files for reconnect to their original directory.
    Recovering orphaned file tint_frame_BL.tga (585920) into directory file 585483.
    Recovering orphaned file tint_frame_BL.tga (585920) into directory file 585483.
    Recovering orphaned file tint_frame_BR.tga (585921) into directory file 585483.
    Recovering orphaned file tint_frame_BR.tga (585921) into directory file 585483.
    Recovering orphaned file tint_frame_R.tga (585923) into directory file 585483.
    Recovering orphaned file tint_frame_R.tga (585923) into directory file 585483.
    
    Stage 3: Examining security descriptors ...
    
  • checked the disk with the DiskCheckup test. I tried running it twice in normal mode, and each time a BSoD occurred. Then I ran the extended test in safe mode overnight: no errors were found, but no BSoDs occurred either. SMART attributes:

SMART values

MemTest86Report

  • dxdiag output is available here (Display Devices shows Intel HD Graphics 4000, but there is also an AMD Radeon HD 8870M; the two are supposed to switch when needed).

  • WinDbg analysis using !analyze -v:

    CRITICAL_STRUCTURE_CORRUPTION (109)
    This bugcheck is generated when the kernel detects that critical kernel code or
    data have been corrupted. There are generally three causes for a corruption:
    1) A driver has inadvertently or deliberately modified critical kernel code
     or data. See http://www.microsoft.com/whdc/driver/kernel/64bitPatching.mspx
    2) A developer attempted to set a normal kernel breakpoint using a kernel
     debugger that was not attached when the system was booted. Normal breakpoints,
     "bp", can only be set if the debugger is attached at boot time. Hardware
     breakpoints, "ba", can be set at any time.
    3) A hardware corruption occurred, e.g. failing RAM holding kernel code or data.
    Arguments:
    Arg1: a3a01f59f91f75c5, Reserved
    Arg2: b3b72be04b9e2890, Reserved
    Arg3: ffffe0015a092ed0, Failure type dependent information
    Arg4: 000000000000001c, Type of corrupted region, can be
        0 : A generic data region
        1 : Modification of a function or .pdata
        2 : A processor IDT
        3 : A processor GDT
        4 : Type 1 process list corruption
        5 : Type 2 process list corruption
        6 : Debug routine modification
        7 : Critical MSR modification
    
    Debugging Details:
    ------------------
    
    CUSTOMER_CRASH_COUNT:  1
    
    DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT
    
    BUGCHECK_STR:  0x109
    
    PROCESS_NAME:  System
    
    CURRENT_IRQL:  2
    
    STACK_TEXT:  
    ffffd000`366fc088 00000000`00000000 : 00000000`00000109 a3a01f59`f91f75c5 b3b72be0`4b9e2890 ffffe001`5a092ed0 : nt!KeBugCheckEx
    
    STACK_COMMAND:  kb
    
    SYMBOL_NAME:  ANALYSIS_INCONCLUSIVE
    
    FOLLOWUP_NAME:  MachineOwner
    
    MODULE_NAME: Unknown_Module
    
    IMAGE_NAME:  Unknown_Image
    
    DEBUG_FLR_IMAGE_TIMESTAMP:  0
    
    BUCKET_ID:  BAD_STACK
    
    Followup: MachineOwner
    
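One detail in the output above: Arg4 is 0x1c, which falls outside the 0 to 7 range that the !analyze text enumerates. A minimal Python sketch of that lookup, using only the codes quoted in the output (the function name is made up for illustration, and nothing here says what 0x1c actually means):

```python
# Region types exactly as printed by !analyze -v in the output above (codes 0 to 7).
REGION_TYPES = {
    0x0: "A generic data region",
    0x1: "Modification of a function or .pdata",
    0x2: "A processor IDT",
    0x3: "A processor GDT",
    0x4: "Type 1 process list corruption",
    0x5: "Type 2 process list corruption",
    0x6: "Debug routine modification",
    0x7: "Critical MSR modification",
}


def describe_corrupted_region(arg4: int) -> str:
    """Describe a bugcheck 0x109 Arg4 value, or flag it as not listed."""
    name = REGION_TYPES.get(arg4)
    if name is None:
        return f"0x{arg4:x}: not in the list printed by !analyze -v"
    return f"0x{arg4:x}: {name}"


# Arg4 from the dump quoted above.
print(describe_corrupted_region(0x1C))
```

So the region type by itself does not narrow things down here; the table printed by the debugger simply does not cover this value.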

All the dump files are available here (sharing disabled after @nullmem's comment). Do you have any idea how I could prevent these BSODs from happening?


Here is my laptop specification (Samsung Chronos 7).

alex

Posted 2014-05-10T11:23:58.790

Reputation: 550

Have you got a C:\Windows\MEMORY.DMP file? As you said the problem happens quite often; do you get the same issue after performing a clean boot? What about safe mode? Did you check the disk for errors already? Booting any Linux live distribution might be worth a shot. Also, some drivers you can update: Wi-Fi, Ethernet, and audio.

– and31415 – 2014-05-10T13:05:52.290

What's the disk model? – and31415 – 2014-05-10T18:33:09.110

@and31415 disk model is ST1000lM024 HN-M101MBB. I updated my question including more details. – alex – 2014-05-11T12:35:01.577

I strongly discourage posting a publicly downloadable copy of your dump files. This can be a huge security risk to your system and can compromise the security of any server accounts you access on a regular basis. – nullmem – 2014-05-11T15:15:38.790

Answers

2

Alex,

I have almost exactly the same laptop as you, but the 15.6" version. It's a beautiful laptop and especially fast (since I put a 1TB SSD inside and upped the RAM to 16GB).

I feel you should still focus on the physical hard drive. Here's why:
1. Random errors like yours are mostly caused by the physical hard drive, the power supply, motherboard capacitors, or memory.
1a. Motherboard and power supply repair is OUT in this situation. Ignore those.
1b. After the extensive memory test, your results imply you can probably rule that out too (I don't particularly use MemTest86).
2. The type of chkdsk errors you saw goes beyond normal file corruption from an unexpected shutdown, ESPECIALLY the file errors on the non-system partition.
3. The DiskCheckup result is NOT clean. At first glance it looks okay, but you have 201 occurrences of Hard drive calibration retry count, which is an internal function of the hard drive, independent of the operating system.
4. The same goes for the 201 occurrences of Load/Unload retry count, which correspond to the drive calibration retries.
5. There are 109 occurrences of Write error count, where the hard drive could not properly write the data because of a drive problem: it could be the media, the controller board (not as likely here), or the drive heads.
6. This drive has, on average, been on for 2 hours each time it's turned ON, and it's been turned on 1,337 times. If I had to guess, I'd say you bought this laptop new 1.5 to 2 years ago.
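
The arithmetic behind points 3 to 6 can be sketched roughly in Python. The counter names and values are the ones quoted in the list above; the helper names are hypothetical, and the 1.5 and 2 year periods are only the guess from point 6, not measured data:

```python
# Counters quoted in the list above (names as used there).
COUNTERS = {
    "Hard drive calibration retry count": 201,
    "Load/Unload retry count": 201,
    "Write error count": 109,
}
POWER_CYCLES = 1337          # times the drive has been turned on
AVG_HOURS_PER_SESSION = 2    # average hours on per power cycle
DAYS_PER_YEAR = 365


def nonzero_counters(counters):
    """Return the names of counters whose value is above zero."""
    return [name for name, value in counters.items() if value > 0]


def power_on_hours(cycles, avg_hours):
    """Approximate total power-on time as cycles times average session length."""
    return cycles * avg_hours


def hours_per_day(total_hours, years):
    """Average daily use implied by a total over an assumed ownership period."""
    return total_hours / (years * DAYS_PER_YEAR)


total = power_on_hours(POWER_CYCLES, AVG_HOURS_PER_SESSION)
print("Non-zero counters:", nonzero_counters(COUNTERS))
print("Approximate power-on hours:", total)
for years in (1.5, 2.0):  # assumed ownership periods
    print(f"{years} years -> {hours_per_day(total, years):.1f} hours/day")
```

That works out to roughly 2,674 power-on hours, or somewhere between 3.5 and 5 hours a day over the guessed ownership period, which is plausible for a laptop in daily use.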

This is what I would do if I had your computer:
1. Move all data and downloaded programs to an external drive.
2. Don't forget to log all serial/registration numbers of software products.
3. Buy and install an SSD and install Windows 8.1 on it. (The laptop bottom will separate, but be very, very gentle with it. Check YouTube videos to see it done. It takes patience so you don't crack it.)
4. Reinstall everything fresh. I know this HURTS, but you have to do what you must do.
5. Wipe the old hard drive and use it for non-important backups until it really begins to die.
That being said, if you cannot afford another drive or don't want to reinstall, then:
1. Perform steps 1 and 2 above, then use Windows 8 recovery to refresh the laptop, keeping the old hard drive. At this point, I feel even SFC /scannow will not fix it permanently, since it's not a file problem.

I did not mention a bad video chip in the switchable ATI video card, since this laptop uses the internal Intel 4000 graphics unless it is in high-performance mode or the ATI card is forced ON.

First, we have to deal with a known problem before examining other possible problems.

DaveM

Posted 2014-05-10T11:23:58.790

Reputation: 31

1

I'm 95% sure you need to level up again in Neverwinter Nights 2. (grin)

Based on your Chkdsk results, your hard drive is telling me it is heading south.

Make sure you have a backup first, then download and run the PassMark DiskCheckup program from http://www.passmark.com/products/diskcheckup.htm, which is free for personal use. Check the SMART values and run a disk test within the program. It should tell you what you need to know.

Hard drives, even new ones, do fail, although usually they have been degrading internally first. I once went through 4 SATA II hard drives within 2 years on a brand-new laptop; not fun.

And you might wanna check your Neverwinter Nights status. :-)

DaveM

Posted 2014-05-10T11:23:58.790

Reputation: 31

(Un)fortunately, no errors were found during the extended disk test. Please check my updated question for details :) – alex – 2014-05-11T12:33:22.437

@alex - That does not change the fact that a healthy, working HDD does not fail chkdsk. The only other possibility is a driver that isn't loaded in safe mode. Can you publish the dxdiag for us? – Ramhound – 2014-05-11T14:49:13.890

@Ramhound done. – alex – 2014-05-11T15:25:30.800