New computer can't run Linux, but runs Windows with no problems

3

I have a brand-new gaming laptop which crashes every time I run Linux. If I run Linux natively, it completely freezes (including the mouse cursor) after a seemingly random period. If I run Linux in VMware Player under Windows 8.1, Linux eventually hangs in the same way, and Windows also crashes, displaying a Blue Screen of Death (BSOD) after a short delay. The BSOD always says MACHINE_CHECK_EXCEPTION and the BugCheck log indicates a code of 0x9c.

The Linux variants that I have tried are:

  • Ubuntu MATE 15.10 64-bit
  • Ubuntu 15.04 64-bit
  • Ubuntu MATE 15.04 64-bit
  • Ubuntu MATE 14.04.2 64-bit
  • Ubuntu MATE 14.04.2 32-bit

Other than these more-or-less random hangs, Linux runs fine -- and I have been able to use it for many hours in between crashes.

I had assumed that this is a hardware problem, but the difficulty is that I cannot get Windows to crash unless I am also running Linux (in a VM). I've tried simultaneously launching every available application (around 30), while playing YouTube videos, and also running stress-test apps such as Prime95. I've also done some graphics-heavy gaming.

I have run "Windows Memory Diagnostics Tool" and other memory tests with no apparent problems.

One guess is that Linux is somehow exercising CPU features that Windows doesn't use, but it isn't clear why this would trigger random hardware failures.

How can I definitively prove that I have faulty hardware (or that I don't)?


EDIT: I seem to be having some luck fixing the Linux problems by disabling some features in the BIOS. I haven't seen any crashes since doing so. The changes I made initially (just based on guessing):

  • Virtualization Technology: Disabled
  • Fast Boot: Disabled
  • SpeedStep: Disabled
  • PCI Latency Timer: 64 Clocks (was 32)

Based on subsequent testing of variations of these, it appears that both Virtualization Technology and SpeedStep need to be disabled -- SpeedStep for sure, at least. Does this make it easier to isolate the crashes as being based on a hardware defect? ...Or could this possibly be a software problem in Ubuntu/Linux?


To make my question more explicit: I'm not really asking for ways to solve the problem, although that would be great in theory. What I really need is a way to isolate and reproduce this problem under Windows without also running Linux. I'm working from the assumption that I have a bad unit -- and I just need a way to prove it. Remember that the machine is crashing whenever I run Linux (excepting the BIOS changes mentioned above), so this can't be solved by simply updating Windows drivers.

In short: Knowing that Linux causes crashes, is there any other stress-test that I can run, in Windows, that might cause the same type of crash? Alternatively, is this a known bug in Linux?

Note that my processor is the newish i7-5700HQ (Broadwell microarchitecture).

Also note: I don't believe this is caused by severe overheating. The machine includes an extra fan that can be manually enabled, and the crashes don't seem to correlate with heavy loads.


UPDATE: The problems with running Linux natively have been resolved by installing a BIOS update that became available a few months after I posted the question. I am also now running Ubuntu MATE 15.10, but I don't think that matters since that also failed prior to the BIOS update. I guess the long and short of it is that the system was not compatible with Linux (or vice versa) as they were at the time of release.

I haven't gone back and retested the virtual machine problem since I don't really need that now that I can run Linux natively -- and also I have migrated from Windows 8.1 to Windows 10, so it wouldn't exactly be an apples-to-apples test anyway.

nobar

Posted 2015-07-09T01:52:19.950

Reputation: 530

Ok, now I have tested (and crashed) with vanilla Ubuntu 15.04. – nobar – 2015-07-09T02:38:18.303

What's the machine? – Journeyman Geek – 2015-07-09T03:20:25.847

@JourneymanGeek: MSI GE72 APACHE PRO-077 – nobar – 2015-07-09T03:30:39.577

1

I found a very similar situation by searching for "linux broadwell speedstep": "Working Around The Intel Core i7 5775C Broadwell Stability Issue On Linux". The indicated workaround seems to relate to disabling "down-clocking" in the BIOS.

– nobar – 2015-07-09T18:27:56.873

Same laptop with same errors. Is it working for you after all? People still having errors here: http://ubuntuforums.org/showthread.php?t=2284315&page=2

– gabrielhpugliese – 2015-09-08T02:57:50.047

@gabrielhpugliese: Thanks for the link. I still think the fixes that I posted work, but I have been running Windows on this computer for the last couple of months, so I don't have any new data -- other than the observation that Windows still doesn't crash. – nobar – 2015-09-08T05:02:13.213

I'm using Virtualbox to run Ubuntu 14.04.1 with your tips (Virtualization enabled, FastBoot disabled, SpeedStep disabled and PCI latency timer 64). So far so good, I'll keep the link updated. – gabrielhpugliese – 2015-09-08T13:20:52.603

Early results indicate all better (with SpeedStep enabled) on Ubuntu MATE 15.10 64-bit. Fingers crossed... – nobar – 2015-10-23T17:27:55.153

My previous comment turned out to be false -- it was still failing, at least under some challenging usage scenarios. I did just discover that a new BIOS is available, so I have just upgraded MicroCode from 0xd to 0x13. After this, I am passing the test that was previously failing... – nobar – 2015-11-01T19:38:56.833

Answers

1

From service.msicomputer.com:

Why do I get a BSOD MACHINE_CHECK_EXCEPTION?

If you are experiencing a blue screen error when opening any Office 2016 applications, certain games, and virtual desktop Software, showing a "MACHINE_CHECK_EXCEPTION" or "CLOCK_WATCHDOG_TIMEOUT" BSOD's. This is caused by bug in the Microcode affecting only Broadwell CPU's (5th Gen) and it is resolved by updating the Microcode via a BIOS update from the versions listed below.

...

Last Update: September 30th, 2015
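To see which microcode revision a Linux system is actually running, the `microcode` field in `/proc/cpuinfo` can be read. The following is a minimal sketch in Python; the function names are illustrative, and the `0x13` threshold is only an assumption taken from the asker's comment (the BIOS update moved the microcode from 0xd to 0x13), not a value confirmed by MSI for every model.

```python
import re

def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions (as ints) found in /proc/cpuinfo text."""
    matches = re.findall(r"^microcode\s*:\s*(0x[0-9a-fA-F]+)",
                         cpuinfo_text, re.MULTILINE)
    return {int(m, 16) for m in matches}

def needs_update(cpuinfo_text, minimum=0x13):
    """True if any core reports a revision below `minimum`.

    `minimum` defaults to 0x13, an assumption based on the revision the
    asker reported after the BIOS update; adjust it for your own CPU.
    """
    revisions = microcode_revisions(cpuinfo_text)
    return bool(revisions) and min(revisions) < minimum

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    found = sorted(hex(r) for r in microcode_revisions(text))
    print("microcode revisions:", found or "not reported")
    print("below assumed threshold:", needs_update(text))
```

Running this before and after the BIOS update should show whether the revision actually changed.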

nobar

Posted 2015-07-09T01:52:19.950

Reputation: 530

It works. Additional note: "Boot mode select" should be changed from "UEFI" to "LEGACY". – SandroMarques – 2016-04-27T10:34:57.977

1

This is a hardware issue. For bug check 0x9C, the parameters have different meanings depending on what type of CPU you have. Most commonly it results from overheating or from failed hardware (CPU, RAM, power supply, etc.). Pushing hardware beyond its capabilities, for example by overclocking, can cause that error too.

Check the hardware settings in your BIOS, starting with the RAM. Make sure that nothing is overheating or overclocked.

Also try uninstalling (or not installing) the programs that came with your motherboard.


If this does not solve the problem, you should try a few more steps:

  1. Ensure that the machine is adequately cooled. If there is any doubt, open up the side of the PC case, if possible (be mindful of any relevant warranty conditions!), and point a mains fan squarely at the motherboard. That will rule out most (lack of) cooling issues.

  2. Update all hardware-related drivers: video, sound, RAID (if any), NIC... anything that interacts with a piece of hardware. It is good practice to run the latest drivers anyway.

  3. Update the motherboard BIOS according to the manufacturer's instructions. Their website should provide detailed instructions as to the brand and model-specific procedure.

  4. Attempt to (stress) test those hardware components which can be put through their paces artificially. The most obvious examples are the RAM and HDD(s). For the RAM, use the built-in memory diagnostics (run MDSCHED) or the third-party memtest86 utility to run many hours' worth of testing. For hard drives, check whether CHKDSK /R finds any problems on the drive(s), notably "bad sectors". Unreliable RAM, in particular, is deadly as far as software is concerned, and anything other than a 100% clear memory test result is cause for concern. Unfortunately, even a 100% clear result from the diagnostics utilities does not guarantee that the RAM is free from defects, only that none were encountered during the test passes.

  5. Clean and carefully remove any dust from the inside of the machine. Reseat all connectors and memory modules. Use a can of compressed air to clean out the RAM DIMM sockets as much as possible.

  6. If all else fails, start removing items of hardware one-by-one in the hope that the culprit is something non-essential which can be removed. Obviously, this type of testing is a lot easier if you've got access to equivalent components in order to perform swaps. In your case, the RAM and HDD probably can be swapped.
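The asker reported that constant-load tools such as Prime95 did not trigger the crash, while disabling SpeedStep seemed to help. One thing that could be tried alongside the stress tests in step 4 is a load that alternates between busy and idle phases, on the unverified assumption that the CPU's frequency transitions are involved. A minimal sketch in Python follows; the function names and timings are hypothetical, it loads only a single core, and there is no guarantee that it reproduces the fault.

```python
import time

def busy(seconds):
    """Spin one CPU core for roughly `seconds`; return the loop count."""
    end = time.perf_counter() + seconds
    iterations = 0
    while time.perf_counter() < end:
        iterations += 1
    return iterations

def bursty_load(cycles, busy_s=0.5, idle_s=0.5):
    """Alternate busy and idle phases `cycles` times.

    The intent (an assumption, not a proven reproducer) is to make the
    CPU ramp its clock up and down repeatedly instead of holding one speed.
    Returns the number of completed cycles.
    """
    completed = 0
    for _ in range(cycles):
        busy(busy_s)        # load phase: clock should ramp up
        time.sleep(idle_s)  # idle phase: clock may drop again
        completed += 1
    return completed

if __name__ == "__main__":
    # Short demonstration run; a real attempt would use far more cycles.
    print("completed cycles:", bursty_load(cycles=3, busy_s=0.2, idle_s=0.2))
```

Because it runs on Windows as well as Linux, a crash under this load with no VM running would be evidence independent of Linux; the absence of a crash would prove nothing either way.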

Divin3

Posted 2015-07-09T01:52:19.950

Reputation: 1 568

@nobar - So how is it going? Does the problem still persist? If yes, leave a comment and I will do some additional research. If it is solved and was solved by my answer, you can accept this as the answer. If you solved it with another method, then you should answer your own question. – Divin3 – 2015-09-10T03:40:52.390

1

Add libata.force=noncq to the GRUB boot parameters. Works like a charm. My problem is with the video drivers; I'm not running in a VM, but an actual dual boot.
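On Ubuntu, kernel parameters are normally set through the `GRUB_CMDLINE_LINUX_DEFAULT` line in `/etc/default/grub`, followed by running `sudo update-grub`. The sketch below, in Python, shows the text change involved for `libata.force=noncq`; the helper name is hypothetical, and it only transforms a string rather than writing to the real file.

```python
import re

def add_kernel_param(grub_text, param="libata.force=noncq"):
    """Return `grub_text` with `param` present in GRUB_CMDLINE_LINUX_DEFAULT.

    `grub_text` is the content of a file such as /etc/default/grub.
    The parameter is appended only if it is not already listed.
    """
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")',
                         re.MULTILINE)

    def insert_param(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    new_text, count = pattern.subn(insert_param, grub_text)
    if count == 0:
        # No such line yet: add one at the end of the file.
        new_text = (grub_text.rstrip("\n")
                    + '\nGRUB_CMDLINE_LINUX_DEFAULT="' + param + '"\n')
    return new_text

if __name__ == "__main__":
    sample = 'GRUB_TIMEOUT=10\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    print(add_kernel_param(sample))
```

For a one-off test without editing any file, the same parameter can instead be typed onto the `linux` line after pressing `e` at the GRUB menu.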

Ramsez

Posted 2015-07-09T01:52:19.950

Reputation: 11