How does a multi core computer freeze (at the hardware level)

7

1

I have a 4-core i7 computer that freezes. The display stays on, but nothing ever moves again. This question is not about getting help with that particular problem; it is a general question about how a computer can freeze.

And it is not about blue screens either. I am talking about a sudden, complete halt of the system. Although one can never be sure, here is what I mean by completely frozen:

  • Indicator lights on the keyboard (like caps lock) no longer toggle
  • Purpose-built software that blinks an icon in the system tray no longer updates
  • No input is possible: mouse, keyboard and power button are all unresponsive
  • Can't ping or WOL the computer
  • Music (read from the network or locally) stops
  • Bluetooth radio no longer responsive
  • Closing and opening the cover has no effect
  • It stays that way for hours, and the CPU stays somewhat cool (I can't reach it)

Way back when, your single CPU could halt if it encountered an unexpected situation, such as an unknown opcode. The computer would suddenly freeze. If you had an ICE debugger attached to it, you could see the trace that led to the frozen CPU. I've seen that (too) often with Z80, 6800 and 8086 CPUs.

With multiple cores, why can't the computer keep running on the remaining cores, if only to write a core dump? In other words, what other single points of failure are there on a multi-core computer?

ixe013

Posted 2011-10-27T13:21:32.600

Reputation: 738

This sounds more like a software glitch. The number of cases where the CPU actually halts is very small, and in those cases, the operating system is supposed to crash. – Ramhound – 2011-10-27T13:58:17.040

This question is based upon the false premise that "nothing works in the user interface" is the same as "the CPU has ceased executing instructions". You haven't demonstrated that the latter has actually occurred. You have merely witnessed the former and jumped to a conclusion from no evidence. Asking why something happens, that you don't even know to have happened in the first place, is a loaded question. – JdeBP – 2011-10-27T14:06:51.593

@JdeBP: I didn't list all the tests I did before posting my question because I'm not asking for help with this computer; I want to understand the general issue. But since you ask, I had a feeling that the computer was totally frozen because the caps lock light wouldn't turn on, meaning that the hardware interrupt was not being processed. I don't have a multi-core ICE debugger to confirm without a doubt that it is frozen, but all signs point in that direction. – ixe013 – 2011-10-27T15:23:30.860

Again, you're jumping to an unfounded conclusion about hardware interrupts. The hardware interrupt for a keyboard device, certainly in the operating system that you're obliquely talking about, merely reads the data received from the keyboard. Deciding whether to alter the state of the caps lock light is not done within an interrupt handler. So the fact that it doesn't happen doesn't say anything at all about interrupt processing. For all that you know, the raw input thread could be simply stuck, or starved of CPU time, and everything else is in fact running as usual. – JdeBP – 2011-10-27T18:57:35.273
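The split described in the comment above can be illustrated with a toy model. This is only a sketch, not how any real keyboard driver is written: `KeyboardModel`, `interrupt_handler`, `input_thread_step` and `simulate` are hypothetical names, and the only real-world detail assumed is that 0x3A is the Caps Lock make code in scancode set 1.

```python
import queue


class KeyboardModel:
    """Toy model: the 'interrupt handler' only stores bytes; a separate
    'input thread' step decides what to do with them (hypothetical names)."""

    CAPS_LOCK = 0x3A  # Caps Lock make code in scancode set 1

    def __init__(self):
        self.pending = queue.Queue()   # bytes received but not yet interpreted
        self.interrupts_handled = 0
        self.caps_led_on = False

    def interrupt_handler(self, scancode):
        # Does the minimum: count the interrupt and queue the raw byte.
        self.interrupts_handled += 1
        self.pending.put(scancode)

    def input_thread_step(self):
        # Deferred work: interpret one queued byte, possibly toggling the LED.
        try:
            scancode = self.pending.get_nowait()
        except queue.Empty:
            return False
        if scancode == self.CAPS_LOCK:
            self.caps_led_on = not self.caps_led_on
        return True


def simulate(input_thread_stuck):
    """Press Caps Lock once; return (interrupts_handled, caps_led_on)."""
    kbd = KeyboardModel()
    kbd.interrupt_handler(KeyboardModel.CAPS_LOCK)
    if not input_thread_stuck:
        kbd.input_thread_step()
    return kbd.interrupts_handled, kbd.caps_led_on
```

In this model, a stuck input thread leaves the LED unchanged even though the interrupt was serviced, which is why a dead caps lock light alone does not prove that interrupts have stopped.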

@JdeBP I know very little, indeed. I was hoping to get some insight from answers to this question. The computer freezes many times a day; can you think of a test that could help explain that behavior? I write kernel drivers for a living (not keyboard drivers, obviously ;) If you can think of something, I will write the code. – ixe013 – 2011-10-27T19:22:14.023

I just thought of something: I should run the system under the kernel debugger! – ixe013 – 2011-10-27T19:23:50.933

Answers

2

Given the freeze you're describing, it does sound like a hardware-level issue, though not necessarily one caused by the CPU. That said, a multi-CPU system can definitely tangle itself into a deadlock on all cores if every core is running a thread or process that is waiting on a resource held by a thread or process on another core. A search on "CPU deadlock" provides lots of details on possible conditions. A failure due to overheating or improper voltage settings could also cause intermittent behavior, although I've only seen systems shut down or refuse to POST when this is the case.
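As a rough illustration of the deadlock case, here is a minimal user-space sketch in Python. It is not kernel code and says nothing about what happened on the asker's machine: the names `run_deadlock_demo`, `core0` and `core1` are made up for the example, and the acquire timeout exists only so the demo terminates (a real deadlock would block forever).

```python
import threading


def run_deadlock_demo(timeout=0.5):
    """Two workers take two locks in opposite order.
    Returns the sorted names of the workers that could not get their second lock."""
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    sync = threading.Barrier(2)   # reusable rendezvous point for the two workers
    gave_up = []

    def worker(name, first, second):
        with first:
            sync.wait()           # both workers now hold their first lock
            # Each worker wants the lock the other one holds.
            # Without the timeout this call would never return.
            got_second = second.acquire(timeout=timeout)
            sync.wait()           # keep holding 'first' until both attempts end
            if got_second:
                second.release()
            else:
                gave_up.append(name)

    t0 = threading.Thread(target=worker, args=("core0", lock_a, lock_b))
    t1 = threading.Thread(target=worker, args=("core1", lock_b, lock_a))
    t0.start()
    t1.start()
    t0.join()
    t1.join()
    return sorted(gave_up)


if __name__ == "__main__":
    print(run_deadlock_demo())
```

Both workers end up waiting on each other, so adding more cores does not help: every thread that could make progress is itself blocked on a resource held by another blocked thread.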

FYI: I've seen similar problems on systems with bad memory sticks and bad video cards. You might try running burn-in diagnostics such as Memtest86+, and/or benchmarking the system with different pieces of hardware removed, to see if you can isolate the unstable component(s).

holtavolt

Posted 2011-10-27T13:21:32.600

Reputation: 196