
Once in a blue moon, I get a blue screen of death (BSOD) on a shiny new Dell R7610 with a single 1100 W Dell-provided power supply, running on a beefy UPS. The bug check code (BCCode) is 101 (CLOCK_WATCHDOG_TIMEOUT: "A clock interrupt was not received..."), which some attribute to an undervolted CPU.

Naturally, I could contact Dell support, but their first reaction would be to replace the motherboard, the power supply, the CPU, or some combination of these.

In synthetic benchmarks, the system memory and CPUs, as well as the graphics memory and GPU, perform admirably, running stably for hours or even days.

My questions are:

  1. Is the power supply adequate for this application? Does it deliver clean enough power to the VRMs on the motherboard?
  2. Are the VRMs adequate for dual Xeon E5-2665 processors?
  3. Does the C-state logic work correctly?
  4. Is sufficient current supplied to PCIe peripherals, such as disk controllers?

P.S. I recently went through the same ordeal with HP. They were nice and professional about it, but the root cause was never established, and the HP machine is still not 100% stable, giving me a blue screen of death once every couple of months.

Here's what a quick web search turns up: http://www.sevenforums.com/bsod-help-support/35427-win-7-clock-interrupt-bsod-101-error.html#post356791


It appears Dell has addressed the above issue by clocking the PCIe bus down to 5 GT/s in the A03 BIOS. My disk controllers support PCIe 3.0, so I will have to re-validate stability. Early testing shows improvement.


Further testing shows a significant performance decrease on each of the x16 slots of the Dell R7610 with the A03 BIOS. But it is now running stably.
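A rough sense of what dropping to 5 GT/s costs can be had from the nominal link rates. The sketch below is a back-of-the-envelope calculation, assuming theoretical per-direction figures for PCIe 2.0 (5 GT/s, 8b/10b encoding) and PCIe 3.0 (8 GT/s, 128b/130b encoding) and ignoring packet and protocol overhead; it is not a measurement of the R7610. The function names are illustrative only.

```python
"""Back-of-the-envelope PCIe link bandwidth: 5 GT/s (Gen2) vs 8 GT/s (Gen3)."""


def lane_gbytes_per_s(gt_per_s: float, payload_bits: int, coded_bits: int) -> float:
    """Usable bandwidth of one lane, one direction, in GB/s (1 GB = 1e9 bytes).

    Each transfer carries one bit per lane; the line encoding (8b/10b or
    128b/130b) spends part of those bits on overhead. Packet/protocol
    overhead (TLP headers, flow control) is deliberately ignored here.
    """
    return gt_per_s * payload_bits / coded_bits / 8


def link_gbytes_per_s(gt_per_s: float, payload_bits: int, coded_bits: int,
                      lanes: int) -> float:
    """Aggregate one-direction bandwidth of a link with the given lane count."""
    return lanes * lane_gbytes_per_s(gt_per_s, payload_bits, coded_bits)


# PCIe 2.0 signals at 5 GT/s with 8b/10b encoding;
# PCIe 3.0 signals at 8 GT/s with 128b/130b encoding.
GEN2_X16 = link_gbytes_per_s(5.0, 8, 10, lanes=16)
GEN3_X16 = link_gbytes_per_s(8.0, 128, 130, lanes=16)

if __name__ == "__main__":
    print(f"x16 @ 5 GT/s: {GEN2_X16:.2f} GB/s")
    print(f"x16 @ 8 GT/s: {GEN3_X16:.2f} GB/s")
    print(f"Gen2 / Gen3:  {GEN2_X16 / GEN3_X16:.0%}")
```

This works out to 8.00 GB/s versus about 15.75 GB/s per direction, so a slot held at 5 GT/s offers roughly half the theoretical bandwidth of a PCIe 3.0 slot. Whether a given disk controller actually hits that ceiling depends on the controller and the drives behind it.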

The HP machine received a microcode update in the September 2013 SUM (July BIOS) that made it stable.

GregC
  • In both of these systems, are you receiving any valuable information from the management processor (DRAC or iLO) or system logs (SEL or IML)? If you don't have the management agents installed for the respective hardware, the BSOD won't trigger the watchdog timer. – ewwhite Aug 28 '13 at 13:55
  • On the HP, the iLO errors pointed to both processors, so the motherboard was the next suspect. – GregC Aug 28 '13 at 13:57
  • I am using both computers locally. The Dell R7610 is a Precision workstation. – GregC Aug 28 '13 at 14:00
  • I assumed that was a typo, as the R610 and R710 are actual server products. – ewwhite Aug 28 '13 at 14:05
  • This is a new rack mount workstation product that blurs the lines between server and workstation. I generally like it for our application, apart from the issue above. – GregC Aug 28 '13 at 14:08

1 Answer


The DRAC is your friend; usually the logs make the problem obvious. If you're still under warranty, open a Dell chat, ask for the DSET link, run it, and tell them to ship you parts. The diagnostic process goes like this:

  1. Complain to Dell, run DSET (their diagnostic tool), and have them ship you parts.
  2. Replace the parts.
  3. If not satisfied, go to step 1; otherwise, continue.
  4. Be happy.

Don't waste your effort thinking it through. It's Dell's dime. Just swap parts and complain. It's not as if the parts take long to change. You will eventually get it right (and end up with a ton of new parts). If you're too lazy to do it yourself, hire a contractor on Amazon Mechanical Turk for $3/hour to do it for you.

Dell is very accommodating. If you feel it's the PSU, tell them so, and they will usually ship you one. You can always say it sparked, arced, or smoked, and that's a guaranteed PSU replacement.

oooooo3333
  • Amazon Mechanical Turk, cute. I'll go through the motions. – GregC Aug 28 '13 at 14:07
  • I'm inclined to agree with this stance. Technically, if you wanted to get completely nerdy about it, you could break out the voltmeter and oscilloscope, but from a "Professional Systems Administration" standpoint, which is what this site is about, just go through your vendor's support channels and have them replace parts as needed. – Ryan Ries Aug 28 '13 at 14:24
  • I wouldn't even know where to begin with the voltmeter and oscilloscope. They don't publish specs or diagnostic manuals for that stuff anymore. When I look at a motherboard, it either (A) works or (B) goes to /dev/null. I once shipped a home mobo off to be repaired, and I don't think even the professional company knew what they were doing. They just reballed the onboard GPU, similar to what people do to get rid of the Xbox red ring of death. Do you know of any sites that publish specs and diagnostic material for mobos? It would be interesting to look at. – oooooo3333 Aug 28 '13 at 14:28
  • @RyanRies Sorry, I forgot to mention you above; that comment was directed at you. – oooooo3333 Aug 28 '13 at 14:35