6

We recently built a replicating SAN array from two Dell R720XDs. We are using LSI 9270-8i MegaRAID cards with CacheCade 2.0, a BBU, and write-back cache enabled.

Our cards are showing HUGE chip temperatures (97 °C+ with NO disk activity!).

Our R720s are in auto temperature management mode, so the max exhaust temp is 50 °C.

The MegaRAID cards are passively cooled and depend on good airflow, but is 97 °C normal? I have seen references to a 60 °C maximum ambient temperature, but nothing for chip temperature.

Myles Gray
  • These cards are well known for running hot - so much so that some people add their own heat sinks onto them! – Michael Hampton Oct 02 '13 at 15:22
  • LSI 3108 chip embedded on a Supermicro motherboard: same problem. A 2x2 inch fan (6000 rpm) mounted on the chip's aluminum cooler brought the temperature down to 56 °C at idle. (Not tested under load.) –  Jul 01 '16 at 19:14
  • Anecdote: My LSI SAS 2308 card survived 24 hours with zero forced airflow, only natural convection on an open test bench before I noticed the issue and added a fan. The heatsink was painfully hot to the touch which implies that the junction temperature was well above 100C, yet the chip survived. The Arrhenius equation implies that if my chip could survive that long, most can survive 97C for decades with just the airflow from the server chassis fans – Navin Mar 11 '22 at 00:38

8 Answers

4

http://www.lsi.com/downloads/Public/Host%20Bus%20Adapters/9206-16e_HBA_TemperatureAirflow_Application_Note.pdf

This seems to give some idea of the temperature ranges, although for a different chip. ~100 °C is high and dangerously close to the limit, but still within spec. I have a similar issue with a 9201-16i card. These chips have a 2,000,000-hour MTBF, but at such high idle temperatures I doubt they can last :(

I considered replacing the heat-sink with one from a retired video card. If anybody has succeeded in removing it safely from the chip, it would be nice if they wrote a few lines about the procedure. It looks to be glued on with some epoxy rather than an easily detachable thermal compound, which means a high risk of breaking the BGA itself or its soldering to the board...

htk
  • That App Note you've linked to is for the 9206, a completely different chip (the LSI HBAs and MegaRAID cards are completely different, designed by different companies originally, still very different beasts). – Chris S Jun 06 '14 at 15:13
  • Indeed, it is a different chip but it's the closest reference I had at hand that also correlates to LSI's 115°C response quoted in this thread. The chip in the app note is SAS2308 while the MegaRAID here is SAS2208 and 9201-16i uses SAS2116 (to my rather fuzzy current knowledge). – htk Jun 06 '14 at 16:21
  • Update on the 9201-16i card issue: the heat-sink was not attached with epoxy, only some old, hardened thermal compound. With a sharp, hard blade I was able to remove it. The card now has an old copper heat-sink from a graphics card, complete with fan and speed control, which keeps it very cool and still very silent :) – htk Jun 07 '14 at 19:32
3

Well, LSI's response was hardly a solution or even useful:

I do not see anything in the logs that might indicate an overheating of the controller card. The 97 degree Celsius is still within the range of the temperature threshold of the ROC which will be 115. The main temperature to watch will be the ambient temperature inside the server which requires at least 200 LFPM of airflow from the fans to stay at the required threshold. Please find these required conditions below.

For the MegaRAID SAS 9270-8i RAID controller, the operating (thermal and atmospheric) conditions are as follows:

Relative humidity range is 20 percent to 80 percent noncondensing.

Airflow must be at least 200 linear feet per minute (LFPM) to avoid operating the LSISAS2208 processor above the maximum ambient temperature.

Temperature range: +10 °C to +45 °C (with BBU); +10 °C to +55 °C (with LSIiBBU09 mode 1 through 5)

The parameters for the nonoperating (such as storage and transit) environment for these controllers are as follows:

Relative humidity range is 5 percent to 90 percent noncondensing.

Temperature range: −40 °C to +70 °C (without BBU); 0 °C to +45 °C (with BBU)

Thank you

Our units are providing 240 LFPM of airflow on the low fan-speed setting, ambient is 18 °C and chassis temperature is much the same. It appears they aren't going to admit this is a manufacturing fault, but no silicon should ever run at this temperature at idle.
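As a rough sanity check, the figures in this answer can be compared in a few lines of Python. This is only a sketch: the helper name `check_against_spec` is made up for illustration, and the limits (115 °C ROC threshold, 200 LFPM minimum airflow, +10 °C to +45 °C ambient with BBU) are simply the values from LSI's reply as quoted here, not taken from a datasheet.

```python
# Limits as quoted in LSI's reply (assumed correct, not independently verified).
ROC_MAX_C = 115.0               # ROC temperature threshold
MIN_AIRFLOW_LFPM = 200.0        # minimum required airflow
AMBIENT_RANGE_C = (10.0, 45.0)  # operating ambient range with BBU


def lfpm_to_m_per_s(lfpm):
    """Convert linear feet per minute to metres per second."""
    return lfpm * 0.3048 / 60.0


def check_against_spec(roc_c, airflow_lfpm, ambient_c):
    """Hypothetical helper: compare readings with the quoted limits."""
    low, high = AMBIENT_RANGE_C
    return {
        "roc_headroom_c": ROC_MAX_C - roc_c,
        "airflow_ok": airflow_lfpm >= MIN_AIRFLOW_LFPM,
        "ambient_ok": low <= ambient_c <= high,
    }


if __name__ == "__main__":
    # Readings reported in this thread: 97 °C ROC, 240 LFPM, 18 °C ambient.
    print(check_against_spec(97.0, 240.0, 18.0))
    print(round(lfpm_to_m_per_s(240.0), 2), "m/s")
```

With the numbers from this thread, the card sits 18 °C below the quoted threshold and both airflow and ambient are inside the stated limits, which is why LSI can call it "within range" even though it idles this hot.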

Myles Gray
3

While I can't explain why you are idling so high, I can say this is a problem we have been dealing with for years on our servers. A server we just recently retired with a MegaRAID 9280 controller idled around 87 degrees. What was really a pain was that the battery backup unit would constantly overheat and fry out in about a year because of the heat. Since the replacement units were nearly $300, this added what seemed like an unnecessary amount of risk and expense to our TCO.

We seem to have finally resolved the issue by purchasing a cheap Antec Cyclone blower: basically a single-slot fan that sits facing the heat sink of the RAID card and constantly blows hot air out the back of the box. We found that it lowered the idle temp of the card by a whopping 26 degrees Celsius. There are a number of similar products on the market to choose from, and they can generally be purchased for under $10.

3

I know this is a very old post, but since it's the top result in a Google search I thought I'd chime in for somebody else's benefit. The LSI 9270 series RAID cards are designed to have highly concentrated, server-style airstreams blown past them at relatively high velocities. For example, in my tower-style server with the front, back and top fans going, my 9270-8i idles at 89 °C. When I rigged a Noctua 60mm fan to blow at right angles to the heatsink (by screwing the fan into the heatsink of a nearby video card), it dropped sharply to 59 °C at idle and 70 °C under load. When I replaced its dried-up heatsink grease with IC Diamond, the idle temps dropped to 54 °C and load temps to 59 °C.

DaveH
1

I was seeing 93C on idle too. I just replaced the stock thermal paste. It was really hard and dry. Now I am seeing 58C.

BeerMan81
1

That's really not normal no, what's weirder is that they're both doing it - must be a firmware/driver thing I guess - have you spoken to LSI or Dell about it?

Chopper3
  • No I haven't called them - to upgrade the LSI firmware they require you to do it through OS, what kind of crap is that! I guess I can whack the fans on the R720's up full until I can upgrade the firmwares this afternoon, hopefully they won't have baked themselves into oblivion before then. – Myles Gray Sep 26 '13 at 11:33
  • Updated the firmware to the latest version - same story: 97 °C with no load. With the R720 fans on full, ROC temp = 55 °C. This thing is WAY too loud with the fans up full and I shouldn't have to do this for the card not to set itself on fire! – Myles Gray Sep 26 '13 at 17:41
0

To add more information.

I put an LSI 9286CV-8e w/ BBU in a Dell R610. It was running what I thought was very hot, at 75-85 °C. I had another identical card, and it ran in the same Dell at 55-60 °C. I took the LSI heatsink off (careful: no screws, just push-pin plastic retainers), applied Arctic Silver 5 to the entire CPU (almost too much), put the heatsink back on, and it ran at 55 °C in the Dell R610.

Now, Dell's BIOS will recognize that you have a card in the rear PCIe slots and will ramp the fan speeds up from around 5k to 7.2k rpm, enough to be noisy.

If you want to lower the fan speed, a trick is to run the command "racadm racreset soft" from a command prompt.

This will reset the DRAC and bring the fans back to 4800 rpm, and 3800 rpm for the other half of the fans. There is less noise, and the LSI card does not run much hotter: 60-63 °C at the lower fan speed.

I've had no issues running the card in the 55-65 °C range with slow fan speed.
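If you want to script the DRAC reset mentioned above, a minimal Python sketch might look like the following. It assumes `racadm` is installed and on the PATH of the machine you run it from. The function names `build_reset_command` and `soft_reset_drac` are hypothetical, and the only racadm invocation used is the `racadm racreset soft` command from this answer.

```python
import shutil
import subprocess


def build_reset_command():
    """Return the argv for the soft DRAC reset quoted in this answer."""
    return ["racadm", "racreset", "soft"]


def soft_reset_drac(dry_run=True):
    """Hypothetical wrapper: run the reset, or only return it when dry_run is set."""
    cmd = build_reset_command()
    if dry_run:
        # Nothing is executed; the caller just sees what would be run.
        return " ".join(cmd)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("racadm not found on PATH")
    # check=True raises CalledProcessError if racadm exits non-zero.
    subprocess.run(cmd, check=True)
    return " ".join(cmd)


if __name__ == "__main__":
    print(soft_reset_drac(dry_run=True))
```

The dry-run default is deliberate: resetting the DRAC briefly drops out-of-band management, so it seems safer to make the real call opt-in.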

So if your LSI card is running hot, the first thing to do is remove the heatsink and look at the paste. Mine was so dry it was powdery and cracked.

0

Same issue with an LSI 9361: 63 °C with a 2x2 cm cooler on top of the heatsink. Same for the 12Gb expander. This is over spec, and that is in winter. The problem is that with the cooler these cards now take up two slots, which is a no-go.

There are single-slot graphics cards with an aluminium heatsink the size of the whole card plus a fan. I wonder if Broadcom will ever accept modern designs, because this one is at least 10 years old.

RadianHeatSinks can provide custom-size heatsinks with a fan at the desired location. Since they need the card in order to measure it precisely and drill, I never sent my cards there, but that is the only serious approach to this problem.

I have 3 HBAs, 2 RAID cards, and 2 expanders in 3 servers, and achieving 200 LFPM spread efficiently over 2-3 cards is too hard.

Peter