Data Transfer Pauses on LSI 9271 RAID Controller

Question

I have a server equipped with an LSI 9271-8i RAID controller, with four 4 TB disks organized as RAID-5 and one 8 TB disk as JBOD (which the controller calls RAID-0).

When I copy large amounts of data (~1 TB), I observe the following: for the first few gigabytes, the transfer speed is fine and limited only by the disk or network speed, usually ~100 MB/s. But after a while, the transfer pauses completely for approx. 20-30 seconds and then continues with the next approx. 1 GB. I am copying many files of between 10 MB and 500 MB each; during a pause, robocopy stays on the current file and moves on to the next one after the pause. As a result, the overall transfer rate drops to ~20 MB/s.
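The drop in average speed follows from simple arithmetic. This is a minimal sketch, assuming each burst is about 1 GB (taken as 1000 MB) at ~100 MB/s followed by one pause of fixed length, and ignoring per-file overhead; the function name is hypothetical.

```python
def effective_rate_mb_s(burst_mb: float, burst_rate_mb_s: float, pause_s: float) -> float:
    """Average throughput when each burst of burst_mb megabytes, copied at
    burst_rate_mb_s, is followed by a stall of pause_s seconds."""
    burst_time_s = burst_mb / burst_rate_mb_s  # time spent actually transferring
    return burst_mb / (burst_time_s + pause_s)


# Figures from the question: ~1 GB bursts at ~100 MB/s, then 20-30 s pauses.
for pause in (20, 30):
    print(f"pause {pause} s -> {effective_rate_mb_s(1000, 100, pause):.1f} MB/s")
```

With 20-30 s pauses this gives roughly 25-33 MB/s, the same ballpark as the observed ~20 MB/s, so the pauses alone plausibly account for most of the slowdown.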

During a pause, I cannot browse the files on the drives, and in one case I received a controller reset error message ("Controller encountered a fatal error and was reset"). The controller's CLI tool also cannot retrieve controller data during the pause (the result is displayed only once the pause is over).

I have observed this behaviour when copying:

from the gigabit network to the RAID-5 volume
from the gigabit network to the JBOD volume
from JBOD to RAID-5
from RAID-5 to JBOD

Nothing looks suspicious to me: the disk and BBU temperatures are within the valid range, and the controller temperature seems a bit high but is also within spec. No checks are running on the array, and no rebuild is in progress.

Any guesses?

Before I replace the controller, I want to try improving the cooling. Does this behaviour sound like a possible thermal issue?

I find it strange that the first 20-30 GB copy fine and the pauses do not occur before that. If I leave the server alone for a while and retry, a few GB again copy fine. My only (naive) explanation is that the controller gets too hot. Why the controller and not the disks? The RAID-5 disks are 7200 rpm and stacked very closely, while the single JBOD disk is 5400 rpm with plenty of air around it. It would be strange if both showed the same overheating symptoms.

This really sounds like a temperature issue. You should monitor the controller temperature or try putting a fan on the heatsink. Have you tried reading and writing large amounts of data separately? Writing generates more heat and should lead to these stalls sooner. — Zac67, Dec 12 '17 at 20:57
Thanks, Zac. I will try improving the airflow and attaching a fan to the passively cooled controller. Checking the difference between reading and writing sounds interesting; I will try that. I just haven't found a tool yet to properly monitor the temperature on that controller. — Markus Erlacher, Dec 14 '17 at 09:56
Well, maybe just stick a remote thermometer onto the heatsink and watch it. ;-) — Zac67, Dec 14 '17 at 17:22

score 1 · Answer 1 · answered Apr 03 '21 at 00:15

I had a similar issue with a 9260-16i. It was not temperature, as I have dual 92 mm fans blowing directly on the LSI card, and a second server set up the same way was fine. What I discovered was that the problem server was configured with a 64K strip size, while the working server used a 256K strip size. I backed up the problem server, rebuilt the drive group with a 256K strip size, and then formatted the OS drive with 64K clusters (since I have multi-GB files). Since moving the data back, there have been no hesitations: writes run at basically full gigabit NIC speed, moving over 350 GB per hour non-stop with no pauses.
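For reference, the "strip" is the chunk written to each disk, and a RAID-5 full stripe holds one strip per disk, one of which is parity, so it carries (disks - 1) x strip size of data. Writes that cover a whole stripe let the controller compute parity from the new data alone, while partial-stripe writes need a read-modify-write. The sketch below only computes that geometry; it assumes a 4-disk RAID-5 like the one in the question (the answer does not say how many disks its array had), and the function names are hypothetical. It does not claim to explain why 256K removed the pauses.

```python
def raid5_full_stripe_kib(disks: int, strip_kib: int) -> int:
    """Data carried by one full RAID-5 stripe: one strip per disk,
    minus one strip's worth of parity."""
    if disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return (disks - 1) * strip_kib


def clusters_per_full_stripe(disks: int, strip_kib: int, cluster_kib: int) -> float:
    """How many file-system clusters fit into one full stripe."""
    return raid5_full_stripe_kib(disks, strip_kib) / cluster_kib


# Hypothetical 4-disk RAID-5 with 64K file-system clusters.
for strip in (64, 256):
    print(f"{strip}K strip -> full stripe {raid5_full_stripe_kib(4, strip)}K, "
          f"{clusters_per_full_stripe(4, strip, 64):.0f} x 64K clusters")
```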

score 0 · Answer 2 · answered Mar 21 '21 at 17:45

0

The issue is probably related to the controller flushing its own DRAM cache. Anyone having this issue should try setting the controller cache to write-through rather than write-back.
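To illustrate how a write-back cache can produce a fast burst followed by trouble, here is a toy model, assuming the cache absorbs writes at the incoming rate while draining to the disks at a slower sustained rate. The cache size and drain rates are hypothetical placeholders, not LSI 9271 specifications, and real firmware flush behavior is more complex. Note that this simple model predicts throttling to the drain rate once the cache is full, not a complete pause; a full 20-30 s stall would additionally require the controller to block until much of the cache is flushed.

```python
def mb_before_first_stall(cache_mb: float, in_mb_s: float, drain_mb_s: float) -> float:
    """Toy write-back model: incoming writes fill the cache at in_mb_s while
    the controller drains it to disk at drain_mb_s. Returns how many MB the
    host can send before the cache is full (infinite if the disks keep up)."""
    if drain_mb_s >= in_mb_s:
        return float("inf")  # the cache never fills in this model
    net_fill_mb_s = in_mb_s - drain_mb_s    # how fast the cache level rises
    fill_time_s = cache_mb / net_fill_mb_s  # seconds until the cache is full
    return in_mb_s * fill_time_s            # data accepted during that time


# Hypothetical numbers: a 1024 MB cache fed at 100 MB/s,
# with several assumed sustained drain rates for the array behind it.
for drain in (50, 90, 95):
    gb = mb_before_first_stall(1024, 100, drain) / 1024
    print(f"drain {drain} MB/s -> ~{gb:.1f} GB before the cache is full")
```

In this model, the closer the disks' sustained write rate is to the incoming rate, the more data gets through before the cache fills. With write-through, the cache does not absorb writes, so the host sees the disks' own write rate from the start rather than a fast burst followed by a slowdown.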

answered Mar 21 '21 at 17:45

shodanshok

44,038
6
98
162

Data Transfer Pauses on LSI 9271 RAID Controller

2 Answers2