4

There are multiple questions here, but it starts with this: we have a Dell PowerEdge R710 with a PERC 6/i RAID controller (or controllers) in a RAID10 configuration.

The system is running Ubuntu Server 10.04 LTS with MySQL doing a read-intensive workload.

I increased readahead using blockdev --setra ### /dev/sda (the reads are, at least in theory, sequential). This does not seem to have had a significant impact. I have not changed the disk elevator (I've seen both noop and deadline recommended).
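
For reference, checking and changing these settings looks roughly like this (a sketch; it assumes the array presents itself as /dev/sda, as above):

```
# Check the current readahead (in 512-byte sectors) and the active I/O scheduler
blockdev --getra /dev/sda
cat /sys/block/sda/queue/scheduler

# Try the deadline elevator for this boot (noop is the other common suggestion
# for hardware RAID); elevator=deadline on the kernel command line makes it stick
echo deadline > /sys/block/sda/queue/scheduler
```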

The load on the system skyrockets and it appears to be related to disk I/O waits. The system can be waiting up to 50% of the time for disk I/O - while CPU % is at about 7-10%. A comparable system with a RAID5 and a write-intensive MySQL installation smokes this system entirely.

The RAID10 system appears to have two PERC 6/i controllers given what Dell OpenManage reports; however, only Controller 0 has an enclosure and only Controller 0 has the RAID on it. The RAID is made up of four disks (slots 0-3 I believe) with two free slots.

The system is also running in a PowerSaving profile that lets the operating system manage the CPU speeds.
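
In case it matters, the OS side of that power management can be inspected like this (a sketch; the cpufreq sysfs paths assume the stock acpi-cpufreq/ondemand setup Ubuntu 10.04 uses by default):

```
# Current governor and frequency for every core
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq

# Temporarily force one core to full speed to rule out frequency
# scaling as a factor (repeat per core to test properly)
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```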

The system is also afflicted with the fsync() bug found in some Linux kernels.

Lastly, the PERC 6/i is reporting that the firmware is out of date: it has 6.2.0-0013 and wants 6.3.0-0001.

Now the questions:

  • Is it possible to move one part of the RAID10 array to a second controller?
  • Are there actually two controllers that can be used in the same backplane or am I missing something?
  • Would a firmware update fix the disk speed issue?
  • Would the RAID level have anything to do with the large disk IO wait?
  • How much of an effect would the PowerSaving mode have? (Some reports seem to say it slows the kernel down.)

I strongly suspect that there is some kind of configuration that will zap the disks into frighteningly high speeds, but I can't seem to pin it down.

Update: The four disks used here are the Hitachi HDS721010CLA332 model, which is listed as having a SATA "Bus Protocol" but also a "SAS Address". Are these the SAS-impersonating drives I've heard about that are supposed to be quite slow? In any case, these appear to be 7200 RPM drives.
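
If it helps anyone reproduce the check, smartctl can usually report what the controller presents for each drive (a sketch; the -d megaraid,N option is how smartmontools reaches disks behind a PERC, and the device number 0 here is a guess for this chassis):

```
# Identity info (model, SATA vs. SAS transport, rotation rate) for the
# first physical disk behind the PERC; adjust the megaraid device number
smartctl -i -d megaraid,0 /dev/sda
```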

The comparison system has SAS drives in it: the Seagate ST31000640SS, also 7200 RPM. That system has both RAID controllers in use, each with a "backplane" entry associated with it.

Mei

4 Answers

2

The PERC 6/i is a dual-port controller; each port has 4 SAS lanes. On the 8x2.5in R710 chassis, that's a one-to-one mapping of front-panel disks to SAS lanes. On the 3.5in chassis (six bays), lanes 6 and 7 are unused. With a 4-disk array, you could move 2 disks to slots 4 and 5 to split the workload between the two channels, although you'd still be sharing the single processor and memory on the PERC card.
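
If OpenManage is installed (the question suggests it is), something along these lines should show which connector, enclosure and slot each physical disk hangs off, so you can see how the lanes are actually populated (a sketch; the exact output layout varies between OMSA versions):

```
# Controllers and the connectors/enclosures behind them
omreport storage controller

# Physical disks on controller 0, with their enclosure and slot IDs
omreport storage pdisk controller=0
```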

Updating firmware is typically a good idea, and is a fairly painless process (although it does require a reboot).

techieb0y
  • Why is there no backplane associated with the second controller in the R710 in question? Does the first (older) R710 only have a single enclosure? Does the use of SATA disks vs. SAS disks make a difference here? – Mei Aug 19 '11 at 00:38
  • Is the "missing" backplane related to the fact that the drives are SATA drives and not SAS drives? I was reading that SATA drives didn't support multipath I/O and now just "fake it" somehow. – Mei Aug 19 '11 at 00:56
1

A 4-disk RAID 10 gives you the performance of 2 disks for writes and 4 disks for reads (absolutely best-case scenario). A 7200 rpm HDD should give 75-100 IOPS. What kind of performance do you see? Do you see %util close to 100 in iostat?
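
As a rough back-of-the-envelope check (a sketch; the per-spindle figures are the usual rule of thumb, not measured values for your drives):

```
# Best-case random IOPS for a 4-disk RAID 10 of 7200 rpm spindles:
#   reads:  4 spindles x 75-100 IOPS          = ~300-400 IOPS
#   writes: 2 effective spindles x 75-100 IOPS = ~150-200 IOPS
# Compare r/s + w/s and %util from extended iostat against those ceilings
iostat -dx 5
```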

If the primary load is generated by a database, what makes you think it is going to be mainly sequential? Databases are the stereotypical random access case. You can use iostat to see average request size. collectl will additionally give you information on I/O merges done in the kernel. Does it agree with your expectation of mainly sequential reads?
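
collectl's per-disk detail view is one way to see that (a sketch; I'm assuming a reasonably recent collectl, and the column layout can differ between versions):

```
# -sD selects detailed per-disk statistics, -i 5 sets a 5-second interval;
# the merged-request columns should be high if the reads really are sequential
collectl -sD -i 5
```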

What fsync() kernel bug do you mean?

What filesystem do you use? What mount options? The noatime option can buy you a noticeable speed-up on ext[34], because updating the access time can mean an extra write for every read of a file (worst case, with high-resolution timestamps).
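
For example, a sketch of what that looks like (the device name and mount point are placeholders, not your actual layout):

```
# Example /etc/fstab entry (device and mount point are placeholders):
#   /dev/mapper/vg0-mysql  /var/lib/mysql  ext4  defaults,noatime  0  2

# Or apply it without a reboot on an already-mounted filesystem:
mount -o remount,noatime /var/lib/mysql
```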

Answer section ;)

Firmware update may help, but do not expect miracles. You may gain a couple of percent, not an order of magnitude.

RAID 10 is the best level for performance (if you want to keep redundancy), so it shouldn't cause problems in and of itself. However, you may have partitions and/or LVs that are not aligned with the stripe size. That can double the I/Os needed for small random reads (worst-case scenario), and it imposes overhead on every type of I/O.
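
A rough way to check the alignment (a sketch; the device names are examples, and you need to know the array's stripe size from the PERC BIOS or OpenManage):

```
# Partition start offsets in sectors; the data partition's start should be
# a multiple of the stripe size (e.g. a 64 KiB stripe = 128 sectors)
fdisk -lu /dev/sda

# Where LVM actually starts placing data inside each physical volume
pvs -o +pe_start
```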

Power Saving mode shouldn't cost you much. From what you tell us the disks are too busy to be spun down, and the CPU is waiting for I/O anyhow.

Paweł Brodacki
  • I wrote about the Linux kernel sync() bug [here](http://administratosphere.wordpress.com/2011/05/13/linux-kernel-sync-bug/). I'm using ext4 on the primary system with noatime; the comparison system uses xfs with noatime. The %util in iostat is right around 30%. By the way, both use LVM. – Mei Aug 19 '11 at 16:00
  • iostat -dx shows the disk frequently at 100% util (or high); avgrq-size is 15-25. Watching vmstat shows the 24 CPUs never being fully used. – Mei Aug 19 '11 at 16:16
  • iostat also shows await of about between 50-110 ms; svctm of 1.0. Another thing - the disk load has shifted to dominantly writes. I dropped the read-ahead to 128 to see if that helps. – Mei Aug 19 '11 at 16:21
  • http://www.linuxfoundation.org/news-media/blogs/browse/2009/03/don%E2%80%99t-fear-fsync It's not a kernel bug. It's result of application developers getting used to ext3 writeback on small memory computers. Disk utilisation close to 100% means you are getting what the hardware can give you. Disk I/O isn't very hard on CPU, so I wouldn't expect you to be bottlenecked by CPU. Random writes by a database, which is careful to flush its buffers to disk is going to be hard on two 7.2k rpm disks. – Paweł Brodacki Aug 21 '11 at 10:57
  • It *is* a kernel bug. The [article](http://www.linuxfoundation.org/news-media/blogs/browse/2009/03/don%E2%80%99t-fear-fsync) by Theodore Ts'o that you mention was written in March 2009; the [bug](https://bugs.launchpad.net/ubuntu/+source/linux/+bug/624877) I mention in my [article](http://administratosphere.wordpress.com/2011/05/13/linux-kernel-sync-bug/) was reported in August of 2010 (including comments by Theodore as well). Thanks anyway. – Mei Aug 22 '11 at 02:24
0

Be careful using tools that show average CPU load. That number is certainly a good starting point for a ball-park figure, but if you see 50% load on a 24-CPU system, how do you know 12 CPUs aren't 100% utilized while the other 12 sit idle? I've seen cases where the load is <10% yet one CPU is being hammered at 100% processing interrupts. -mark
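
A quick way to check whether the load is lopsided like that (a sketch; it assumes the sysstat package is installed):

```
# Per-CPU utilization every 5 seconds; look for individual CPUs pegged near
# 100% (user, system or iowait) while the rest sit idle
mpstat -P ALL 5
```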

Mark J Seger
  • You don't rely on CPU load alone. In my case, I look at vmstat Run (`r`) and Blocked (`b`) columns: if there's 4 things running, there should be something like four CPUs being used. Even better: Linux top has the `1` key which splits CPU usage into all individual CPUs. – Mei Aug 22 '11 at 20:50
0

One of our servers had that RAID controller and firmware revision; apparently, the newest version of the firmware fixes a bug where the write-cache battery doesn't charge properly. With the battery uncharged, the controller falls back to Write Through mode to protect your data, which significantly hurts performance.
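
If OpenManage is available, something like this should show whether that is what's happening here (a sketch; the exact labels vary between OMSA versions):

```
# Battery state for controller 0 (look for a degraded/charging state)
omreport storage battery controller=0

# Cache policy actually in effect on the virtual disk; "Write Through"
# despite "Write Back" being configured points at the battery
omreport storage vdisk controller=0
```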

Update the firmware and give it a few hours for the battery to charge. Then you'll be running normally.

Bigbio2002