
I have a weird issue with a server I am setting up. It's for a filesharing-type website, so fast I/O and plenty of capacity are the requirements. The OS is CentOS 6.4 64-bit.

The server in question is an HP DL360p with 18 drive bays populated with 2TB SAS drives in RAID 50.

There's also an HP StorageWorks SAS expansion bay with a further 12 x 2TB drives, also in RAID 50.

RAID was configured using the server's BIOS configuration utility; the controllers are pretty good ones with battery backup and 2GB of flash-backed write cache (FBWC).

Now, originally we set these up as separate volumes, but due to the specifics of our software, it would work out much better to have a single, large volume.

So I set up an LVM logical volume combining these two RAID volumes, then formatted it using XFS.
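Roughly speaking, the setup looked like this (device names and volume/group names below are placeholders, not necessarily the exact ones used):

pvcreate /dev/sda /dev/sdb              # the two hardware RAID block devices
vgcreate vg_data /dev/sda /dev/sdb      # one volume group spanning both
lvcreate -l 100%FREE -n lv_data vg_data # a single logical volume using all the space
mkfs.xfs /dev/vg_data/lv_data           # format with XFS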

The problem is that the resulting speed is disappointing: running `hdparm -tT` gives a best read speed of 300MB/s.

So I did a few tests and got this:

- No LVM, XFS directly on both: both volumes get around 700MB/s read speeds
- With LVM, volume not mounted: 1000-1100MB/s
- With LVM in striped mode, volume not mounted: 1100-1300MB/s
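These figures are from plain `hdparm -tT` runs against the block devices, something like the following (again, device names are placeholders):

hdparm -tT /dev/sda               # first RAID volume
hdparm -tT /dev/sdb               # second RAID volume
hdparm -tT /dev/vg_data/lv_data   # LVM logical volume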

So somehow XFS seems to be restricting the speed... I tried some more advanced options when formatting and mounting, such as `-l internal`, enabling `lazy-count`, and `nobarrier`, but this yielded no improvement.
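Concretely, the kind of commands I tried looked like this (exact values and the mount point are just examples):

mkfs.xfs -f -l internal,lazy-count=1 /dev/vg_data/lv_data
mount -o noatime,nobarrier /dev/vg_data/lv_data /data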

The only thing I found that may be a problem is that the strip sizes of the RAID volumes did not match (one was set to 512KB and the other to 256KB), so I am reconfiguring them to match, which will take a few more hours. I also reformatted the volume with su=512k,sw=28 (sw=28 is just a guess, as there are 28 active HDDs altogether... or should I set this to 2 for the two RAID volumes?)
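In other words, something along these lines (sw=28 being the guess mentioned above; `su` is the hardware strip size and `sw` is normally the number of data-bearing disks in one full stripe):

mkfs.xfs -f -d su=512k,sw=28 /dev/vg_data/lv_data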

I'm tempted to just wipe out the whole thing and try ZFS; it seems promising, but I think configuring it would be far beyond my skill level...

So, if anyone has any experience or advice on this, it would be much appreciated!

  • `hdparm` doesn't test the filesystem but directly the underlying block device. If it reports slow values, then you have gotten something wrong with the LVM part. Debug this first and care about the filesystem later (for filesystem performance tests you'd use a tool such as `bonnie++`). – Oliver Jun 25 '13 at 12:00
  • Can you clear up a few things so I can give a better answer? An HP ProLiant [DL360p Gen8](http://h18004.www1.hp.com/products/quickspecs/productbulletin.html#spectype=worldwide&type=html&docid=14211) does not have 18 drive bays. What controllers are you using? What external enclosure are you using? – ewwhite Jun 25 '13 at 12:03
  • Oliver, OK, I will give that a test, but it's quite strange that hdparm would give such different results depending on whether the volume is mounted or not. ewwhite, apologies, it's an ML360p, the DL360p is the 1U version! – eTiMaGo Jun 25 '13 at 12:14
  • @ewwhite, The controllers are the embedded P420 for the internal drives, and a PCIe P421 for the external bays (which is a HP StorageWorks but I don't have the product model on hand, it should be a D2600, connecting over dual SAS links) – eTiMaGo Jun 25 '13 at 12:22
  • @eTiMaGo Warmer, but the ML360p is not a valid HP product either. Are you trying to say ML350p Gen8 with 18 x 3.5" large-form-factor disks? – ewwhite Jun 25 '13 at 12:23
  • @ewwhite, sigh, I should look more carefully at the tags, yes it's a ML350p, specifically of product ID 652064-B21 :) – eTiMaGo Jun 25 '13 at 12:25
  • Also, after the arrays are rebuilt, I will try mounting with inode64 – eTiMaGo Jun 25 '13 at 12:49
  • Try centos 5 instead of 6, or ext4 instead of xfs. If the problem goes away, it's the same as my problem and I'm still debugging it with red hat and HP, they might welcome an extra data point. – Dennis Kaarsemaker Jun 25 '13 at 13:10
  • @DennisKaarsemaker Still? Even with the new kernels released since? – ewwhite Jun 25 '13 at 13:13
  • Yeah, still. Otherwise I would have closed the question :) – Dennis Kaarsemaker Jun 25 '13 at 13:14
  • @Dennis Kaarsemaker OK I considered ext4 but it seems getting it to work properly in 64-bit mode for volumes over 16TB is a bit iffy? Will give it a try anyway! – eTiMaGo Jun 25 '13 at 13:16
  • @eTiMaGo Drop the LVM. It's redundant on an HP Smart Array controller. I don't have these performance issues, even on large volumes. – ewwhite Jun 25 '13 at 13:21

1 Answer


What are your application's read/write throughput and IOPS requirements? Storage performance is not always about array throughput or raw bandwidth. Sequential reads/writes are only a fraction of the I/O activity.

A more accurate test would be a bonnie++ or iozone run against a mounted filesystem... Or even running your application and metering the real workload.
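For example, something along these lines against the mounted filesystem (paths and sizes here are only illustrative; size the test files to at least twice RAM so the page cache doesn't skew the numbers):

bonnie++ -d /data -s 65536 -n 0 -u nobody   # 64GB test file, skip the file-creation tests
iozone -a -g 64g -f /data/iozone.tmp        # automatic mode with files up to 64GB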


If I were you, I'd dump the internal and external controllers and consolidate to an HP Smart Array P822 controller - Part #615418-B21.

This would allow you to address your internal disks and the external enclosure in one array. The P822 also has the Smart Array Advanced feature set (SAAP) enabled by default. At that point, you can carve the array up properly with the right RAID level (probably RAID 1+0) and the ability to assign one or more global hot-spares to the setup. The controller will also leverage the dual-paths to your external storage. You could also arrange things to stripe or mirror your drive pairs between internal and external storage. A lot of flexibility.

The biggest advantage of this setup, though, is the included HP SmartCache SSD caching feature. This is similar to LSI's Cachecade. Armed with an SSD or two, you can keep hot data on lower-latency SSDs versus having to go to spinning disk.

That's just the hardware side, though...


For XFS volumes and especially on HP gear, I don't use LVM (some people do, though). With HP controllers, the block device presentation is abstracted, so I use a pretty basic filesystem formatting string:

mkfs.xfs -f -d agcount=32 -l size=128m,version=2 /dev/sdX

The fstab has a couple of mount options:

UUID=4a0e1757 /data   xfs     noatime,logbufs=8,logbsize=256k 1 2

But with RHEL6, there are also some scheduling and performance tuning features you should take into account. The main one is the tuned framework.

yum install tuned tuned-utils
tuned-adm profile enterprise-storage 

This will set the I/O elevator, disable write barriers and set a few other performance-minded options on-the-fly according to the schedule below.

[Image: table of tuned-adm profiles and the settings each one applies]
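You can check the result once the profile is applied (substitute your actual block device for sdX):

tuned-adm active                      # show the currently active profile
cat /sys/block/sdX/queue/scheduler    # see which I/O elevator was selected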


ZFS on this setup won't help you. You'd have to dump the Smart Array controllers entirely and go to SAS HBAs or lose all ZFS RAID functionality and use one large block device (which would benefit from the Smart Array P822 I proposed above). Both would require some form of write cache (ZIL) and I don't think that it solves the manageability problem. ZFS requires way more planning up-front.

ewwhite
  • Yikes, thanks for the in-depth reply! We don't really have any specific requirements for throughput and IOPS, but obviously the more the better; 1GB/s is a good target which should support a good amount of users. At the moment I can't really do much testing, as the arrays are rebuilding with 512KB strip sizes. But I do have a couple of other servers running too, and am testing bonnie++ on them. They don't have as much storage (DL360p's with 4x SSDs), but the performance is satisfactory. – eTiMaGo Jun 25 '13 at 13:31
  • As for the RAID controller, we do have the Smart Array Advanced license, so we do have use of the dual link. But that SSD caching does sound sweet... I think I will just drop the LVM and stick with separate volumes; it won't be as convenient, but as long as it's faster, that's fine. I was planning to add additional storage by way of NAS devices such as Synology Rackstations over 10GbE links, and add them to the LVM through iSCSI, but that's probably a bad idea? – eTiMaGo Jun 25 '13 at 13:35
  • The SAAP license you have includes the SSD caching, so you already have that (upgrade the controller firmware to be able to see it). I'm suggesting the P822 controller because it will be able to see ALL of your internal and external disks on one controller. Don't add iSCSI devices to this mix. The HP is more than capable of addressing a large number of disks. – ewwhite Jun 25 '13 at 13:41
  • OK, thanks very much for your HP insight (tee hee). For the time being I will go back to a non-LVM setup. – eTiMaGo Jun 25 '13 at 14:16
  • while we're still on the topic, my HW RAID arrays now have a strip size of 512KB and a full stripe size of 2560KB. Should I add su=512k to the xfs format line, or su=2560k, or not at all? – eTiMaGo Jun 26 '13 at 01:51
  • I don't bother. – ewwhite Jun 26 '13 at 03:20
  • This is fantastic, precise and correct advice, though I disagree with the "one size fits all" `mkfs.xfs` command. Have a read of the XFS FAQ and learn how to create a filesystem correctly across RAID stripes, and only use as many Allocation Groups as you have physical devices. – suprjami Jul 27 '13 at 10:43
  • @suprjami The abstraction layer of the HP Smart Array controller is the reason I don't go by # of RAID devices. I often grow my physical arrays over time. I prefer to go with allocation groups as a function of # of CPU cores. Testing either way doesn't show much of a difference once above 8 AGs in my systems. – ewwhite Jul 27 '13 at 10:53
  • Fair enough, it's not something I have had a chance to play around with on large storage; I just know the theory. Each AG makes an I/O thread, the idea being that each underlying physical device has one process doing I/O. You are right about CPU: you should theoretically have one core per AG, plus cores left over to service whatever's actually generating the I/O. – suprjami Jul 27 '13 at 12:34