
I'm experiencing some significant performance issues on an NFS server. I've been reading up a bit on partition alignment, and I think my partitions are mis-aligned. I can't find anything that tells me how to actually quantify the effects of mis-aligned partitions. Some of the general information I found suggests the performance penalty can be quite high (upwards of 60%), while other sources say it's negligible. What I want to do is determine whether partition alignment is a factor in this server's performance problems, and if so, to what degree.

So I'll put my info out here, and hopefully the community can confirm if my partitions are indeed mis-aligned, and if so, help me put a number to what the performance cost is.

Server is a Dell R510 with dual E5620 CPUs and 8 GB RAM. There are eight 15k 2.5” 600 GB drives (Seagate ST3600057SS) configured in hardware RAID-6 with a single hot spare. RAID controller is a Dell PERC H700 w/512MB cache (Linux sees this as a LSI MegaSAS 9260). OS is CentOS 5.6, home directory partition is ext3, with options “rw,data=journal,usrquota”.

I have the HW RAID configured to present two virtual disks to the OS: /dev/sda for the OS (boot, root and swap partitions), and /dev/sdb for a big NFS share:

[root@lnxutil1 ~]# parted -s /dev/sda unit s print

Model: DELL PERC H700 (scsi)
Disk /dev/sda: 134217599s
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start    End         Size        Type     File system  Flags
 1      63s      465884s     465822s     primary  ext2         boot
 2      465885s  134207009s  133741125s  primary               lvm

[root@lnxutil1 ~]# parted -s /dev/sdb unit s print

Model: DELL PERC H700 (scsi)
Disk /dev/sdb: 5720768639s
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start  End          Size         File system  Name  Flags
 1      34s    5720768606s  5720768573s                     lvm
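For reference, I gather that newer versions of parted (2.1 and later, so newer than what CentOS 5.6 ships) can report alignment directly with an align-check command; a sketch of what that check might look like from a newer rescue environment:

# align-check needs GNU parted >= 2.1, so it would have to run from a newer live/rescue image
parted /dev/sda align-check optimal 1    # expected to print "1 aligned" or "1 not aligned"
parted /dev/sda align-check optimal 2
parted /dev/sdb align-check optimal 1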

Edit 1: Using the cfq IO scheduler (the default for CentOS 5.6):

# cat /sys/block/sd{a,b}/queue/scheduler
noop anticipatory deadline [cfq]
noop anticipatory deadline [cfq]

Chunk size is the same as strip size, right? If so, then 64kB:

# /opt/MegaCli -LDInfo -Lall -aALL -NoLog
Adapter #0

Number of Virtual Disks: 2
Virtual Disk: 0 (target id: 0)
Name:os
RAID Level: Primary-6, Secondary-0, RAID Level Qualifier-3
Size:65535MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:7
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Number of Spans: 1
Span: 0 - Number of PDs: 7

... physical disk info removed for brevity ...

Virtual Disk: 1 (target id: 1)
Name:share
RAID Level: Primary-6, Secondary-0, RAID Level Qualifier-3
Size:2793344MB
State: Optimal
Stripe Size: 64kB
Number Of Drives:7
Span Depth:1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAdaptive, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Number of Spans: 1
Span: 0 - Number of PDs: 7

If it's not obvious, virtual disk 0 corresponds to /dev/sda, for the OS; virtual disk 1 is /dev/sdb (the exported home directory tree).

Matt
  • First, run `cat /sys/block/sdb/queue/scheduler` and report the result. Second, please describe "performance issues". – ewwhite Dec 11 '12 at 00:28
  • The `ST3600057SS` drive is not a 4k drive; the block size for that drive is 512 bytes. The bigger question is what you have set as the chunk size of your RAID-6, and whether you are properly aligned to that. I would guess not, given the starting sectors you listed in your partition table. – Zoredache Dec 11 '12 at 00:49
  • The performance problems I don't think are relevant to this question. My question is **not** "please help me solve my performance problems", but rather, "please help me better understand the effects of partition mis-alignment so I can decide for myself if it's related to my issues." – Matt Dec 11 '12 at 02:21
  • Consider using the `deadline` I/O scheduler. `CFQ` is the pits in EL5. – ewwhite Dec 11 '12 at 02:25
  • @matt As for the performance issues, alignment isn't at the core of it. You're abstracted through the RAID controller; writes should be cached via the battery-backed cache unit, reads are going to be what they are. – ewwhite Dec 11 '12 at 02:53
  • @ewwhite Do you happen to have a link that talks about why the deadline IO scheduler is better than CFQ on EL5? Not doubting you, just want to better understand how it might help. I'll start another question dealing with my performance issue specifics. – Matt Dec 11 '12 at 03:28
  • See: http://serverfault.com/questions/373563/linux-real-world-hardware-raid-controller-tuning-scsi-and-cciss – ewwhite Dec 11 '12 at 03:50
  • See this article for some details about the impact of misalignment: http://technet.microsoft.com/en-us/library/dd758814(SQL.100).aspx It is written from a Windows perspective, but the OS really shouldn't matter. – Zoredache Dec 15 '12 at 01:38

1 Answer


Your partitions are misaligned, and it might be difficult to assess how much performance you're actually losing, because that depends on the type of I/O workload. It might be negligible if your I/O workload is light compared to the capability of your disks. However, since this is an NFS server, I'm assuming it's not negligible and should be addressed. Some estimates put the performance penalty at 20-30%.

Partition misalignment basically means that one I/O operation at the software level may require two I/O operations at the hardware level. That happens whenever your software blocks do not end on the same boundaries as the hardware blocks. From what you've written, you have already researched and understand this. Here is how your stack lines up; the quick check after the list spells out the arithmetic:

  • Disk = 512-byte sectors
  • RAID chunk = 65536 bytes (OK)
  • /dev/sda partition 1 = starts at sector 63 (32256-byte offset, not a multiple of 65536)
  • /dev/sda partition 2 = starts at sector 465885 (238533120-byte offset, not a multiple of 65536)
  • /dev/sdb partition 1 = starts at sector 34 (17408-byte offset, not a multiple of 65536)
  • EXT2/3/4 block size = ?
  • EXT2/3/4 stride size = ?
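To make the arithmetic concrete, here is a minimal sketch (start sectors taken from your parted output, the 64 kB chunk from MegaCli): a partition start is aligned only if its start sector times 512 is an exact multiple of 65536.

# Check each partition start (in 512-byte sectors) against the 64 kB RAID chunk;
# a non-zero remainder means that partition is misaligned.
for start in 63 465885 34; do
    offset=$(( start * 512 ))
    echo "start ${start}s: offset ${offset} bytes, remainder $(( offset % 65536 )) bytes"
done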

Keep in mind that partition misalignment is different from your storage subsystem using block sizes that differ from what your software uses. That could also put more stress on your disks, but it is not really related to alignment issues.

Use `tune2fs -l /dev/sdaX | grep -i 'block size'` (run against the partition or logical volume that actually holds the filesystem) to check the block size of your filesystem.
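For example (the logical-volume path below is only a guess, since the home directories sit on LVM; ext3 almost certainly defaults to a 4096-byte block size on a volume this large):

# Filesystem block size (the LV path is hypothetical -- adjust to your volume group)
tune2fs -l /dev/VolGroup00/LogVol_home | grep -i 'block size'

# RAID-aware ext3 parameters, assuming the 64 kB chunk and 5 data disks
# (7 drives in RAID-6 minus 2 for parity):
#   stride       = chunk / block size  = 65536 / 4096 = 16
#   stripe-width = stride * data disks = 16 * 5       = 80
# These can be given at mkfs time, e.g. mkfs.ext3 -E stride=16 ...
# (newer e2fsprogs also accept stripe-width=80 in the same -E option)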

According to Red Hat Enterprise Linux's recommendations:

Generally, it is a good idea to align the partitions to a multiple of one MB (1024x1024 bytes). To achieve alignment, start sectors of partitions should always be a multiple of 2048, and end sectors should always be a multiple of 2048, minus 1. Note that the first partition cannot start at sector 0; use sector 2048 instead.
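As a rough sketch of what recreating the data partition with an aligned start could look like (destructive, so only after the data has been copied elsewhere; the partition name is just an example):

# Recreate /dev/sdb with a 1 MiB-aligned partition -- DESTRUCTIVE, back up first
parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart share 2048s 100%
parted -s /dev/sdb set 1 lvm on
parted -s /dev/sdb unit s print    # the start sector should now be 2048 (a 1 MiB multiple)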

It looks like you may have to move the data off and recreate your partitions, if misalignment is indeed the root cause of your NFS performance issue. However, these situations are usually more complex, and I'd try to find evidence that other things are OK before considering a costly reinstall.

Giovanni Tirloni
  • Wow, thank you for the detailed response. Ultimately, I did a full re-install, also upgrading to CentOS 6.x. There may be more details, but, given this all happened two years ago, I can't remember them. :) But, since the OS upgrade, things have been acceptable. Thanks again! – Matt Aug 15 '14 at 13:39
  • Glad everything is alright. Geez, only now I saw the question's date. LOL. – Giovanni Tirloni Aug 15 '14 at 13:44