9

I am trying to tune my NAS, running Openfiler, and I'm wondering why I'm getting relatively poor read performance from four WD RE3 drives in RAID 5.

EDIT: Please note I am talking about the buffered disk read speed, not cached speeds.

EDIT: Changed formatting to make clear there are two sets of output.

When I run hdparm on the meta device I get the level of performance I'd expect; drop down to the LVM volume and it's a third of the speed!

Anyone have any idea why? Is LVM that bad?

Dean

Meta device /dev/md0 results:

[root@nas2 etc]# hdparm -tT /dev/md0
/dev/md0:
 Timing cached reads:   4636 MB in  2.00 seconds = 2318.96 MB/sec
 Timing buffered disk reads:  524 MB in  3.01 seconds = 174.04 MB/sec

LVM volume /dev/mapper/vg1-vol1 results:

[root@nas2 etc]# hdparm -tT /dev/mapper/vg1-vol1
/dev/mapper/vg1-vol1:
 Timing cached reads:   4640 MB in  2.00 seconds = 2320.28 MB/sec
 Timing buffered disk reads:  200 MB in  3.01 seconds =  66.43 MB/sec

Edit: See the section from the hdparm man page below, which suggests this is a perfectly valid test of sequential read performance, which is the issue I am trying to resolve.

 -t     Perform timings of device reads for benchmark and comparison purposes. For meaningful results, this operation should be repeated 2-3 times on an
        otherwise inactive system (no other active processes) with at least a couple of megabytes of free memory. This displays the speed of reading
        through the buffer cache to the disk without any prior caching of data. This measurement is an indication of how fast the drive can sustain
        sequential data reads under Linux, without any filesystem overhead. To ensure accurate measurements, the buffer cache is flushed during the
        processing of -t using the BLKFLSBUF ioctl. If the -T flag is also specified, then a correction factor based on the outcome of -T will be
        incorporated into the result reported for the -t operation.
Dean Smith

5 Answers

11

The default readahead settings for LVM are really pessimistic. Try blockdev --setra 8192 /dev/vg1/vol1 and see what that bumps your LVM performance up to. You will always take a performance hit using LVM; on properly configured systems we measure the hit at about 10% of the underlying block device's performance.
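
A minimal sketch of checking and applying this (device paths assumed to match the question; the setting only lives in the running kernel, so it has to be reapplied after a reboot):

# Show the current readahead, in 512-byte sectors, for both devices
blockdev --getra /dev/md0
blockdev --getra /dev/mapper/vg1-vol1

# Raise readahead on the LVM volume to 8192 sectors (4 MiB) and re-test
blockdev --setra 8192 /dev/mapper/vg1-vol1
hdparm -tT /dev/mapper/vg1-vol1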

womble
4

I don't have a good explanation, but I can confirm the results.

Testing of the RAID device (RAID 5, 4 x 1.5 TB drives):

root@enterprise:# hdparm -tT /dev/md2
/dev/md2:
 Timing cached reads:   2130 MB in  2.00 seconds = 1065.81 MB/sec
 Timing buffered disk reads:  358 MB in  3.00 seconds = 119.15 MB/sec
root@enterprise:# hdparm -tT /dev/md2
/dev/md2:
 Timing cached reads:   2168 MB in  2.00 seconds = 1084.54 MB/sec
 Timing buffered disk reads:  358 MB in  3.01 seconds = 119.10 MB/sec

Test of the LVM volume which uses md2 as its physical device:

root@enterprise:# hdparm -tT /dev/mapper/vg2-data
/dev/mapper/vg2-data:
 Timing cached reads:   2078 MB in  2.00 seconds = 1039.29 MB/sec
 Timing buffered disk reads:  176 MB in  3.03 seconds =  58.04 MB/sec
root@enterprise:# hdparm -tT /dev/mapper/vg2-data
/dev/mapper/vg2-data:
 Timing cached reads:   2056 MB in  2.00 seconds = 1028.06 MB/sec
 Timing buffered disk reads:  154 MB in  3.03 seconds =  50.81 MB/sec

I made the change proposed by womble and saw results like this.

root@enterprise:# blockdev --setra 8192 /dev/mapper/vg2-data

root@enterprise:# hdparm -tT /dev/mapper/vg2-data
/dev/mapper/vg2-data:
 Timing cached reads:   2106 MB in  2.00 seconds = 1053.82 MB/sec
 Timing buffered disk reads:  298 MB in  3.00 seconds =  99.26 MB/sec
root@enterprise:# hdparm -tT /dev/mapper/vg2-data
/dev/mapper/vg2-data:
 Timing cached reads:   2044 MB in  2.00 seconds = 1022.25 MB/sec
 Timing buffered disk reads:  280 MB in  3.03 seconds =  92.45 MB/sec
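
As a side note (my own assumption, not something the distribution does for you): blockdev --setra only changes the running kernel's setting, so to keep it across reboots you would reapply it at boot, for example:

# /etc/rc.local (or an equivalent startup script) - reapply readahead at boot
blockdev --setra 8192 /dev/mapper/vg2-data
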
Zoredache
3

Make sure that you compare apples to apples.

hdparm -t reads from the beginning of the device, which is also the fastest part of your disk if you're giving it a whole disk (and it's spinning platters).

Make sure you compare it with an LV from the beginning of the disk.

To see the mapping use pvdisplay -m.

(okay, granted, the difference in numbers may be negligible. But at least think about it :)
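
For example, a rough sketch using the physical volume from the question (names assumed, output not shown):

# Show which logical volumes occupy which physical extents of the PV
pvdisplay -m /dev/md0

# The map lists, for each range of physical extents, which LV uses it.
# An LV whose extents start at physical extent 0 sits at the very start of
# the array, so its hdparm -t numbers are directly comparable to /dev/md0's.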

MikeyB
  • Actually it turns out it's not negligible. If I use the volume that starts at extent 0, performance is nigh on identical. This is part of the answer, I'm sure. – Dean Smith Nov 02 '09 at 09:10
  • Actually it turns out that if the volume is mounted, the performance is lower. If I unmount the volume, performance matches that of the raw device. This still seems a bit odd, however. – Dean Smith Nov 02 '09 at 09:20
0

The workload created by hdparm -t is not representative of almost any real use case except streaming reads from a single large file. Also, if performance is a concern, don't use RAID 5.
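
If you want something closer to a real workload, a random-read test with fio is one option (purely illustrative; the device path is taken from the question and the parameters are arbitrary):

# 4 KiB random reads with direct I/O for 30 seconds, read-only so the LV is untouched
fio --name=randread --readonly --filename=/dev/mapper/vg1-vol1 \
    --rw=randread --bs=4k --direct=1 --ioengine=libaio --iodepth=16 \
    --runtime=30 --time_based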

Jan Jungnickel
  • Correct, it isn't representative of a real workload, and I didn't suggest that it was. It is, however, useful for comparing read speeds of raw devices. The meta device and the volume group's volume should have comparable raw sequential read speeds, and they don't. That is the point of the question. – Dean Smith Oct 31 '09 at 17:46
0

You can figure out where hdparm is spending its time with blktrace (if it's in I/O) or oprofile (if it's on CPU). Knowing the LVM setup would also help (pvdisplay, vgdisplay, lvdisplay).
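
For example, a sketch of tracing the LVM device while the benchmark runs (device path assumed from the question):

# Terminal 1: capture block-layer events for the LVM volume and decode them live
blktrace -d /dev/mapper/vg1-vol1 -o - | blkparse -i -

# Terminal 2: run the benchmark being investigated
hdparm -t /dev/mapper/vg1-vol1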