
I have a running system with low IO utilization:

  1. HP ProLiant DL380 G7 (24 GB RAM)
  2. Smart Array P410i with 512 MB battery-backed write cache
  3. 6x 146 GB 10k RPM SAS drives in RAID 10
  4. Debian Squeeze Linux, ext4 + LVM, hpacucli installed

iostat (cciss/c0d1 = RAID 10 array, dm-7 = 60 GB LVM partition used for the test):

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
cciss/c0d0        0,00   101,20    0,00    6,20     0,00     0,42   138,58     0,00    0,00   0,00   0,00
cciss/c0d1        0,00   395,20    3,20  130,20     0,18     2,05    34,29     0,04    0,26   0,16   2,08
dm-0              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00
dm-2              0,00     0,00    3,20  391,00     0,18     1,53     8,87     0,04    0,11   0,05   1,84
dm-3              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00
dm-4              0,00     0,00    0,00  106,80     0,00     0,42     8,00     0,00    0,00   0,00   0,00
dm-5              0,00     0,00    0,00    0,60     0,00     0,00     8,00     0,00    0,00   0,00   0,00
dm-6              0,00     0,00    0,00    2,80     0,00     0,01     8,00     0,00    0,00   0,00   0,00
dm-1              0,00     0,00    0,00  132,00     0,00     0,52     8,00     0,00    0,02   0,01   0,16
dm-7              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00
dm-8              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00

hpacucli "ctrl all show config"

Smart Array P410i in Slot 0 (Embedded)    (sn: 5001438011FF14E0)

   array A (SAS, Unused Space: 0 MB)


      logicaldrive 1 (136.7 GB, RAID 1, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK)

   array B (SAS, Unused Space: 0 MB)


      logicaldrive 2 (410.1 GB, RAID 1+0, OK)

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 146 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 146 GB, OK)
      physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 146 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 146 GB, OK)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 146 GB, OK)
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 146 GB, OK)

   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 250 (WWID: 5001438011FF14EF)

hpacucli "ctrl all show status"

Smart Array P410i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK
   Battery/Capacitor Status: OK

Sysbench command

sysbench --init-rng=on --test=fileio --num-threads=16 --file-num=128 --file-block-size=4K --file-total-size=54G --file-test-mode=rndrd --file-fsync-freq=0 --file-fsync-end=off run --max-requests=30000
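(The 54G of test files are created beforehand with sysbench's prepare mode; with the same file layout as above, that step is roughly:)

sysbench --test=fileio --file-num=128 --file-total-size=54G prepare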

Sysbench results

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 16
Initializing random number generator from timer.


Extra file open flags: 0
128 files, 432Mb each
54Gb total file size
Block size 4Kb
Number of random requests for random IO: 30000
Read/Write ratio for combined random IO test: 1.50
Using synchronous I/O mode
Doing random read test
Threads started!
Done.

Operations performed:  30000 Read, 0 Write, 0 Other = 30000 Total
Read 117.19Mb  Written 0b  Total transferred 117.19Mb  (935.71Kb/sec)
  233.93 Requests/sec executed

Test execution summary:
    total time:                          128.2455s
    total number of events:              30000
    total time taken by event execution: 2051.5525
    per-request statistics:
         min:                                  0.00ms
         avg:                                 68.39ms
         max:                               2010.15ms
         approx.  95 percentile:             660.40ms

Threads fairness:
    events (avg/stddev):           1875.0000/111.75
    execution time (avg/stddev):   128.2220/0.02

iostat during test

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,00    0,01    0,10   31,03    0,00   68,86

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
cciss/c0d0        0,00     0,10    0,00    0,60     0,00     0,00     9,33     0,00    0,00   0,00   0,00
cciss/c0d1        0,00    46,30  208,50    1,30     0,82     0,10     8,99    29,03  119,75   4,77 100,00
dm-0              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00
dm-2              0,00     0,00    0,00   51,60     0,00     0,20     8,00    49,72  877,26  19,38 100,00
dm-3              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00
dm-4              0,00     0,00    0,00    0,70     0,00     0,00     8,00     0,00    0,00   0,00   0,00
dm-5              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00
dm-6              0,00     0,00    0,00    0,00     0,00     0,00     0,00     7,00    0,00   0,00 100,00
dm-1              0,00     0,00    0,00    0,00     0,00     0,00     0,00     7,00    0,00   0,00 100,00
dm-7              0,00     0,00  208,50    0,00     0,82     0,00     8,04    25,00   75,29   4,80 100,00
dm-8              0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00   0,00   0,00

Bonnie++ v1.96

cmd: /usr/sbin/bonnie++ -c 16 -n 0

Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency  16     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
seo-db       48304M   819  99 188274  17 98395   8  2652  78 201280   8 265.2   1
Latency             14899us     726ms   15194ms     100ms     122ms     665ms

1.96,1.96,seo-db,16,1337541936,48304M,,819,99,188274,17,98395,8,2652,78,201280,8,265.2,1,,,,,,,,,,,,,,,,,,14899us,726ms,15194ms,100ms,122ms,665ms,,,,,,

Questions

So, sysbench showed 234 random reads per second, while I expected at least 400.
What could the bottleneck be? LVM?
Another system with an mdadm RAID 1 of two 7200 RPM drives shows over 200 random reads per second...

Thanks for any help!

Oleg Golovanov
  • What is the stripe size? iostat looks as if sysbench was working on just one physical drive. – Dmitri Chubarov May 19 '12 at 16:52
  • hpacucli says that the strip size is 256 KB. The dm-x devices in iostat are not physical drives but LVM partitions; dm-7 is the 60 GB LVM partition where I ran sysbench. – Oleg Golovanov May 19 '12 at 17:14
  • What exactly are you testing with this particular `sysbench` command line? Are you simulating a real-world usage scenario? – ewwhite May 20 '12 at 13:58
  • I am simulating a PostgreSQL database, which internally uses 4 KB blocks. My app makes a lot of random reads/writes on big data files (the app was stopped at the time of the test). – Oleg Golovanov May 20 '12 at 17:08

1 Answer


Your system is definitely underperforming based on your hardware specifications. I loaded the sysbench utility onto a couple of idle HP ProLiant DL380 G6/G7 servers running CentOS 5/6 to check their performance. Those systems use normal fixed partitions rather than LVM. (I don't typically use LVM, given the flexibility the HP Smart Array controllers already offer.)

The DL380 G6 has a 6-disk RAID 1+0 array on a Smart Array P410 controller with 512 MB of battery-backed cache. The DL380 G7 has a 2-disk enterprise SLC SSD array. The filesystems are XFS. I used the same sysbench command line as you did:

sysbench --init-rng=on --test=fileio --num-threads=16 --file-num=128 --file-block-size=4K --file-total-size=54G --file-test-mode=rndrd --file-fsync-freq=0 --file-fsync-end=off --max-requests=30000 run

My results were 1595 random reads per second on the 6-disk array.
On the SSD array, the result was 39047 random reads per second. Full results are at the end of this post...

  • As for your setup, the first thing that jumps out at me is the size of your test partition. You're nearly filling the 60 GB partition with 54 GB of test files. I'm not sure whether ext4 has trouble performing at 90%+ utilization, but that's the quickest thing for you to modify and retest (or use a smaller set of test data).

  • Even with LVM, there are some tuning options available for this controller/disk setup. Checking the read-ahead and changing the I/O scheduler from the default cfq to deadline or noop is helpful (a quick sketch follows after this list). Please see the question and answers at: Linux - real-world hardware RAID controller tuning (scsi and cciss)

  • What is your RAID controller cache ratio? I typically use a 75%/25% write/read split; checking and adjusting it with hpacucli is a quick test (see the sketch after this list). For comparison, the 6-disk array completed this sysbench run in 18 seconds, while yours took over 2 minutes.

  • Can you run a bonnie++ or iozone test on the partition/array in question? It would be helpful to see if there are any other bottlenecks on the system. I wasn't familiar with sysbench, but I think these other tools will give you a better overview of the system's capabilities.

  • Filesystem mount options may make a small difference, but I think the problem could be deeper than that...
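A minimal sketch of the read-ahead/scheduler check from the list above, assuming the array is still exposed as /dev/cciss/c0d1 (adjust the device name to your system; the 4096-sector read-ahead is just an example value):

# current settings for the RAID 10 logical drive
cat '/sys/block/cciss!c0d1/queue/scheduler'
blockdev --getra /dev/cciss/c0d1

# switch to deadline (or noop) and raise the read-ahead; not persistent across reboots
echo deadline > '/sys/block/cciss!c0d1/queue/scheduler'
blockdev --setra 4096 /dev/cciss/c0d1

To check or change the controller cache ratio, hpacucli should accept something along these lines (slot 0 matches the embedded P410i shown above; the ratio is read%/write%, so 25/75 corresponds to the 75% write / 25% read split mentioned above):

hpacucli ctrl slot=0 show detail | grep -i ratio
hpacucli ctrl slot=0 modify cacheratio=25/75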

hpacucli output...

Smart Array P410i in Slot 0 (Embedded)    (sn: 50123456789ABCDE)

   array A (SAS, Unused Space: 0 MB)

      logicaldrive 1 (838.1 GB, RAID 1+0, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, OK)
      physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 300 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 300 GB, OK)

   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 250 (WWID: 50123456789ABCED)

sysbench DL380 G6 6-disk results...

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 16
Initializing random number generator from timer.

Extra file open flags: 0
128 files, 432Mb each
54Gb total file size
Block size 4Kb
Number of random requests for random IO: 30000
Read/Write ratio for combined random IO test: 1.50
Using synchronous I/O mode
Doing random read test
Threads started!
Done.

Operations performed:  30001 Read, 0 Write, 0 Other = 30001 Total
Read 117.19Mb  Written 0b  Total transferred 117.19Mb  (6.2292Mb/sec)
 1594.67 Requests/sec executed

Test execution summary:
    total time:                          18.8133s
    total number of events:              30001
    total time taken by event execution: 300.7545
    per-request statistics:
         min:                                  0.00ms
         avg:                                 10.02ms
         max:                                277.41ms
         approx.  95 percentile:              25.58ms

Threads fairness:
    events (avg/stddev):           1875.0625/41.46
    execution time (avg/stddev):   18.7972/0.01

sysbench DL380 G7 SSD results...

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 16
Initializing random number generator from timer.


Extra file open flags: 0
128 files, 432Mb each
54Gb total file size
Block size 4Kb
Number of random requests for random IO: 30000
Read/Write ratio for combined random IO test: 1.50
Using synchronous I/O mode
Doing random read test
Threads started!
Done.

Operations performed:  30038 Read, 0 Write, 0 Other = 30038 Total
Read 117.34Mb  Written 0b  Total transferred 117.34Mb  (152.53Mb/sec)
39046.89 Requests/sec executed

Test execution summary:
    total time:                          0.7693s
    total number of events:              30038
    total time taken by event execution: 12.2631
    per-request statistics:
         min:                                  0.00ms
         avg:                                  0.41ms
         max:                                  1.89ms
         approx.  95 percentile:               0.57ms

Threads fairness:
    events (avg/stddev):           1877.3750/15.59
    execution time (avg/stddev):   0.7664/0.00
ewwhite
  • Thanks for the detailed answer! Your results are awesome... >> "I don't typically use LVM, given the flexibility the HP Smart Array controllers already offer." What kind of flexibility do you mean? >> "What is your RAID controller cache ratio?" It's 75/25, the default. >> "I think the problem could be deeper than that..." Yup! I appended bonnie++ v1.96 results at the end of my main post, and they don't look so good :( First I'll try to get away from LVM and run the test again. If that doesn't help, I suppose there is something wrong with the RAID controller. – Oleg Golovanov May 20 '12 at 17:01
  • Can you show your `bonnie++` command line? – ewwhite May 20 '12 at 17:05
  • Sorry, my bad. Command line: /usr/sbin/bonnie++ -c 16 -n 0 – Oleg Golovanov May 20 '12 at 17:09
  • I changed the scheduler from cfq to noop, and the benchmark results increased dramatically! )) Now sysbench shows 1500+ random reads/second for me... Big, big thanks :) – Oleg Golovanov May 20 '12 at 19:02
  • Try `deadline`, too... Under DB loads, it may run better than `noop`, plus there are some additional tunables in `/sys/block/cciss!c0d1/queue/iosched/` if you use `deadline`. [See this post](http://serverfault.com/questions/373563/linux-real-world-hardware-raid-controller-tuning-scsi-and-cciss/373975#373975) for more detail on finer tuning. I was able to get that test up to 2600 random reads/second by doing so. – ewwhite May 20 '12 at 20:54