6

I have a server with an LSI MegaRAID SAS 9260-4i controller and a RAID-5 array of 3 x 2 TB disks. I did some performance testing (with iozone3), and the numbers clearly show that the write cache policy affects read performance as well. If I set the policy to WriteBack I get about 2x the read performance compared with WriteThrough. How can the write cache affect the read performance?

Here are the details of the setup:

megacli -LDInfo -L0 -a0

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-5, Secondary-0, RAID Level Qualifier-3
Size                : 3.637 TB
Is VD emulated      : Yes
Parity Size         : 1.818 TB
State               : Optimal
Strip Size          : 512 KB
Number Of Drives    : 3
Span Depth          : 1
Default Cache Policy: WriteThrough, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disabled
Encryption Type     : None
Bad Blocks Exist: No
Is VD Cached: No

With WriteBack enabled (everything else is unchanged):

Default Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
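
For completeness, the cache policy is switched on the controller with MegaCLI. The exact flags can vary between MegaCLI versions, so treat the following as a sketch rather than the precise invocation:

megacli -LDSetProp WT -L0 -a0

sets virtual drive 0 on adapter 0 to WriteThrough, and

megacli -LDSetProp WB -L0 -a0

sets it back to WriteBack.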

Some numbers from iozone3:

 WriteThrough:
                                                    random  random
      KB  reclen   write rewrite    read    reread    read   write
 2033120      64   91963   38146   144980   139122   11795   21564
 2033120     128   83039   90746   118660   118147   21193   33686
 2033120     256   78933   40359   113611   114327   31493   51838
 2033120     512   71133   39453   131113   143323   28712   60946
 2033120    1024   91233   76601   141257   142820   35869   45331
 2033120    2048   58507   48419   136078   135220   51200   54548
 2033120    4096   98426   70490   119342   134319   80883   57843
 2033120    8192   70302   63047   132495   144537  101882   57984
 2033120   16384   79594   29208   148972   135650  124207   79281

 WriteBack:
                                                    random  random
      KB  reclen   write rewrite    read    reread    read   write
 2033120      64  347208  302472   331824   302075   12923   31795
 2033120     128  354489  343420   292668   322294   24018   45813
 2033120     256  379546  343659   320315   302126   37747   71769
 2033120     512  381603  352871   280553   322664   33192  116522
 2033120    1024  374790  349123   289219   290284   43154  232669
 2033120    2048  364758  342957   297345   320794   73880  264555
 2033120    4096  368939  339926   303161   324334  128764  281280
 2033120    8192  374004  346851   303138   326100  186427  324315
 2033120   16384  379416  340577   284131   289762  254757  356530
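
The exact iozone command line is not reproduced here; an invocation along these lines produces tables in this format (the option set is an illustration, not necessarily the command behind the numbers above):

iozone -a -s 2g -i 0 -i 1 -i 2

i.e. automatic mode with a 2 GB test file, running the write/rewrite, read/reread and random read/write tests.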

Some details about the system:

  • Ubuntu 12.04
  • 64 bit
  • Kernel 3.2.0 (3.2.0-58-generic)
  • Memory was limited to 1 GB for the test
  • iozone3 version 397-2
  • Partition used for the test: /dev/sda4 /var ext4 rw,relatime,user_xattr,barrier=1,data=ordered 0 0

tlo
  • 528
  • 2
  • 8
  • 24
  • I would have expected otherwise - if write-cache is disabled there should be more read-cache available. – Nils Feb 10 '14 at 16:19

5 Answers

3

By using a writeback cache, you are saving disk IOPS. The controller can batch up smaller writes into one big write.

Thus, there are more IOPS available for reads.

This assumes that the tests are concurrent. If a given test is only reads or only writes, this won't matter.
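
One way to test the concurrent case is to run a workload that actually mixes reads and writes; with iozone that is the mixed-workload test (a sketch, assuming the -i test numbers from the iozone man page):

iozone -s 2g -r 64k -i 0 -i 8

where -i 0 writes the test file first and -i 8 then runs the mixed random read/write workload on it.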

Dan Pritts
  • 3,181
  • 25
  • 27
  • I would assume that iozone does not mix writes/reads when doing the read test. Also, a simple read test shows similar results, see also my comment to the answer of alxgomz. – tlo Feb 10 '14 at 17:13
  • I wouldn't assume anything...but your simple read test does seem like good evidence. – Dan Pritts Feb 11 '14 at 04:57
2

What filesystem is this test run on?

What comes to mind is atime. If your filesystem is mounted with the atime option, or is missing the noatime/relatime mount option, you will get a write for every read.

(atime means recording last access time for files)

It might be helpful if you post the output of

mount

and specify on which device you did the tests.
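
As a quick control test you can also switch off access-time updates for the duration of the benchmark and see whether the numbers change (a sketch; adjust the mount point to wherever you ran the test):

mount -o remount,noatime /var

and then confirm the active options with

mount | grep /var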

Hanno S.
  • 166
  • 3
  • The test was run on /var. mount output: /dev/sda4 on /var type ext4 (rw); /proc/mounts output: /dev/sda4 /var ext4 rw,relatime,user_xattr,barrier=1,data=ordered 0 0 – tlo Feb 05 '14 at 14:33
  • ok, relatime should not affect the reads here. I would try with noatime, just to be sure. But I guess there is something else involved. – Hanno S. Feb 05 '14 at 14:37
  • Yes, I'll try noatime, but I think it's something else, too. Nevertheless a good idea to think about filesystem options, thanks! – tlo Feb 05 '14 at 15:16
  • @tlo try to `sync` before doing the read-tests, too. – Nils Feb 10 '14 at 16:18
2

A write-back policy does have an effect on read performance when using tests like iozone, because these benchmark tools measure read performance by reading back data they have written previously. So when iozone starts its read tests, some of that data still lies in the controller cache, which makes the read throughput a lot higher. This is regardless of file size, as the RAID adapter has no knowledge of files or even filesystems; all it knows about are IOs and blocks.

Keep in mind that iozone is a filesystem benchmark tool and thus completely abstracts the hardware away. Maybe by using -J/-Y you could mitigate the effect of the write-back policy and get an idea of your real read performance... or use a true HDD benchmark tool (hdparm?).
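
For example, a raw sequential read test with hdparm looks like this (a sketch; it reads from the block device and bypasses the filesystem, so run it against the virtual drive itself):

hdparm -t /dev/sda

-t times buffered device reads without prior caching of the data; adding -T times cached reads for comparison.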

alxgomz
  • 1,600
  • 1
  • 10
  • 14
  • 1
    I did a quick test with hdparm: without write-back I get around 140 MB/s; with write-back I get > 300 MB/s. A simple dd read test (from existing data, so data that was not written within the last minutes/hours) gives similar results. – tlo Feb 10 '14 at 17:07
1

The most obvious reason is that the reads are coming from the cache rather than from the disk itself. Remember that with WriteBack, the written data is held in the cache until the RAID controller gets a chance to (or decides to) write it to disk. However, it makes sense that if the same data is read (or anything else that it still holds in its cache), the controller uses the cache to retrieve the data rather than indulge in relatively expensive disk reads.
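
To rule the caches out, you can read data that cannot still be in the controller cache and bypass the Linux page cache at the same time (a sketch; it assumes /dev/sda4 holds old, untouched data):

echo 3 > /proc/sys/vm/drop_caches

dd if=/dev/sda4 of=/dev/null bs=1M count=2048 iflag=direct

The first command drops the OS page cache; the dd then reads 2 GB straight from the partition without going through the page cache.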

GeoSword
  • 1,647
  • 12
  • 16
  • 2
    I can't really believe this because the controller has 512 MB Cache and iozone is testing with 2 GB file size. Also, a simple dd test of partitions which were not touched in days show similar results. – tlo Feb 06 '14 at 15:15
  • Yes, I hadn't spotted that. You would expect that with a file size larger than the cache size, it would effectively revert to WriteThrough mode – GeoSword Feb 06 '14 at 17:48
1

It is likely that as the file is being written, it is also being written as one continuous block on the disk.

One thing the previous solutions don't take into account is the difference between how write-back and write-through commands are handled on the controller.

Write-Back means that when the controller receives a command, it immediately tells the OS handler that the write was "ok" and that it can send the next one. Write-Through waits for each individual command to report success before processing the next request.

As a result, commands are queued faster. The Read-Ahead setting of the array then starts populating the cache with a continuous stream of data.
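
You can watch this effect while a benchmark is running by looking at the queue and latency columns of iostat (a sketch; column names vary slightly between sysstat versions):

iostat -x sda 1

With WriteBack the device should show commands completing, and therefore queuing, noticeably faster than with WriteThrough.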

You can see that Read-Ahead combined with the faster command queuing is helping quite a bit if you compare the random read and random write columns, which mostly remove the Read-Ahead boost: the differences there are much smaller, especially at the smaller record sizes.

Another thing that can affect performance is the stripe size and block size, which determine how many different disks (and heads) are involved in each read or write operation.

Rowan Hawkins
  • 590
  • 2
  • 18