
I have done some simple performance tests and it seems that reading from my RAID1 is slower than writing:

root@dss0:~# for i in 1 2 3; do dd if=/dev/zero of=/dev/sda bs=1048576 count=131072; done
137438953472 bytes (137 GB) copied, 192.349 s, 715 MB/s
137438953472 bytes (137 GB) copied, 192.851 s, 713 MB/s
137438953472 bytes (137 GB) copied, 193.026 s, 712 MB/s
root@dss0:~# for i in 1 2 3; do dd if=/dev/sda of=/dev/null bs=1048576 count=131072; done
137438953472 bytes (137 GB) copied, 257.201 s, 534 MB/s
137438953472 bytes (137 GB) copied, 255.522 s, 538 MB/s
137438953472 bytes (137 GB) copied, 259.945 s, 529 MB/s

I understand that dd is not a performance test tool, but this result is still a surprise.

The system was built by the vendor and has a Supermicro main board with 16 GByte of RAM. The RAID controller is a MegaRAID 9271-8i with 1 GByte of cache. There are eight 2 TByte SAS disks on a SAS-933EL1 backplane. I am unsure of the cabling: one connector of the controller goes to the SAS backplane, the other goes to two SATA disks which hold the OS.

The RAID1 was set up with this command:

root@dss0:~# /opt/MegaRAID/MegaCli/MegaCli64 -CfgLdAdd -r1 [8:0,8:1,8:2,8:3,8:4,8:5,8:6,8:7] WB NORA Direct -a0
Adapter 0: Created VD 0
Adapter 0: Configured the Adapter!!
Exit Code: 0x00

root@dss0:~# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -LALL -aALL
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 7.275 TB
Sector Size         : 512
Is VD emulated      : No
Mirror Data         : 7.275 TB
State               : Optimal
Strip Size          : 256 KB
Number Of Drives    : 8
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
PI type: No PI
Is VD Cached: No
Exit Code: 0x00

I would expect read access to be at least as fast as write access, maybe even faster. The 715 MByte/s write speed seems to be near the 6 Gbit/s limit of a single SAS/SATA connector. Could this be a configuration or cabling issue with the SAS backplane? Can the SAS backplane configuration be queried with a MegaRAID command? Please advise.

Update

As explained by poige and Peter, the slower-than-expected read performance is probably caused by caching in the Linux I/O subsystem.

When using the direct flag in the dd command I get

root@dss0:~# dd if=/dev/sda of=/dev/null bs=1048576 count=131072 iflag=direct
137438953472 bytes (137 GB) copied, 199.862 s, 688 MB/s

which is much better, but still about 10% slower than the write speed. Using oflag=direct did not affect the write speed.
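An alternative way to repeat the buffered read test without direct I/O is to drop the page cache between runs. This is just a sketch using the standard Linux drop_caches interface; note that it discards all cached data on the host:

sync                                   # flush dirty pages first
echo 3 > /proc/sys/vm/drop_caches      # drop page cache, dentries and inodes
dd if=/dev/sda of=/dev/null bs=1048576 count=131072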

nn4l

2 Answers


The key to your question is read-ahead. Once upon a time I happened to have that issue too.

In other words, for optimal sequential read performance all disks should be kept permanently busy reading.

When you use dd without direct I/O (see man dd), a write operation is not performed immediately but goes through the OS cache, so it has a better chance of keeping all the disks busy and achieving the maximum possible performance.
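To see whether read-ahead is the limiting factor on the read side, you can inspect and raise it on the block device with blockdev and re-run the read test. A sketch; the 16384-sector value (8 MiB) is only an example:

blockdev --getra /dev/sda              # current read-ahead, in 512-byte sectors
blockdev --setra 16384 /dev/sda        # e.g. raise read-ahead to 8 MiB
dd if=/dev/sda of=/dev/null bs=1048576 count=131072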

poige

poige is exactly right about the write cache, but here are more details.

dd with zeros and the write cache is not the right way to benchmark (unless you want to test the write cache, of course, which is probably only useful for a file system, to see how much it syncs metadata, creates new files, etc.). And dd is likely always the wrong kind of benchmark anyway, but it works for a very basic test.

I suggest you use dd with at least one of the following options (a combined example follows the list):

conv=fdatasync -> this will make it flush to disk before finishing and calculating speed
oflag=direct   -> this will make it skip the OS cache but not the disk cache
conv=sync      -> more like skipping the disk cache too, but not really ... just flushing it every block or something like that.
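For example, applied to the write test from the question (SOURCE is a placeholder for whatever input data you use; see the note about zeros below):

dd if=SOURCE of=/dev/sda bs=1048576 count=131072 conv=fdatasync
dd if=SOURCE of=/dev/sda bs=1048576 count=131072 oflag=direct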

And don't use zeros either. Some smart hardware/software/firmware may take shortcuts if the data is as predictable as zeros; this is especially true if there is compression, which I am guessing you aren't using. Instead, use a random file in memory (for example in /dev/shm). /dev/urandom is slow, so write its output somewhere temporarily so you can read it again. Create a 50 MB random file:

dd if=/dev/urandom of=/dev/shm/randfile bs=1M count=50

Read the file several times as the source for the write (here I use cat to read it 6 times):

dd if=<(cat /dev/shm/randfile{,,,,,}) of= ... conv=fdatasync

rm /dev/shm/randfile

Also keep in mind that RAID1 reads are fastest with parallel operations, where the disks can be used independently; the controller is probably not smart enough to split a single sequential read across different disks.
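If fio is available, something like this shows what several independent sequential readers get out of the array. A sketch only; job count, queue depth, offsets and sizes are arbitrary example values, and it reads the raw device:

fio --name=raid1-read --filename=/dev/sda --rw=read --bs=1M \
    --direct=1 --ioengine=libaio --iodepth=32 \
    --numjobs=4 --offset_increment=100g --size=10g --group_reporting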

Peter