
iSCSI Target

Ubuntu 14.04 (Trusty Tahr) with 16 GB RAM and a 16-core CPU acts as an LVM-backed iSCSI target, using three Samsung SSDs, each capable of 65k IOPS, behind an LSI 6 Gbit/s controller with on-board cache.

Benchmark on the SSD disk in the target:

fio --filename=/dev/sdd --direct=1 --sync=1 --rw=write --bs=4k --numjobs=10 --iodepth=1 --runtime=60 --time_based --group_reporting --name=ssd-max

iops=65514

Here sdd is configured as a hardware RAID 0 array of three Samsung 850 EVO SSDs.

Initiator

I exported a 500 GB LUN to an Ubuntu 14.04 client with 32 GB RAM and an 8-core CPU.
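(For reference, an LVM-backed LUN can be exported with tgt along roughly these lines; the tgt stack, IQN, and volume name are placeholders, since the question does not state which target software is in use:)

tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2015-03.local.storage:lun0
tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/vg_iscsi/lv_lun0   # LVM logical volume backing the LUN
tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL                                 # allow any initiator; restrict in practice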

Benchmark on the exported LUN:

fio --filename=/dev/sdg --direct=1 --sync=1 --rw=write --bs=4k --numjobs=10 --iodepth=1 --runtime=60 --time_based --group_reporting --name=client-max

iops=2400

There is a significant performance drop between direct-attached storage and access over the network; I was expecting at least 10k IOPS.

Round-trip latency between the target and the initiator is less than 1 ms, and iperf shows a network throughput of 9.2 Gbit/s.
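(Numbers like these can be checked with something like the following; the target address is a placeholder, and iperf is assumed to be the classic v2 client run against an iperf server on the target:)

ping -c 10 192.168.10.2       # round-trip latency to the target
iperf -c 192.168.10.2 -t 30   # TCP throughput; requires "iperf -s" running on the target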

I understand that there will be a performance impact for 4k writes, as each write has to pass through the network stack of both the initiator and the target before being written to disk, but a drop from 65k to 2k is unacceptable.

Where could the problem be? I have a 10 Gbit/s Ethernet NIC between the target and the initiator. Any ideas?

Kevin Parker
    Not anywhere near enough information and our crystal balls are too expensive to waste them on non paying customers. If you want help, then provide meaningful information that can be used to help you nail things down. – TomTom Mar 22 '15 at 10:33
  • I have edited my question; if you have time, you can help me with your suggestions. – Kevin Parker Mar 22 '15 at 10:51
  • Since the NIC and the CPU are likely to be the bottleneck in any software iSCSI setup, you might want to mention what they are. – rakslice Mar 23 '15 at 00:00

1 Answer


Short answer: this is the result of network latency and a serial workload (as imposed by using direct=1, sync=1, and iodepth=1).

Long answer: with direct=1, sync=1, and iodepth=1 you created a serial workload, as a new write cannot be queued before the previous write has been committed and acknowledged. In other words, the write submission rate depends strictly on network latency. A simple ping between two machines can easily exceed 0.2 ms, and more so when a higher-level protocol such as TCP (with iSCSI on top of it) is involved. Assuming a total per-request latency of about 0.33 ms, you get a maximum of roughly 3000 IOPS. This does not account for other latency sources (e.g. the disks themselves), so it is in line with what you recorded.
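As a sanity check, that ceiling is simply the reciprocal of the per-request latency (the 0.33 ms figure above is an assumption, not a measurement):

echo $(( 1000000 / 330 ))   # one request per ~330 µs round trip ≈ 3030 IOPS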

Try this: run a first benchmark without --direct=1 --sync=1, then another with those options in place but with iodepth increased to 32 requests, and report the results here.
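Concretely, the two runs could look something like this; the device name is copied from the question, and the libaio engine is my addition, since iodepth only takes effect with an asynchronous I/O engine:

fio --filename=/dev/sdg --rw=write --bs=4k --numjobs=10 --iodepth=1 --runtime=60 --time_based --group_reporting --name=client-buffered
fio --filename=/dev/sdg --direct=1 --sync=1 --ioengine=libaio --rw=write --bs=4k --numjobs=10 --iodepth=32 --runtime=60 --time_based --group_reporting --name=client-qd32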

shodanshok