
I have been attempting to measure and benchmark our LAN's throughput as part of a larger project. Our LAN is built with Cat5e cable and HP ProCurve 1800-24G switches, which support 10/100/1000 Mbps auto-sensing. The physical topology is rather simple. We have a ProCurve in our server rack that all the servers are connected to (I refer to it, incorrectly, as the "backbone" switch). Each of our three client switches, which all of the client machines are connected to, is then connected to the "backbone" switch using a separate cable/port for each switch. It's basically a hub-and-spoke design. The workstation I am testing from is two switches away from the "backbone" switch and has an old IDE drive in it. I used HDTune to measure my drive speed at approximately 60 MB/s. Our servers are HP DL380 G5s, each with a RAID 6 array of 72 GB single-port 15K SAS drives and two dual-core 3.0 GHz Intel Xeon CPUs.

I have read a few of the other questions (here, and here) about this topic, as well as the Tom's Hardware article, so I am aware that my actual throughput will fall far short of the theoretical maximum bandwidth of a 1 Gbit network (e.g., a 124 MB/s transfer rate). However, what really puzzles me is the discrepancy between the numbers I am getting using netcat vs. timing a file transfer using CIFS/SMB.

I am using the cygwin version of netcat like so:

On the "server":

nc -vv -l -p 1234 > /dev/null

On the "client":

time yes|nc -vv -n 192.168.1.10 1234
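As a side note (my own sketch, not part of the original commands): `yes` produces an unbounded stream, so the measured time depends on when you interrupt it. Feeding nc a fixed-size stream from dd makes the arithmetic deterministic. This assumes the listener above is already running on the server:

```shell
# Push exactly 1 GiB of zeros through netcat and time it.
# 1073741824 bytes divided by the elapsed seconds gives raw TCP throughput.
time dd if=/dev/zero bs=1M count=1024 | nc -n 192.168.1.10 1234
```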

For testing the file transfer rate using CIFS/SMB, I just do something like this using cygwin:

time cp local_file.iso /remote_dir/

Now, unless I have done my math completely wrong (divide bytes transferred by seconds to get bytes per second, and convert from there), transferring with netcat is really, really slow. Like 4-8 MB/s slow. On the other hand, using a 120 MB .iso as the transfer file, I calculate the throughput to the CIFS/SMB server at around 30-35 MB/s. That is still far slower than I would expect, but a completely different number than what I get using netcat.
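To make that arithmetic concrete, here is the calculation as a small shell sketch; the byte count and elapsed time are hypothetical illustration values, not my actual measurements:

```shell
# Hypothetical transfer: 125829120 bytes (120 MiB) in 4 seconds.
bytes=125829120
seconds=4
echo "$((bytes / seconds / 1000000)) MB/s"   # prints "31 MB/s" (decimal megabytes)
```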

I should mention that I'm actually using two different servers for the netcat vs. CIFS/SMB testing. I'm using a XenServer host for netcat and a Windows Server 2003 machine for CIFS (I don't have the option of installing netcat on the Windows server). They are identical in terms of hardware. I know this might be a bit of an apples-to-oranges comparison, but if I do a CIFS transfer to the XenServer host I get transfer rates right around 30 MB/s, which agrees with the result I get to the Windows Server 2003 machine.

I guess I really have two questions: 1) Why the different numbers between netcat and timing the CIFS/SMB share file transfers? and 2) Why is my actual throughput so low? I know my disks can only push so much data to the NIC so fast, but surely I should see something around 60 MB/s?

  • Have you tried using iperf or ttcp to test throughput? Have you tried bonnie++ or some other tool to look at the speed of your storage, so you can be certain you can get 60 MB/s? – Zoredache Dec 10 '10 at 19:44
  • Verify NIC link speed and duplex on every interface between you and the server, please. – SpacemanSpiff Dec 10 '10 at 21:04
  • @Tom I traced the connection and confirmed that I am 1 Gbit all the way from my workstation to the servers and back. Interestingly enough, the physical interfaces on XenServer are listed by mii-tool as being negotiated at 100baseTx-FD, while XenCenter reports them as being at 1000 Mbps. Regardless, 1000 Mbps is what the switch reports. –  Dec 11 '10 at 00:08

2 Answers


See how fast a read is on the Xen host. Use something like hdparm, or just do a local copy; that will give you an idea of the disk performance.

To transfer files, try a different tool like dd. It is strange that nc is slow for you; it is very fast for me, for example.
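A minimal sketch of the local-read check suggested above (the file name is illustrative, and this measures page-cached reads unless the cache is dropped first):

```shell
# Create a 256 MiB test file, then time reading it back locally.
dd if=/dev/zero of=ddtest.bin bs=1M count=256
time dd if=ddtest.bin of=/dev/null bs=1M
rm -f ddtest.bin
```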

silviud
  • hdparm reports that I'm getting 176 MB/sec timed buffer reads, which isn't too bad. –  Dec 11 '10 at 00:04
  • Hey, there is something strange: you say that you use -l -p, but that doesn't work for me. The man page says: "-p source_port  Specifies the source port nc should use, subject to privilege restrictions and availability. It is an error to use this option in conjunction with the -l option." – silviud Dec 11 '10 at 03:30
  • Also, nc gives a summary of the transfer at the end, something like: 367108824 bytes transferred in 5.120366 secs (71695816 bytes/sec) – silviud Dec 11 '10 at 03:32
  • I think something is wrong with the cygwin version of nc. There is nothing in -h that seems to indicate that the use of both -l and -p is bad. The man file for the Debian version also does not indicate that they are mutually exclusive, however the version of nc included on my Fedora Core 13 live cd certainly does. I've been expanding my testing to include more tools (iperf, NetCPS, and regular timed file transfers) but I'm still only seeing actual throughput around 30-40% of the theoretical maximum; however those numbers are more consistent. I need to do more testing... Thanks for your help. –  Dec 16 '10 at 01:21

I don't really have a solid answer here, other than this: be really careful which tool you use for measuring actual throughput. I have used netcat (both the cygwin and GNU versions), timed CIFS/SMB transfers, timed SCP transfers, iperf and NetCPS on a multitude of hosts in both directions, and of course I have gotten wildly divergent answers. However, I found that with careful and methodical use of the same tool I would get similar numbers across different hosts.

The other thing that is really worth mentioning is that a lot of these tools only test network throughput. Their payload is sent from memory directly to the network (I believe both iperf and NetCPS do this). While this is useful for testing network performance and pinpointing infrastructure problems, it doesn't give a very good sense of what performance will look like from your client end-nodes. In my case, I can get about 25-30 MB/sec from my workstation using something like a timed file transfer, but iperf will report that I can get 45-50 MB/sec.
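For reference, a typical memory-to-memory test with iperf (version 2 syntax; the address is illustrative and assumes a listener on the far host) looks like this:

```shell
# On the server:
iperf -s

# On the client: 10-second TCP test, report in MBytes/sec.
iperf -c 192.168.1.10 -t 10 -f M
```

Because iperf's payload never touches disk on either end, its number is an upper bound on what any disk-backed file transfer can achieve over the same path.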

So that pretty much wraps up everything I know about the 1st half of my question... different tools will give you radically different results when measuring network bandwidth.

As for the 2nd part of my question, I really have no idea. Transfer rates around 25-30 MB/sec seem horrible for end-nodes until you realize that they can only push data onto the network as fast as their drives and buses will go (and my workstation is sloowww). Do I wish I was able to use more than 25% of my theoretical bandwidth? Yes, but from what I have read my results are not entirely unusual.

I found that the servers were, of course, much faster (60-70 MB/sec), which is right around their measured maximum disk speed (HDTune reports it around 1 GB/sec... seems slow for single-channel 15K RPM RAID-6 SAS drives?).

The weird thing, though, was that some servers seemed to be much slower, and that there was an asymmetric transfer rate depending on which machine acted as the client and which acted as the server. For example, I would get 45 MB/sec in one direction and 12 MB/sec in the other. I suspect the fact that these servers are on separate subnets might be to blame, but I haven't confirmed that. I even went so far as to test the NIC, switch port and cable to try and find the failure. I think the failure is in the network topology design and possibly the router, but I can't really be sure since no benchmarking was done prior to my arrival here. Long story short... certain things are slower than the rest of the network by a significant amount. Strange indeed.

Physical Network Topology:

       ----- 146.63.205.65 ----|--------|
       |                       | "Main" |---- [Client Switch # 1]
 <--[NetWare 6.5]              | Switch |---- [Client Switch # 2]
       |                       |        |---- [Client Switch # 3]
       ----- 192.168.61.1 -----|--------| 
                                 |
                                 |
                             "Servers"