
We are currently evaluating hardware and topology solutions for a new environment using GFS+iSCSI and would like some suggestions/tips. We have deployed a similar solution in the past where all hosts accessing the GFS nodes were the GFS nodes themselves. The new topology would be to separate the GFS nodes from the clients accessing them.

A basic diagram would look like:

GFS_client <-> gigE <-> GFS nodes <-> gigE <-> iSCSI SAN device

  1. Is this the optimal way to set up GFS+iSCSI?
  2. Do you have suggestions on hardware for the GFS nodes themselves (i.e., CPU- or memory-heavy)?
  3. Do you have suggestions on tweaks/config settings to increase performance of the GFS nodes?
  4. Currently we are using 3-4 gigE connections per host for performance and redundancy. At this point, does 10GigE or fiber become more attractive for cost/scaling?
pjz
Josh
  • I'd ask more about the switches. I've had bad experiences with several brands, even with just a few 1Gb hosts. It seems that heavy non-IP (AoE) traffic makes them unstable, but iSCSI saturates them so badly that everything nearly freezes. – Javier Jun 01 '09 at 23:00
  • In our last build, which I described above, we used Dell switches; bad idea, as the limit on link aggregation groups is only 12 for an entire stack. We will likely be switching to Juniper, unless there is a better option I have yet to be presented with (Cisco is a little too pricey). – Josh Jun 02 '09 at 15:12

6 Answers


The only part of this question I can suggest an answer to is #4.

We evaluated and considered 10GbE for our SAN and decided it was cheaper, more effective, and safer to stick with teamed/load-balanced 1Gb adapters. The cost to achieve the same level of redundancy with 10GbE was astronomical, and it provided only a nominal performance increase for clients (you're not going to put a 10GbE card in each client, after all).

Mark Henderson
  1. I don't think there's an "optimal" setup. Just make dead sure you start your iSCSI initiator before GFS. You've already specified bonding as a redundancy/performance measure. You should probably also think about setting up a multipath connection to your target; if you have 4 NICs, maybe create 2 paths over 2 bonded interfaces for better redundancy. You should also consider using jumbo frames if you have a dedicated iSCSI switch that supports the feature.

  2. GFS as a subsystem isn't very heavy on the system. There are locks held in the kernel and some membership/heartbeat traffic running between nodes, and that's pretty much it. On the other hand, since you plan to make them both GFS nodes and servers accessed by clients, you should probably invest in your NICs/switches and in RAM for the servers.

  3. Jumbo frames. 802.3ad link aggregation if possible, on both sides (iSCSI and clients). TCP stack optimizations (/proc/sys/net/ipv4/tcp_rmem|wmem); see the sketch after this list.

  4. I'll skip this one; I've no idea of the costs of 10GbE.
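As a starting point for #3, here is a minimal sketch of the TCP buffer tuning and jumbo frame setup, assuming a RHEL/CentOS-era host; the interface name (eth2) is a placeholder and the buffer values are examples to benchmark against your own workload, not recommendations:

    # /etc/sysctl.conf -- example TCP buffer sizes for a GigE iSCSI network
    # (min, default, max in bytes; tune based on your own testing)
    net.ipv4.tcp_rmem = 4096 87380 16777216
    net.ipv4.tcp_wmem = 4096 65536 16777216
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216

    # apply the settings without a reboot
    sysctl -p

    # jumbo frames on the iSCSI-facing interface (eth2 is a placeholder)
    ifconfig eth2 mtu 9000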

katriel
  • 1: We do have redundancy in the form of multipathing, see my comment to randomwalter. We also have jumbo frames enabled on the SAN/switch/host. 2: This was the kind of info I was looking for. It seems like I should go memory- and IO-heavy on the GFS nodes, but not worry too much about CPU performance. 3: We do have jumbo frames enabled. We were going to enable LAG, but it seems these switches only support 12 LAGs for an entire stack. This wasn't an advertised feature. On the TCP stack optimizations, I've never messed with that. Do you have any examples or links? – Josh Jun 10 '09 at 16:24
  • This is a common thing to do with Oracle systems, hence the link: http://www.dba-oracle.com/t_linux_networking_kernel_parameters.htm. Another thing that occurred to me is to set the I/O scheduler to noop, since the NAS will be doing all the real I/O operations/scheduling for you; take a look here: http://www.linux-archive.org/centos/199396-correct-way-change-i-o-scheduler-iscsi-dev.html – katriel Jun 10 '09 at 18:37
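To illustrate the noop suggestion from the comment above, a hedged example; /dev/sdb stands in for whatever device name your iSCSI LUN gets:

    # check the current scheduler for the iSCSI-backed device
    cat /sys/block/sdb/queue/scheduler

    # switch to noop, since the array does the real I/O scheduling
    echo noop > /sys/block/sdb/queue/scheduler

    # to make it persistent, append to the kernel line in grub.conf:
    #   elevator=noop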

Have you thought about network redundancy? GFS clusters are very vulnerable to missed heartbeats. We use interface bonding for all our cluster and iSCSI links, connected to separate switches.
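For reference, a minimal active-backup bonding sketch in the RHEL/CentOS style described above; interface names and the IP address are placeholders, and each slave should go to a separate switch:

    # /etc/modprobe.conf
    alias bond0 bonding
    options bond0 mode=active-backup miimon=100

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=192.168.10.10
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth1 (repeat for eth2)
    DEVICE=eth1
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none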

  • We do have redundancy, although not in the form of bonding. Currently all iSCSI connected hosts have 3 separate gigE paths to 3 separate switches. The SAN also has 3 separate gigE paths that connect to the same 3 switches. So we should have n+2 redundancy. n+1 would be acceptable if we went with 10gigE. – Josh Jun 10 '09 at 16:17

Just to add to #3 and #4:

Jumbo frames can make a huge difference in performance, especially for "storage" networks where 99.99% of packets will be large. Just make sure to do an audit first to ensure all hosts on the network support them.
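One way to run that audit from a Linux host, assuming a 9000-byte MTU (8972 = 9000 minus 28 bytes of IP/ICMP headers; the target IP is a placeholder):

    # send a full-size frame with the Don't Fragment bit set;
    # if any hop has a smaller MTU, the ping fails instead of fragmenting
    ping -M do -s 8972 -c 4 192.168.10.20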

Second, it's worth verifying that all those extra GigE interfaces are actually giving you more speed; most switches (by default) use MAC- or IP-based hashes, so you may not see more than 1Gb between a single host pair.
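If you are bonding with 802.3ad on the Linux side, the transmit hash policy has the same effect as the switch's hash. A sketch of checking and widening it (layer3+4 hashes on ports as well as IPs, so multiple connections between one host pair can spread across slaves); the switch must also be configured for LACP:

    # show the current bonding mode and transmit hash policy
    cat /proc/net/bonding/bond0

    # hash on IP + port instead of MAC only
    # /etc/modprobe.conf
    options bond0 mode=802.3ad miimon=100 xmit_hash_policy=layer3+4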

By the time you're putting in 10GbE you should just bite the bullet and use FC, which is much faster for the same link rate, or wait until early next year, when the converged Ethernet gear should finally be shipping at below "early adopter" pricing.

LapTop006

We are evaluating solutions for our new SAN, and the EqualLogic product looks really great for iSCSI. Each bundle is 15 disks and 2 controllers (active/passive, 4GB each). As you add 2 controllers per 15 disks, you get a linear increase in performance while adding storage capacity.

They don't offer 10GbE for now, but each controller has 4 links. They provide real thin provisioning.

Link to the official page

Mathieu Chateau

I can't comment (yet) on LapTop006's post, but he's absolutely spot on!

The catch is that all your network equipment in the IP SAN must support the same MTU (Maximum Transmission Unit). If I remember correctly, the maximum MTU for jumbo frames by spec is 9000 bytes, but I have seen people using 9100 and above.
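A quick way to verify that every Linux initiator agrees on the MTU (eth2 is a placeholder for the storage-facing interface; the value must also match the switch and target ports):

    # show the configured MTU on the storage interface
    ip link show eth2 | grep mtu

    # set it explicitly
    ip link set eth2 mtu 9000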

pauska