
I'm interested in hand-rolling a SAN solution on Linux with the following technologies:

  • iSCSI
  • mdadm RAID
  • LVM
  • multipath
  • XFS/GFS/???
  • teamed NICs

Hardware-wise, I'm thinking of 2 x gigE (or better) switches with multiple gigE NICs on both the targets and the initiators.

What recommendations do people have on how to configure this, ideally with full n+1 (minimum) redundancy?

Also, do I need a set of aggregator hosts in the middle of the iSCSI "fabric"? Something like this:

targets (with mdadm) <-gigE-> aggregator host (lvm) <-gigE-> initiators

or is it better to do something like this:

targets (no mirroring) <-gigE-> aggregator host (mdadm) <-gigE-> initiators (lvm)

There are many ways to design this, and I'd be interested in what experience others have had doing something similar.

The SAN will be used for VMware images and generic file services (plus a few databases, if viable).


This is a very subjective question and is very dependent upon what you are trying to accomplish.

In addition, you are asking some rather low-level questions, which leads me to believe you have not worked in this arena before. That's OK; we all start at zero and level up from there.

Given those two observations, I would suggest you start small and work your way up.

[1] Start first with an iSCSI initiator (client) and an iSCSI target (server) on two different hosts. They can be connected with a straight-through cable or go through a switch; at this point it does not matter. Play around with that for a while. Add more targets, then create an mdadm RAID out of your iSCSI targets on your client (don't worry if all of your iSCSI targets come off of one spindle for now). Then start playing around with LVM on your client. Create multiple PVs, add them to a VG and then create an LV. Expand your LV. Create a snapshot of your LV and try mirroring it.
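
To make that concrete, here is roughly what that first exercise looks like. I'm assuming tgtd (scsi-target-utils) on the target and open-iscsi on the initiator; the IQN, the addresses and the device names below are only placeholders:

    # On the target server (tgtd), export a spare block device as LUN 1:
    tgtadm --lld iscsi --op new --mode target --tid 1 \
        -T iqn.2011-05.com.example:disk1
    tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/sdb1
    tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL

    # On the initiator (open-iscsi), discover the target and log in:
    iscsiadm -m discovery -t sendtargets -p 192.168.1.10
    iscsiadm -m node -T iqn.2011-05.com.example:disk1 -p 192.168.1.10 --login

    # With two such targets logged in (showing up locally as, say,
    # /dev/sdb and /dev/sdc), build an mdadm RAID 1 across them:
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
    cat /proc/mdstat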

Really dig into LVM, it will be the key to everything else you do.
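
A sketch of those LVM exercises, again with made-up volume group, LV and device names:

    # Turn the md device into an LVM physical volume and put it in a VG:
    pvcreate /dev/md0
    vgcreate sanvg /dev/md0

    # Carve out a logical volume, put a filesystem on it, then grow it:
    lvcreate -L 10G -n testlv sanvg
    mkfs.ext3 /dev/sanvg/testlv
    lvextend -L +5G /dev/sanvg/testlv
    resize2fs /dev/sanvg/testlv     # grows the ext3 filesystem to match

    # Snapshot the LV (needs free space left in the VG), then inspect:
    lvcreate -s -L 2G -n testsnap /dev/sanvg/testlv
    lvs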

Next add a second iSCSI target server. I would also suggest at this point getting a decent switch that supports LACP and some management. The ProCurve 1800 series switches are a good bottom end, as are the Cisco SG300s. Switch management and features will become more important later, but investing in them now is a good idea. At this point, with two iSCSI target servers, you will want to lather, rinse and repeat what you did when you had one server. If you really want to have some more fun, add a second initiator and have it mount the same iSCSI targets. What happens when you try to have two systems write to the same ext3 volume? Convert your volume to something like GFS (there's more involved, but that is the objective at this point). Now what happens when two hosts write to the same volume?
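
Fair warning on the GFS part: GFS/GFS2 needs a running cluster stack (cman/corosync plus DLM, and clustered LVM if both nodes manage the volume group) before the filesystem step even starts. Assuming that is already in place, the filesystem step itself looks something like this (the cluster name and volume names are placeholders):

    # One journal per node that will mount it, DLM as the lock manager:
    mkfs.gfs2 -p lock_dlm -t mycluster:sanvol -j 2 /dev/sanvg/testlv

    # Mount on both initiators (each must be a member of the cluster):
    mount -t gfs2 /dev/sanvg/testlv /mnt/shared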

Now let's add two NICs to all of our clients. You will need to learn about network bonding [2]. What are the different modes, and how do they differ? Get some bandwidth measurement tools running on each end of your connection and flood the link with the various bonding modes. What happens when you mix the modes so that each end is mismatched? What happens when you use a mode compatible with LACP and your switch is configured to do LACP [3]? Lather, rinse and repeat what you have done with regards to storage above.
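
One way to set up a bond, using RHEL/CentOS-style ifcfg files (interface names and addresses are placeholders; Debian's /etc/network/interfaces gets you to the same place):

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=192.168.1.20
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none
    BONDING_OPTS="mode=802.3ad miimon=100"

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

Restart networking and check /proc/net/bonding/bond0 to see the mode, the slaves and their link state.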

Remove your interfaces from the bonded interface. Assign each interface on the target server a unique IP. Make sure your target server will share its iSCSI targets through each interface. Now mount your iSCSI targets using the two different IP addresses. What happens when you write to each one? Now configure multipath [4] and play with that for a while. Down one of your target server's NICs (pull the cable, ifdown the device, etc.). What do you see in the logs? How does performance improve or degrade?
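
Roughly, the multipath exercise looks like this (same placeholder IQN and addresses as before; path-selection policy lives in /etc/multipath.conf if you want to experiment with it):

    # Log in to the same target through both of its portals:
    iscsiadm -m node -T iqn.2011-05.com.example:disk1 -p 192.168.1.10 --login
    iscsiadm -m node -T iqn.2011-05.com.example:disk1 -p 192.168.2.10 --login

    # With device-mapper-multipath installed and multipathd running, the two
    # /dev/sd* paths collapse into a single /dev/mapper/mpath* device:
    service multipathd start
    multipath -ll

    # Fail one path and watch what happens:
    ifdown eth1
    tail -f /var/log/messages
    multipath -ll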

Now add a second network switch. You can either double your network interfaces or you can split them across the switches. What kind of bonding modes should you be using? Should you have a link between the two switches? Do you need to have STP enabled?
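
One thing to know going in: plain 802.3ad generally cannot span two independent switches unless they are stacked or support multi-chassis LAG, so the usual cross-switch fallback is active-backup bonding. A sketch, reusing the placeholder names from above:

    # In ifcfg-bond0, switch the bonding mode:
    BONDING_OPTS="mode=active-backup miimon=100 primary=eth0"

    # See which slave is currently active, then pull the cable on that switch:
    cat /proc/net/bonding/bond0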

Now you're near the end of your learning and you have approached the level of master apprentice. You don't know it all, but you have a much better foundation of understanding than most people. From here you will have a better idea of how to architect a storage infrastructure. The technology will change if you use Fibre Channel or AoE (ATA over Ethernet), but the main concepts will be the same.

Useful web resources:
[1] http://www.cyberciti.biz/tips/rhel-centos-fedora-linux-iscsi-howto.html
[2] http://www.linuxfoundation.org/collaborate/workgroups/networking/bonding
[3] http://en.wikipedia.org/wiki/Link_aggregation
[4] http://sources.redhat.com/lvm2/wiki/MultipathUsageGuide

I'm going to open this up for everyone to be able to edit.

  • I get the point re GFS versus ext3, and I know LVM and mdadm reasonably well from non-iSCSI environments. It's the bonding, the LACP, multipath and those aspects that I'm a little confused on. Also, would it be wise to mdadm on the target and just present the /dev/mdN over iSCSI, or is it better to present the raw devices over iSCSI and then mdadm on the client? Clearly these alternatives will need some testing, but thoughts from previous experience would be helpful. – Brad May 08 '11 at 19:08
  • Cool. I also tried to answer for the next person to have an understanding of where to go in the learning process who may not know as much. Multipath means just that, multiple paths to the same storage device/spindle. You can accomplish this by having two SCSI controller paths to a DASD device, or via an iSCSI target with two IP addresses. Here's one resource http://sources.redhat.com/lvm2/wiki/MultipathUsageGuide – Red Tux May 08 '11 at 19:14
  • As for using RAID over remote iSCSI devices, that depends on the speed of the storage channel. If you have 8Gb Fibre Channel or 10Gb Ethernet, that may very well be faster than your local SATA controller. Equally, that 8Gb Fibre Channel may be slower than 3Gb SATA due to path and IO constraints. Each situation is different. Most often I see LVM used to grow volumes on the local system, and the EMC back end is left to deal with RAID. Also, in that case all hosts had at least two paths to every storage device, be it tape or spindle. – Red Tux May 08 '11 at 19:20