6

I'm evaluating the possibility of using two off-the-shelf servers to build a cheap redundant iSCSI SAN. The idea is to run Linux, Pacemaker, and an iSCSI target - something like the SAN Active-Passive setup on linux-ha-examples.
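Something along these lines is roughly what I have in mind - a sketch only, with made-up resource names, IQN and IP address, and IET picked arbitrarily as the target implementation:

    # DRBD in master/slave mode (r0 being the DRBD resource defined on both nodes)
    crm configure primitive p_drbd_r0 ocf:linbit:drbd \
        params drbd_resource=r0 \
        op monitor interval=29s role=Master \
        op monitor interval=31s role=Slave
    crm configure ms ms_drbd_r0 p_drbd_r0 \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

    # floating IP, iSCSI target and a LUN backed by the DRBD device
    crm configure primitive p_ip ocf:heartbeat:IPaddr2 \
        params ip=192.168.1.100 cidr_netmask=24
    crm configure primitive p_target ocf:heartbeat:iSCSITarget \
        params iqn=iqn.2012-01.com.example:storage implementation=iet
    crm configure primitive p_lun ocf:heartbeat:iSCSILogicalUnit \
        params target_iqn=iqn.2012-01.com.example:storage lun=1 path=/dev/drbd0
    crm configure group g_iscsi p_ip p_target p_lun

    # run the iSCSI stack only where DRBD is Master, and only after promotion
    crm configure colocation c_iscsi_on_drbd inf: g_iscsi ms_drbd_r0:Master
    crm configure order o_drbd_before_iscsi inf: ms_drbd_r0:promote g_iscsi:start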

The same page scares me a little when I read:

During the switchover of the iscsi-target one can detect a gap in the protocol of write-test.log. In our setup we observed a delay of 30s. There are problems reported in connection with ext3 and an iSCSI failover. This configuration has been tested with ext2 and ext3 and worked with both filesystems.

Has anyone put a redundant iSCSI SAN made out of Linux boxes into production? Is a failover event really that bad? A 30-second freeze in I/O sounds like a disaster to me, doesn't it?

Luke404
  • No, 30 seconds is not a disaster. Many midrange external (FC) disk arrays have a similar I/O freeze in their worst-case failover scenario. Most applications, including databases, happily survive even longer freezes. Just tune the clients' timeouts, test, and verify that SCSI commands unfreeze without being failed by the client OS. – kubanczyk Jan 18 '12 at 12:19
  • FYI: commercial-grade enterprise gear specifies (guaranteed) fail-overs on the order of 180 seconds. The default SCSI timeout for the sg layer in the Linux kernel (note that this varies across distros and installed drivers etc. - check '/sys/block/<device>/device/timeout' for the current setting; see the sketch below) is usually somewhere between 30 and 60s. If you can't tolerate 30s of I/O blocking you are probably on the wrong platform and/or taking the wrong approach. – pfo Jan 18 '12 at 13:00
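For reference, the current value can be checked and raised like this (the device name and the 120s value are only examples; the udev rule is one common way to make the change persistent):

    # current SCSI command timeout for /dev/sda, in seconds
    cat /sys/block/sda/device/timeout

    # raise it on the running system
    echo 120 > /sys/block/sda/device/timeout

    # persist it across reboots/hotplug via udev
    echo 'SUBSYSTEM=="block", ACTION=="add", ENV{DEVTYPE}=="disk", ATTR{device/timeout}="120"' \
        > /etc/udev/rules.d/99-scsi-timeout.rules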

2 Answers

6

SCSI connections time out after 15 seconds (or something) by default. If your home-built solution can't complete a takeover within that time, you'll need to play with that value. Also worth considering is that normal SANs mirror their cache, so after a takeover, writes that were acknowledged but not yet committed to disk are not lost. If you can't arrange for that, you risk data corruption, or you have to avoid caching writes altogether.
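With an open-iscsi initiator, the main knob is how long I/O stays queued while the session is down before errors are returned to the upper layers; something along these lines (the IQN, portal and 120s value are purely illustrative):

    # default for new sessions, in /etc/iscsi/iscsid.conf:
    #   node.session.timeo.replacement_timeout = 120

    # update an already-discovered node record in place
    iscsiadm -m node -T iqn.2012-01.com.example:storage -p 192.168.1.100 \
        -o update -n node.session.timeo.replacement_timeout -v 120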

Basil
  • Good hint! It is quite often forgotten that you need to disable write caching on your RAID controller card, since you can theoretically and practically lose its whole contents - the two boxes you use for fail-over don't have cache-coherent synchronization. This has a huge performance impact. – pfo Jan 18 '12 at 14:25
  • Checking for in-flight writes was already on my list. The broad idea is to use conservative settings whenever possible, including the use of protocol A for DRBD and turning off write caches on the underlying block storage. We're targeting a stable solution and we don't need super high performance, luckily. The SAN will run over a dedicated gigabit network with jumbo frames, and the two storage nodes will have separate 2x gigabit bonded crossover links with jumbo frames dedicated to DRBD (a rough sketch of such a DRBD resource follows below). – Luke404 Jan 18 '12 at 16:20
  • You're spending a lot of config time (and hardware) on re-solving a very old problem. Why not just put in an HP LeftHand VM or something? Spend the 10k now and save yourself the hundreds of hours of head-scratching a home-built solution will cause. – Basil Jan 19 '12 at 15:11
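As a rough illustration of the DRBD side described in Luke404's comment above (hostnames, backing disks and the 10.0.0.x addresses on the dedicated replication link are invented):

    # /etc/drbd.d/r0.res - minimal sketch of the replicated backing device
    resource r0 {
        protocol C;              # replication protocol: C is synchronous, A is asynchronous
        device    /dev/drbd0;
        meta-disk internal;

        on san1 {
            disk    /dev/sdb1;
            address 10.0.0.1:7788;   # address on the bonded crossover link
        }
        on san2 {
            disk    /dev/sdb1;
            address 10.0.0.2:7788;
        }
    }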
5

We have set up two Linux boxes as an iSCSI target cluster. We use DRBD and the SCST target and it works fine. (SCST is better than the old iscsitarget; VMware ESXi can kill that one, but not SCST.)

The timeout is a client-side setting, so you can set it lower if you wish.

Stone
  • For your information, IET has received many enhancements lately and now supports SCSI-3 reservations too. I'd say IET or SCST are still the best iSCSI targets from a stability and capability point of view. – wazoox Jan 18 '12 at 14:23
  • I'm targeting a recent Ubuntu system for the nodes, maybe the next LTS (12.04), and as far as I know the best upstream-included target is IET, so I was thinking about using that one... but I still need to do some more research on the matter... – Luke404 Jan 18 '12 at 16:16
  • IET sometimes dies under huge IO. – Stone Jan 18 '12 at 17:39