-1

I'm not sure if this is the correct forum to ask, but here goes...

My employer decided to install a SAN. For various irrelevant reasons, this never actually happened. But I did some research to find out what a "SAN" is, and... I'm baffled. It looks like such an absurdly bad idea that I can only conclude that I've misunderstood how it works.

As best as I can tell, SAN stands for Storage Area Network, and it basically means that you connect your servers to your disks using an ordinary IP network. I am utterly stunned that any sane person would think this is a good idea.

So I have my server connected to its disks with an Ultra-320 SCSI link. That's 320 MB/s of bandwidth shared between all the disks on that server. And then I rip them off the SCSI link and plug them into a 100 Mbit/s Ethernet network with its piffling 12.5 MB/s of theoretical bandwidth. That's before you take into account any routing delays, IP overhead, and perhaps packet collisions. (The latter can usually be avoided.)

320 MB/s versus 12.5 MB/s. That's, let me see, roughly 25x slower. On paper. Before we add IP overhead. (Presumably SCSI has its own command overhead, but I'm guessing a SAN probably just tunnels SCSI commands over IP rather than implementing a completely new disk protocol over IP.)
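For anyone who wants to check my arithmetic, here's the back-of-envelope version (a rough sketch in Python using nominal line rates only, no protocol overhead; I've thrown in gigabit Ethernet purely for comparison):

    # Nominal link speeds only; protocol overhead would make Ethernet look even worse.
    links_mb_per_s = {
        "Ultra-320 SCSI": 320.0,         # parallel SCSI bus, shared by every disk on it
        "100 Mbit/s Ethernet": 100 / 8,  # 12.5 MB/s theoretical
        "1 Gbit/s Ethernet": 1000 / 8,   # 125 MB/s theoretical, just for comparison
    }

    u320 = links_mb_per_s["Ultra-320 SCSI"]
    for name, mb_s in links_mb_per_s.items():
        print(f"{name:22s} {mb_s:7.1f} MB/s  ({u320 / mb_s:4.1f}x vs U320)")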

Now, with each server having a dedicated SCSI link, that means every time I add another server, I'm adding more bandwidth (and more disks). But with a shared SAN, every time I add a server I'm taking bandwidth away from the existing servers. The thing now gets slower as I add more hardware, not faster.

Additionally, SAN technology is apparently extremely expensive. And it seems reasonable to presume that setting up an entire IP network is vastly more complicated than just plugging a few drives into a cable.

So these are the drawbacks of using a SAN: massively increased cost, massively decreased performance, loss of scaling, increased complexity, and more potential points of failure. So what are the advantages?

The one I keep hearing is that it makes it easier to add disks, or to move a disk from one server to another. Which sounds logical enough - presumably with a SAN you just gotta push a few buttons and the drive now belongs to a different server. That's a heck of a lot simpler than physically moving the drive (depending on exactly how your drives are connected).

On the other hand, in 10 years of working here, I have needed to change disks... let me count... twice. So it's an event that happens roughly once every 5 years. So you're telling me that once every 5 years, the SAN is going to save me 5 minutes of work? And every second of every day it's going to make stuff 25x slower? And this is a good tradeoff?

I guess if I was in charge of some huge datacenter with thousands of servers, keeping track of that much disk might be difficult, and having a SAN might make sense. Heck, if the servers are all virtualised, they'll all be as slow as hell anyway, so maybe the SAN won't even matter.

However, this does not match my situation at all. I have two servers with three disks each. It's not as if managing all that stuff is "difficult".

In short, no matter which way I look at this, it looks extremely stupid. In fact, it looks so obviously stupid that nobody would spend time on R&D making something so stupid. As I said above, this can only mean that I'm misunderstanding something somewhere - because nobody would do something this dumb.

Can anyone explain to me what I'm not seeing here?

  • 12
    Ever heard of 1Gbps Ethernet? 10Gbps? FibreChannel? Clustering? You'd never use an iSCSI SAN with a 100 meg network. Ever. – Chris McKeown Sep 07 '12 at 12:23
  • 3
    It's seemingly an honest question, but it sounds like you've been burned by something? Was the money that was spent on the SAN supposed to pay for a company trip somewhere? – tombull89 Sep 07 '12 at 12:39
  • 1
    Well, at first glance and for people not so used to enterprise-grade equipment, SANs can very easily look *absurdly* expensive. It takes some time to really understand their cost/benefit ratio. – Massimo Sep 07 '12 at 12:41
  • The question is fine to me. Many junior sysadmins have trouble understanding SANs, and this could become the canonical question to refer them to. FWIW, I suggest not closing it. – Massimo Sep 07 '12 at 12:45
  • 3
    This has to be an elaborate trolling with the way that this is worded. – MDMarra Sep 07 '12 at 12:45
  • 2
    @Massimo There have been far better candidates for a canonical SAN question that have been closed as NC/NARQ/OT, and I don't see any reason why this question should be shoved over that bar. I think there would be some value to having such a canonical question, but basing a canonical question on *this*... rant... is not a good idea. – HopelessN00b Sep 07 '12 at 12:52
  • Agreed. Anyway, having a canonical SAN question would have been helpful here, to close this as a duplicate. Maybe we really should create one. It *is* a common problem between junior admins (and even some senior ones...) to not properly understand SANs. – Massimo Sep 07 '12 at 12:55
  • SCSI? Those servers must be from the garbage bin. It's been ages since I have seen a SCSI disk - today it is all SATA or SAS. Is this question like 10 years old? – TomTom Sep 07 '12 at 13:07
  • 1
    Joke questions are not permitted on the SF network. – MikeyB Sep 07 '12 at 13:21
  • I gave it a try: http://serverfault.com/q/425335/6352. – Massimo Sep 07 '12 at 15:23
  • 1
    It's a pity everybody seems to think this was an elaborate troll. But still, I learned something useful today. – MathematicalOrchid Sep 07 '12 at 17:43
  • @MathematicalOrchid When you come to a site of professionals and say that you don't understand a widely implemented technology and ask for clarification, that's one thing. When you actively make seemingly uninformed accusations and make comparisons to outmoded tech (like Ultra320) then one of two things is possible: You're clueless, or you're a troll. We gave you the benefit of the doubt by calling you a troll :) – MDMarra Sep 07 '12 at 20:04
  • OK, well whatever. I got the information I came for... – MathematicalOrchid Sep 07 '12 at 21:38

5 Answers

12

What you are not seeing here is, mainly, that while nobody in their right mind would use 100 Mbit Ethernet to handle SAN traffic, there are lots of other options which are not only faster than that, but also a lot faster than direct-attached storage (like standard SCSI).

Enterprise-grade SANs are usually connected to servers using Fibre Channel technology, whose speeds range from 1 Gbit to 20 Gbit, with the most commonly used adapters being in the 4-8 Gbit range; this not only achieves some impressive bandwidth, but can also make use of Multipath I/O, allowing bandwidth aggregation and failover across different adapters, FC switches and storage controllers, for maximum availability.
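To illustrate the multipath point, here is a toy sketch (purely conceptual, not how a real MPIO driver works; the two 8 Gbit paths are just an example configuration):

    # Toy model of Multipath I/O: bandwidth aggregates across healthy paths, and
    # losing one path degrades throughput instead of taking the storage offline.
    class MultipathLink:
        def __init__(self, paths_gbit):
            # e.g. two HBA ports, each an 8 Gbit FC path to the storage controller
            self.paths = list(paths_gbit)

        def fail_path(self, index):
            self.paths[index] = 0.0   # HBA, cable, switch or controller port dies

        def usable_gbit(self):
            return sum(self.paths)    # round-robin/least-queue policies spread the I/O

    link = MultipathLink([8.0, 8.0])
    print(link.usable_gbit())   # 16.0 Gbit/s with both paths healthy
    link.fail_path(0)
    print(link.usable_gbit())   # 8.0 Gbit/s: degraded, but the LUNs stay online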

Another way servers can access SAN storage is using iSCSI, which is an implementation of the SCSI protocol over IP transport; this is probably what you were referring to in your question. This is actually often considered a less-than-optimal solution (the optimal one being FC), but still nobody would run it over 100 Mbit Ethernet, and things change dramatically when running it over a gigabit (or 10 Gbit) Ethernet link.

That said about speed, everything else you have heard about SANs applies: you can create and resize volumes (called LUNs) on demand, present them to different servers (even to more than one at the same time, which is usually a requirement for clustering), and also get some nice added benefits like SAN-to-SAN replication and SAN-level backups, which can really change things when you have to handle high availability and disaster recovery for large (as in, many TBs) data sets.

As usual, you can start to dig further here: http://en.wikipedia.org/wiki/Storage_area_network.


Of course, this usually makes sense for medium/large environments. There can be a case for a small SAN even in smaller ones. But in a place with two servers, any kind of SAN is most likely overkill.

Massimo
12

I can assure you we're far from insane; let's go through a few of your points.

it basically means that you connect your servers to your disks using an ordinary IP network

You can do this, but many don't; for better performance and reliability, many have dedicated networks just for storage - Data Center Ethernet (for FCoE), IP (for iSCSI) or Fibre Channel (FC) - to communicate with their array(s).

Ultra-320 SCSI is actually only 2.56 Gbps, so 1 Gbps Ethernet (as often used for iSCSI) is only about 2.56 times slower in raw bandwidth, and many people are happy to live with that limitation because their bandwidth requirements fit within that profile. But SANs very often use MUCH faster links: 10 Gbps, 40 Gbps and 100 Gbps Ethernet are used for both FCoE and iSCSI, and in the FC world most link speeds are 4 Gbps, 8 Gbps or 16 Gbps. Better yet, the FC and FCoE protocols are specifically designed to carry storage traffic, and they degrade much more gracefully under high concurrent load than either regular Ethernet or SCSI/SAS. By the way, nobody would ever use 100 Mbps Ethernet for iSCSI in a production environment - who uses 100 Mbps Ethernet for anything these days anyway?

So SANs can very often be much faster than locally attached storage, but the main reason people use a SAN is to keep their services up. They do this with block-level sharing and cluster-aware file systems: more than one server can have concurrent access to the same blocks of data at the same time - not just files, as with a NAS. This is how we pros can have things like multi-node DB clusters and fault-tolerant VMs.

And as you suggest, it IS much easier to add, change and remove disks. If I have to use local disk, I need to order the physical disk, have it installed, and present it to the machine. With a SAN I have one large block of disks already in place and just carve up what I need as required, presenting it to the server in seconds; when it's no longer needed, the space goes back into the pool. Failed disks are also handled much more smoothly.
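To make the "carve up what I need" point concrete, here's a deliberately simplified sketch of what the array does behind its management interface (real arrays add thin provisioning, RAID, tiering and so on; the names and sizes are made-up numbers):

    # Simplified model of carving LUNs out of one shared capacity pool.
    class StoragePool:
        def __init__(self, capacity_gb):
            self.capacity_gb = capacity_gb
            self.luns = {}                       # LUN name -> size in GB

        def free_gb(self):
            return self.capacity_gb - sum(self.luns.values())

        def create_lun(self, name, size_gb):
            if size_gb > self.free_gb():
                raise ValueError("pool exhausted")
            self.luns[name] = size_gb            # presented to a host in seconds

        def delete_lun(self, name):
            self.luns.pop(name)                  # space goes straight back to the pool

    pool = StoragePool(capacity_gb=20_000)       # one big block of disks
    pool.create_lun("sql01-data", 500)
    pool.create_lun("fileserver-share", 4_000)
    pool.delete_lun("sql01-data")
    print(pool.free_gb())                        # 16000 GB back in the pool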

I think your issue is a lack of experience with scale. I'm responsible for tens of thousands of disks; they die every day, but we have a very smooth process involving global online spares that jump into place as required, meaning we can replace the physical disks during normal working hours - no midnight call-outs. If I only looked after a few standalone boxes I wouldn't understand either, but for businesses with serious availability requirements, changing needs and large amounts of data to store reliably, I'm afraid SANs are the only thing that makes sense.

Chopper3
  • 1
    I'm not going to jump in here with another answer when Chopper has hit most of my points. I'd add, though, that clusters like VMWare and databases work best with shared disk, and NAS isn't always supported for these (although it's getting there). – Basil Sep 07 '12 at 13:00
  • 1
    http://i.imgur.com/xVyoSl.jpg – Basil Sep 07 '12 at 13:24
7

Okay, first: you usually run a SAN over networks faster than 100 Mbps - 1 Gbit, 10 Gbit or (most often) Fibre Channel.

To get to the usability part: most servers are now virtualized. You have (for example) 10 hosts, which run 100 virtual machines between them. If the machines on one host start using too much CPU, what are you going to do? Migrate them? How do you imagine doing that with local disks - copy all the data (let's say 100 GB per machine) to another host, and then back? That's why you connect all the hosts to a large SAN, keep all the data there, and just send the running state of the machine around.
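A rough back-of-envelope sketch of why that matters (the 100 GB disk image, 8 GB of RAM and the 1 Gbit link are just example numbers):

    # Migrating a VM with local disks vs. with its disks already on shared SAN storage.
    link_mb_per_s = 1000 / 8          # 1 Gbit/s Ethernet, theoretical

    disk_gb = 100                     # VM disk image, either local or on the SAN
    ram_gb = 8                        # running state a live migration actually transfers

    copy_disk_s = disk_gb * 1024 / link_mb_per_s
    copy_ram_s = ram_gb * 1024 / link_mb_per_s

    print(f"copy the whole disk image: ~{copy_disk_s / 60:.0f} minutes")  # ~14 minutes
    print(f"send the RAM state only:   ~{copy_ram_s / 60:.1f} minutes")   # ~1.1 minutes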

There is also a case for better storage utilization. A DNS server, for example, usually needs very little disk space (if logs are sent elsewhere), while other services (file servers) need a lot. If you combine them all on a large SAN, you can save space by pooling all the drives - including the free space that would otherwise sit idle inside the DNS server.

About data safety - how many drives do you have? Drives fail... a lot! If you want to keep services running, you need at least some level of RAID (usually RAID 1 or RAID 5, or some other combination). So the DNS server from the example above needs 2x 400 GB drives (if you're buying a new server, you usually can't get anything smaller, unless you're buying an SSD). So every server needs 2x the drives (or n+1 of them for RAID 5).
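A rough sketch of the waste, with made-up numbers (three servers with mirrored local drives vs. the same data on one RAID-protected pool):

    # Mirrored local drives per server vs. one shared RAID-protected pool.
    servers = {"dns01": 20, "dns02": 20, "files01": 600}   # space actually used, in GB
    local_drive_gb = 400                                   # smallest drive you can buy

    # Local storage: every server gets a RAID1 pair regardless of what it uses.
    local_raw_gb = len(servers) * 2 * local_drive_gb

    # SAN: the same data sits in one pool; a 3+1 RAID5 set costs one disk of parity.
    pool_data_gb = sum(servers.values())
    san_raw_gb = pool_data_gb * 4 / 3

    print(local_raw_gb)        # 2400 GB of raw disk bought
    print(round(san_raw_gb))   # ~853 GB of raw disk actually needed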

In a SOHO environment, a SAN is total overkill. In an enterprise environment, you really, really want centralized storage, at least for virtual machines, and centralized hardware management for them.

mulaz
5

Your post is so long that, to be honest, I'm not going to read it all (sorry), but yes, you are missing some points:

So I have my server connected to its disks with an Ultra-320 SCSI link. That's 320 MB/s of bandwidth shared between all the disks on that server. And then I rip them off the SCSI link and plug them into a 100 Mbit/s Ethernet network with its piffling 12.5 MB/s of theoretical bandwidth.

Yeah, so use a 10 Gbps network fabric then. Add multiple 10 Gig interfaces and bond them - have 40 Gbps if you like. It's a simple answer: if you need more than 100 Mbps of throughput, don't use 100 Mbps networking equipment.

Now, with each server having a dedicated SCSI link, that means every time I add another server, I'm adding more bandwidth (and more disks).

Dedicated SCSI link to what, a storage unit? Or a RAID controller?

But with a shared SAN, every time I add a server I'm taking bandwidth away from the existing servers. The thing now gets slower as I add more hardware, not faster.

No, again, add more interfaces to your storage volumes, and add more storage volumes. If you have a disk array with a dedicated SCSI link to a server, and you add another server with a dedicated SCSI link to the disk array, you have now split the bandwidth to each server in half (unless the storage unit can go twice as fast as the SCSI channel!).
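A trivial sketch of that scaling point (the port counts and speeds are made-up numbers): the per-server share stays constant as long as you grow the array's front end along with the server count.

    # Per-server share of the array's front-end bandwidth.
    def per_server_gbit(front_end_ports, gbit_per_port, servers):
        return front_end_ports * gbit_per_port / servers

    print(per_server_gbit(2, 8, 4))    # 4 servers on 2x 8 Gbit FC ports -> 4.0 Gbit each
    print(per_server_gbit(4, 8, 8))    # double the servers AND the ports -> still 4.0 Gbit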

It's quite rare for people to run devices at 100% load all day and night; bandwidth is much less of an issue than seek times and latency. You seem to be caught up in what are merely architectural issues that can easily be overcome.

However, this does not match my situation at all. I have two servers with three disks each. It's not as if managing all that stuff is "difficult".

Yeah, SANs aren't for you.

If you had even 10 servers running 10 virtual machines each, you would want them stored on a SAN. If all data is kept locally on a server, then when that server goes down, the data goes offline with it. SANs improve data availability through replication to other storage nodes. They also improve integrity with centralised snapshots, backups and checksums.
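As a toy illustration of why array-side snapshots are cheap and near-instant (a conceptual copy-on-write sketch, not how any particular array implements it):

    # Toy copy-on-write snapshot: only blocks that change after the snapshot get preserved.
    class Volume:
        def __init__(self, blocks):
            self.blocks = dict(blocks)        # block number -> data
            self.snapshots = []

        def snapshot(self):
            self.snapshots.append({})         # starts empty: nothing is copied up front
            return len(self.snapshots) - 1

        def write(self, block, data):
            for snap in self.snapshots:
                snap.setdefault(block, self.blocks.get(block))  # keep the old data once
            self.blocks[block] = data

        def read_snapshot(self, snap_id, block):
            snap = self.snapshots[snap_id]
            return snap[block] if block in snap else self.blocks.get(block)

    vol = Volume({0: "boot", 1: "data-v1"})
    sid = vol.snapshot()
    vol.write(1, "data-v2")
    print(vol.read_snapshot(sid, 1))   # "data-v1" - the pre-change block
    print(vol.blocks[1])               # "data-v2" - the live volume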

As soon as you get any large number of servers, local storage quickly becomes too much management overhead and risk.

I'm not saying they are the best thing since sliced bread and work for every situation, especially given the potential cost. I think you really need to read a book about SANs, like this one; it will clear things up.

jwbensley
1

You seem to be looking at raw access from a single server. If you have a server that does a lot of disk I/O, then directly attached storage does make sense compared to storage behind 100 Mbit Ethernet.

However:

1) Look at a file server role:

Clients <-> Ethernet <-> [ fileserver HW -> SCSI/SAS -> disks ]

Now look at your SAN:

Clients <-> Ethernet <-> [ Some hardware, probably with SAS and disks ]

Not so different now. :)

2) Network speed

There are quite a lot of interfaces faster than 100 Mbit Ethernet. Maybe not on a home/SOHO device, but an enterprise SAN will offer things like Fibre Channel, iSCSI, 10 Gbit Ethernet, etc.

Hennes