8

I am seeking recommendations for shared storage options to support an ESXi HA cluster (note I'm NOT asking for a product/brand/model recommendation, as I know that's against the rules here). I am asking for a technology recommendation.

The company I work for is a small business. At the moment, we have one HP DL380 G9 with DAS, running ESXi 6.0 and our custom-developed application. We are now looking at how to achieve HA/FT using the most economical option. We need HA/FT because I'm the one-man IT team and I am often away traveling, so manual failover/restore is not an option.

I understand we need a minimum of 2 ESXi hosts (physical servers) and shared storage to achieve HA/FT. This is, I think, where it gets interesting: even the cheapest entry-level storage array out there is probably overkill for us. Our storage capacity requirement is around 200GB, and we don't see that doubling for at least 5 years. Yet we need the shared storage for HA/FT.

Thus I would really appreciate any recommendations on my options. Thanks.

peterh
Arthur
  • How about an NFS-based NAS? They can be very cheap, but then again a cheap one will be a bigger single point of failure than your current setup. Ideally you want dual PSUs and controllers, and those aren't cheap. Something like an HPE MSA would be good, but again not free. Where are you in the world? I'm in the UK, perhaps we could chat about this. Oh, and you know FT is VERY network 'heavy', right? – Chopper3 Mar 11 '17 at 07:55
  • I would have responded if this question weren't closed. – ewwhite Mar 11 '17 at 12:20
  • @Chopper3 I have considered running NFS on our QNAP NAS, but I feel the model we have probably won't be fast enough for the failover. I have looked into HPE, but even the entry model is quite pricey, and I think we would struggle after spending a big chunk of our budget on the vSphere licensing. I am in Taiwan unfortunately, otherwise I would have loved to chat in person! – Arthur Mar 11 '17 at 12:25
  • @ewwhite I've re-edited my question (made a typo) but I'm not sure if EEAA is going to re-open it... perhaps use a comment to reply? Thanks. – Arthur Mar 11 '17 at 12:27
  • Where are you located and how good are you with Linux? – ewwhite Mar 11 '17 at 12:40
  • What are your business uptime needs and tolerance for unplanned downtime (think [RPO, RTO](https://www.druva.com/blog/understanding-rpo-and-rto/))? – ewwhite Mar 11 '17 at 12:41
  • Oh yes, and FT doesn't work on NFS, sorry, I forgot – Chopper3 Mar 11 '17 at 13:51
  • @ewwhite I'm located in Taiwan, very far from where you are :) I'm OK with Linux. Our business application is written and developed by me, deployed on a LAMP stack in an ESXi VM. I can write some basic shell scripts and do some basic Linux sysadmin stuff, but that's after many hours of research and trial and error, so definitely not expert level. And I'm definitely not skilled enough to implement my own HA/FT through Heartbeat/RabbitMQ etc., thus I was hoping to let ESXi manage that – Arthur Mar 11 '17 at 14:40
  • @ewwhite With regards to RPO and RTO: we don't really have a time limit for RPO. We are very good at manual stuff, as everything was paper-based before I joined, so our BCP is great :) Now for RTO, I know this is going to sound stupid, but RTO is ideally 0. Not because of data accuracy, but because, whilst our BCP is great, the effort for the manual work and the data patching afterwards is extremely time-consuming (we are a food retailer and manufacturer, and we also have a health clinic). Thus ideally, we want HA/FT to ensure 0 downtime (OK with performance impact). – Arthur Mar 11 '17 at 14:45
  • You don't have to do all of the failover in VMware. You could build the fault tolerance into the application: failover DBs with some load balancing. If failover in VMware is a must, look at a Netgear or Synology NAS with iSCSI. Just remember, when you do that, the storage is now your single point of failure. The best bet is to use virtual storage, although that requires 3 hosts. Perhaps Hyper-V might be more cost-effective. Another option would be to move it into the cloud: just build a p2p VPN between your site and the cloud vendor. Good luck! – Linuxx Mar 11 '17 at 16:16
  • What's the budget? – ewwhite Mar 14 '17 at 00:40

3 Answers

10

General notes (stream of consciousness):

  • Think really hard about what you're trying to protect.
  • Nobody uses VMware Fault-Tolerance. Okay, maybe someone does, but there are too many restrictions, and the use case is particularly narrow.
  • Servers are more reliable than you expect, especially when working with quality systems like HP ProLiant. Supermicro would be another story...
  • Assess realistic failure modes. An HP ProLiant Gen9 server isn't just going to fail.
  • You may encounter individual component failures, but there are enough internal redundancies to deal with most issues gracefully.
    • Seriously: redundant power supplies, redundant fans, RAIDed internal disks; and the onboard NICs and FLR adapters rarely fail.
    • Add ILO monitoring and comprehensive hardware health checks, and the range of uptime-impacting items is reduced to DIMM failures and system board problems (a monitoring sketch follows below).

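On the monitoring point above: a minimal cron-style sketch of polling hardware health over the iLO's IPMI-over-LAN interface with ipmitool. The iLO address, credentials, and alert mailbox are placeholders, and IPMI over LAN must be enabled on the iLO first.

```bash
#!/usr/bin/env bash
# Hardware-health sketch, run from any Linux box via cron.
# Assumptions (hypothetical): iLO at 192.0.2.10 with IPMI-over-LAN
# enabled, a read-only IPMI user "monitor", and a working local MTA.
ILO_HOST=192.0.2.10
ILO_USER=monitor
ILO_PASS=secret   # better: read from a root-only credentials file

# List sensors whose state is not nominal. Note the exact column
# layout of "sdr elist" varies a little between BMC firmwares.
FAULTS=$(ipmitool -I lanplus -H "$ILO_HOST" -U "$ILO_USER" -P "$ILO_PASS" \
           sdr elist | grep -Ev '\| *(ok|ns) *\|')

if [ -n "$FAULTS" ]; then
    printf '%s\n' "$FAULTS" | mail -s "Hardware alert: $ILO_HOST" admin@example.com
fi
```
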
So now we come to shared storage. Shared storage becomes a point of failure, depending upon how it's architected.

  • Something like an MSA SAS-attached array is an option and can work with VMware and two hosts. You can buy them bare and add the requisite capacity.
  • A shared-nothing setup would be beneficial in some respects, but adds certain complexities.
  • There are hyperconverged options like VMware vSAN, the HPE StoreVirtual VSA, or StarWind's Virtual SAN offering.
  • The HPE VSA may be free for up to 1TB of storage for your setup.
  • An entry-level SAN isn't that compelling considering your space requirements are incredibly low.
  • It's possible to go with single-headed storage... possibly even just a normal HP server with a storage OS of your choice (Linux exporting NFS, Windows Storage Server, etc.); a minimal sketch of that follows this list.
  • I've documented and outlined a ZFS solution for Linux that can provide dual-head failover and clustering for storage: https://github.com/ewwhite/zfs-ha
  • Another solution that can do shared-nothing with a pair of servers is Zetavault.
  • Couple that with Veeam VM-level replication or something array-based, and you've covered 99% of the potential storage issues.
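
To make the "normal HP server exporting NFS" option concrete (this is deliberately simpler than the dual-head zfs-ha design linked above): a minimal single-headed sketch, assuming a Linux box with OpenZFS installed, two spare disks, and a 10.0.0.0/24 storage network. Pool, dataset, device, and datastore names are all placeholders.

```bash
# On the storage server (Linux with OpenZFS):
zpool create -o ashift=12 vmpool mirror /dev/sdb /dev/sdc   # mirrored pool
zfs create -o recordsize=64K vmpool/vmstore                 # dataset for VM files
zfs set compression=lz4 vmpool/vmstore
# ESXi mounts NFS as root, so root squashing must be off.
# (sharenfs option syntax differs slightly between platforms.)
zfs set sharenfs='rw=@10.0.0.0/24,no_root_squash' vmpool/vmstore

# On each ESXi host, mount the export as a shared datastore:
esxcli storage nfs add --host 10.0.0.5 --share /vmpool/vmstore \
    --volume-name nfs-vmstore
```

Note the single head is itself a single point of failure; it only moves the risk out of the hypervisor pair, hence the Veeam or array-based replication mentioned above.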

But again, this is a function of your risk. People can easily go down the High Availability rabbit hole...

Dual hypervisor hosts... okay. Then do you need dual switching fabrics? Stacked switches? Multi-chassis link aggregation (MLAG/MC-LAG)? One SAN with dual controllers? Two SANs? SAN replication? VM replication? VM replication to diverse storage?

Do you have power diversity? Multiple PDUs? Multiple UPS units? Is the site generator-backed?

So, what are you left with?

I think it's best to have some options. Maybe contract additional help for coverage. Document the solution well enough so that the customer has some options. Make a DR or system outage runbook/script.
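
On the runbook point: even a trivial reachability script that a non-IT colleague (or cron) can run buys time for a traveling one-man IT team. A minimal sketch; all addresses and names are placeholders for this environment.

```bash
#!/usr/bin/env bash
# Outage triage sketch for the runbook: answers "what exactly is down?"
# before anyone calls the admin. All IPs/names are hypothetical.
declare -A TARGETS=(
    [esxi-host-1]=10.0.0.11   # hypervisor management interfaces
    [esxi-host-2]=10.0.0.12
    [app-vm]=10.0.0.50        # the LAMP application VM
)

for name in "${!TARGETS[@]}"; do
    if ping -c 2 -W 2 "${TARGETS[$name]}" >/dev/null 2>&1; then
        echo "OK   $name (${TARGETS[$name]})"
    else
        echo "DOWN $name (${TARGETS[$name]}) -> see runbook section '$name'"
    fi
done

# Application-level check: the LAMP stack should answer HTTP.
curl -fsS -m 5 http://10.0.0.50/ >/dev/null \
    && echo "OK   app HTTP" \
    || echo "DOWN app HTTP -> see runbook section 'application'"
```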

ewwhite
  • Thanks for the very detailed answer! You've provided me an interesting perspective on not being so paranoid about the reliability of servers. I see what you mean about drawing the line on where HA stops; we have multiple PDUs running off two UPSes and different circuit breakers, but that's it (no STS etc.). Having said that, we will definitely need two hypervisor hosts, so I'll now focus on your suggested shared storage options. The HPE VSA looks quite appealing, and I'm keen on the ZFS solution (but in reality my Linux sysadmin skills seem to fall short). Thanks once again, very much appreciated! – Arthur Mar 15 '17 at 03:36
  • And regarding budget, we were hoping to knock out shared storage + a new DL380 G9 within US$10k. The reason for asking this question in the first place is that we know that's not achievable if we went for an HP MSA or similar commercial storage array. – Arthur Mar 15 '17 at 03:39
  • Since your storage capacity needs are low, a used or old-stock HP P2000 G3 with 4 or 6 SAS disks would perform predictably and could be under $4k. The VSA approach is nearly free, but you'd need a 3rd host. Or just using another server as the storage may be reasonable enough. Lots of options. – ewwhite Mar 15 '17 at 06:47
  • I'd certainly look at the StarWind VSA then. No need for a 3rd node, support for an active-active scenario and, as a result, good performance. – batistuta09 Mar 15 '17 at 11:54
5

If your company cannot withstand downtime for the users, then VMware FT is your choice. To implement this feature, you'll definitely need some kind of shared storage. For this case, I would recommend looking at software-defined storage (SDS) solutions, which are increasingly being used to build virtualized infrastructures. With this approach, you can virtualize the local physical storage resources of your ESXi hosts and turn them into a fully-fledged virtual SAN.

VMware vSAN springs immediately to mind, but I would point out some very interesting alternatives that should be much cheaper to implement in an ESXi environment. The first candidate is HPE VSA: a good level of functionality, but an annoying requirement of a third voting node for a quorum. Yeah, I know, you can still go 2 nodes, but if you're not OK with downtime, the quorum is a must. The second candidate, on the contrary, has a minimalistic hardware footprint of just two physical hosts, along with a set of features like caching, data compression, etc.: StarWind vSAN. Both solutions have free versions, so just check and see how you would benefit from them.
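
For a feel of what "turning local disks into a virtual SAN" involves on the VMware vSAN path specifically (the HPE and StarWind tooling differ): a heavily hedged bootstrap sketch using esxcli. Device IDs, the vmk interface, and the UUID are placeholders; in practice you would normally enable vSAN from vCenter, a 2-node setup additionally needs a witness appliance, and vSAN is licensed separately.

```bash
# On the first ESXi host: tag a VMkernel port for vSAN traffic,
# create a one-node cluster, and claim cache + capacity devices.
esxcli vsan network ipv4 add --interface-name vmk1
esxcli vsan cluster new
esxcli vsan cluster get            # note the Sub-Cluster UUID it prints
esxcli vsan storage add --ssd naa.AAAA --disks naa.BBBB   # placeholder IDs

# On the second host: join the same cluster and claim its disks.
esxcli vsan network ipv4 add --interface-name vmk1
esxcli vsan cluster join --cluster-uuid <uuid-from-first-host>
esxcli vsan storage add --ssd naa.CCCC --disks naa.DDDD
```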

batistuta09
2

The technology you would be best served by is "software-defined storage": a VM that makes locally attached disks available to all VMs, ideally providing redundancy by allowing the use of local disks on multiple nodes at the same time (so you can lose a node without losing all your VMs). Since we're not talking about product recommendations, I'll leave it at this. It's still a nascent market, but there are some well-established options that would fit the bill.

Basil