
We are a $3B company with a team of 6 infrastructure experts. I am a DBA, not part of the infra team.

Our setup is all VMware ESX 5.1, with an EMC SAN for storage and ExaGrid for backups. Prod and non-prod servers are hosted in separate DCs in 2 different cities. The prod backup share is replicated to the non-prod share; the lag is usually 4-8 hours. My non-prod database restores now take about 5 hours, which costs us late-night work and downtime. If I copy the backups to local drives first, the restores complete in 1-2 hours. I requested an additional 500 GB local drive on each of the 4 non-prod servers, and the infrastructure team rejected it, saying it costs about $5k for 2 TB. Fair enough.

In this case, I don't need any resiliency, fault tolerance, fault detection, mirroring, replication, backup, or recoverability; none of that. The data is not important; all I need is reasonable speed for a few hours twice a week. The goal is to restore databases in 1-2 hours. I looked at RAM and CPU usage, and they are not the bottlenecks.

My question is: is there a way we can use these SSDs as cheap additional storage, as an alternative to the expensive SAN?

If yes, apart from the cost of the drives, what other costs are involved?

Is there any other way to bring the cost down to, say, under $2k or even $1k?

Don
  • What are the makes/models of the servers involved? Do you need this storage to be presented to VMware, or are the DB servers physical? – ewwhite Nov 17 '14 at 16:19
  • Yes, in theory you can do things a lot cheaper and get some more performance as well with local SSDs. But, as you're already experiencing, there is a vast difference between what is technically possible and what an enterprise infrastructure department is either **willing** and/or **allowed** to deploy. Local optimisation, *reducing your cost*, may invalidate any number of other (historic) design choices/requirements in your larger data center strategy, which were made to reduce *overall cost/complexity/operational risk*. (Although I would assume that those likely need to be revisited.) – HBruijn Nov 17 '14 at 17:17
  • Basic question about the hardware in use... without that info, it's hard to make a real recommendation. – ewwhite Nov 17 '14 at 19:20
  • As the author of your linked storage question, and its primary answer writer, I don't want you to get the wrong idea. Yes, enterprise storage is expensive, but you are totally right; you don't NEED enterprise storage. You need a local, volatile scratch disk, so you're heading in the right direction. The real challenge will be getting your storage/sysadmin teams to get things done. – Mark Henderson Nov 17 '14 at 19:55
  • VTC only because I agree with @HBruijn's comment. What is possible or not still doesn't mean we experts/sysadmins should be taking the place of your company's sysadmins. That doesn't mean your possibly eventually closed question can't still be answered; you can work with your IT team on this. If they continue to push back, then you document that push back, everyone agrees on the constraints/outcome, and then you move on and take a negotiation class. :) – TheCleaner Nov 17 '14 at 19:58
  • I get the question, but I don't get why you are asking it here. What does your management say about it costing you late nights and downtime? How much does that cost them in money and morale? What do your infrastructure team say if you ask them how they suggest the restores can be made faster? Why aren't your non-prod restores automated to happen overnight after a replication sync, without you having to be involved? – TessellatingHeckler Nov 17 '14 at 20:39
  • Thanks for all the responses. There is a bit of politics between the infra team manager and my manager. The infra team does not seem to care if we have to spend hours on monitoring and housekeeping just to save pennies. Once I asked for 8 GB of additional RAM, and they said let's do 2 GB at a time, then you test/monitor for a week, and so on. If 6 GB works instead of 8 GB, we could save $40 in RAM, but it will cost us several hundred bucks in DBA time. So we try to suggest solutions. I have zero experience on the h/w side, hence the question with whatever info I have. Thanks again folks! – Don Nov 17 '14 at 22:04

4 Answers

Yes, there is. You have run into the typical enterprise idiocy of pushing everything to the SAN, something that will come back and kill you performance-wise in the mid term. There is a reason that, for example, MS SQL Server has allowed LOCAL SSD for tempdb since 2012: speed vs. cost. Heck, there are many cases where even production data can happily live on local disks without SAN resilience because you have application-level replication in place (for example: SQL Server Always On Availability Groups).

Basically: your infra team tries to solve everything by standardizing on one technology that does everything, and expects you to pay. This is a perversion of their work, which would be to standardize on valid approaches for everything. And yes, having local temporary space is quite critical, especially for databases. And no, it does not need resilience.

Your particular SSD will work, but it will likely burn out quite fast. Still, the concept is valid. I would likely get a couple of Samsung 843T ;)
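If "burn out" is read as flash write endurance (an interpretation, the answer does not spell it out), a rough lifetime estimate can be sketched from the workload in the question: about 500 GB written per restore, twice a week. The rated endurance values below (total terabytes written, TBW) are hypothetical placeholders, not specifications of any particular drive, and the write amplification factor is likewise an assumed parameter.

```python
def endurance_years(rated_tbw, gb_per_restore, restores_per_week,
                    write_amplification=1.0):
    """Rough years until a drive's rated write endurance is used up.

    rated_tbw: endurance rating in terabytes written (from the datasheet).
    write_amplification: assumed multiplier for extra internal flash writes.
    """
    values = (rated_tbw, gb_per_restore, restores_per_week, write_amplification)
    if min(values) <= 0:
        raise ValueError("all inputs must be positive")
    tb_written_per_year = (gb_per_restore / 1000) * restores_per_week * 52
    return rated_tbw / (tb_written_per_year * write_amplification)


# Hypothetical endurance ratings, for illustration only.
for tbw in (70, 150, 600):
    years = endurance_years(tbw, gb_per_restore=500, restores_per_week=2)
    print(f"{tbw} TBW rating: about {years:.1f} years at 1 TB written per week")
```

Plugging in the real TBW figure from the drive's datasheet shows whether endurance is actually a concern for a scratch disk used twice a week.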

TomTom
  • Thanks. What do you mean by "burns out fast"? We can buy a 3-year warranty for 20 bucks. – Don Nov 17 '14 at 22:17
  • 'everything on SAN' makes sense if you're looking at it as a consolidation and aggregation exercise. _Most_ systems are inherently 'bursty' so sharing resource makes sense. For systems that are not, it no longer makes sense – Sobrique Dec 15 '14 at 10:35

If all you need is a quick restore/rollback, you need local storage on the hosts, not an additional LUN on the SAN. Typically this is referred to as DAS (direct attached storage), and it can come in the form of an externally attached storage box filled with drives, or an internal disk or ten.

The cheapest solution is an external USB drive, which can allow for a ~500 GB restore in ~5 hours in good conditions, with USB speed at ~25 MB/s being the bottleneck.
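The arithmetic behind that estimate can be sketched as follows, reading the USB figure as roughly 25 megabytes per second. The other rates in the table are illustrative assumptions (the 1 Gb/s value is just the theoretical line rate of 125 MB/s, and the SSD value is a guess), so measure the real hardware before relying on them.

```python
def restore_hours(size_gb, throughput_mb_s):
    """Hours needed to move size_gb at a sustained throughput (decimal units)."""
    if size_gb < 0 or throughput_mb_s <= 0:
        raise ValueError("size must be >= 0 and throughput must be > 0")
    seconds = size_gb * 1000 / throughput_mb_s
    return seconds / 3600


# Assumed sustained rates in MB/s, for illustration only.
ASSUMED_RATES_MB_S = {
    "external USB drive": 25,
    "1 Gb/s link, theoretical maximum": 125,
    "local SATA SSD (assumed)": 400,
}

for name, rate in ASSUMED_RATES_MB_S.items():
    print(f"{name}: {restore_hours(500, rate):.1f} h for 500 GB")
```

At 25 MB/s, 500 GB works out to about 5.6 hours, which matches the 5-hour ballpark; hitting the 1-2 hour goal needs a sustained rate of roughly 70-140 MB/s for the same amount of data.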

An internal SSD or even 15k SAS (potentially a RAID array, for more IOPS) will be much faster to restore of course. For external access you'll need a SAS HBA, and a DAS appliance.

Keep in mind that these do not cancel the requirement for a proper backup/restore/DR scheme. The cost of these solutions can vary greatly, maybe even to the point where EMC LUNs come cheaper.

dyasny
  • USB3 is unlikely to be present on the server hardware being described. – ewwhite Nov 17 '14 at 16:47
  • double the restore time then. – dyasny Nov 17 '14 at 16:48
  • In my setup, I have several Dell MD1220 boxes attached to the hosts that need local PIT copies, but they cost quite a bit initially, and they take up 2U each – dyasny Nov 17 '14 at 16:49
  • A 5-hour restore time is not acceptable, so USB won't work. It seems like SSD or 15k SAS is the way to go. "These do not cancel the requirement for a proper backup/restore/DR scheme": not sure what you mean by that. Does it still get backed up/replicated, or do we still pay for it due to a contract/agreement? – Don Nov 17 '14 at 22:21
  • I mean backing up to archive-able medium is also an absolute must, and DR site replication too. This local backup is not a replacement for what already is there, just an addition – dyasny Nov 18 '14 at 01:43

If your shop is anything like mine, here's what you do:

  • Define your needs. Don't include suggestions of how they might be met.
  • When they return with a cost, if it's more than you're willing to be charged back for what you need to do, then agitate with management.

Basil

For a quick solution, I would recommend that you just ask for DAS (direct attached storage)! Often the performance issue occurs because the SAN is attached via a 1 Gb LAN or the disks are too slow for too many DB applications. DAS will solve this issue because you are the only one on this storage, and you do not need any of this: fault tolerance, fault detection, mirroring, replication, backup, recoverability.

heinz