19

My storage lets me thin provision the volumes I present to clients. Is this safe? What are the best practices?

Basil

2 Answers

16

Generically, whether you're talking about SCSI LUNs (SAN) or network file systems (NAS), thin provisioning means telling the storage client that it has more space than you've actually allocated to it. That carries no risk on its own. However, if you don't have enough actual storage to allow every single container to grow to its full promised size, you are overprovisioned, and that does entail risk.

Advantages

The advantages of thin provisioning and overprovisioning are compelling. Many consumers of storage (servers, file share users, etc.) request far more storage than they initially need, and keep asking for more as they grow so that they always have a safe margin. A single, centrally provisioned margin for growth is far more efficient than hundreds of small ones. Without thin provisioning and overprovisioning, utilization of the underlying storage can be very low; with them, you can run at a much higher rate of utilization.

Risks

All the risks of this scenario are linked with overprovisioning. The more you overprovision, the higher your risk. The danger is that actual usage completely fills the available storage, which will generally cause all the storage containers to fail in one way or another: filesystems will go read-only or offline, and LUNs will go offline.

Best practice

In order to get the benefits of higher utilization that come with overprovisioning while mitigating the risk, you need to constantly monitor the storage and be able to take action when required.

  • Use software to monitor and alert on pool utilization. If your storage box has nothing built in that will do this, write it yourself. Most storage supports CLI commands whose output can be read by a script that you schedule to run frequently. The frequency should be high enough that none of your pools can fill up between polling events.
  • Establish a baseline threshold. All new pools of storage with overprovisioned clients should get this applied by default. This threshold should be the most conservative one in your environment.
  • For smaller pools, use a lower threshold. If you give yourself 30% of warning on a 100TB pool, you have a lot more time to add disk than if you have 30% of warning on a 10TB pool, assuming they are both capable of ingesting writes at the same speed.
  • Adjust the threshold up if you're less overprovisioned. If a pool is provisioned at only 106% of its actual capacity, hitting 70% utilization isn't nearly as risky as it is in a pool provisioned at 200%.
  • Adjust your thresholds based on how much time you need to add space to a pool. In my shop, we keep online storage in each box held back for growth in any pool, and more storage on a shelf ready to be installed into any storage box. We do this for enough types of storage that we can handle growth in any pool.
  • Wherever possible and applicable, thin out your storage. Deduplication decreases your utilization, and if you are using LUNs, zero page reclaim and clients that can unmap blocks when they delete data (for example with SCSI UNMAP) both help.
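
To make the polling and threshold points concrete, here is a minimal Python sketch of the arithmetic. The function names, the sample pool sizes and the 0.5 TB/hour ingest rate are assumptions for illustration only; real utilization figures would come from your array's CLI or API, which is not shown here.

```python
def hours_until_full(pool_tb, used_fraction, ingest_tb_per_hour):
    """Worst-case hours before the pool fills at a sustained ingest rate."""
    free_tb = pool_tb * (1.0 - used_fraction)
    return free_tb / ingest_tb_per_hour


def alert_threshold(pool_tb, ingest_tb_per_hour, hours_needed_to_add_disk):
    """Highest utilization (0..1) that still leaves time to add disk.

    Returns 0.0 when the pool could fill faster than you can react at all.
    """
    reserve_tb = ingest_tb_per_hour * hours_needed_to_add_disk
    return max(0.0, 1.0 - reserve_tb / pool_tb)


def poll_interval_is_safe(pool_tb, used_fraction, ingest_tb_per_hour, poll_hours):
    """True if the pool cannot fill up between two polling events."""
    return hours_until_full(pool_tb, used_fraction, ingest_tb_per_hour) > poll_hours


# Hypothetical pools: same 70% utilization, same ingest rate, different size.
for size_tb in (100, 10):
    hours = hours_until_full(size_tb, 0.70, 0.5)
    print(f"{size_tb} TB pool at 70%: about {hours:.0f} hours of warning")
```

With these made-up numbers, 30% of warning buys roughly 60 hours on the 100TB pool but only about 6 hours on the 10TB pool, which is why the smaller pool needs the lower threshold.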
Basil
  • We've taken to quoting 'subscription' both as provisioned capacity vs. total capacity and as unused provision vs. free space. So in your example (70% utilisation with 200% subscription) you've got the remaining 130% provisioned against 30% of the actual storage, giving yourself a 433% subscription ratio, whereas '106% vs. 70%' means 36%:30% = 120%. – Sobrique Apr 25 '14 at 16:09
  • We don't tell the clients anything about this, but we certainly lower the threshold that would cause us to add disk when we're at a higher provisioned capacity. – Basil Apr 25 '14 at 16:56
  • Chargeback and reporting is an important part to think about, certainly. I'm in two minds really - on one hand, if they don't need to know, and trust the storage team to get on with it, then that - to my mind - is the best way. However, I've run into situations where they trust the storage team to get on with it - up until it's time to backfill, and so try and stall the purchase order for more disks. – Sobrique Apr 25 '14 at 21:18
  • We decided that it was fine to pass on the savings from going thin equally to all storage clients. We bill per addressed TB. – Basil Apr 26 '14 at 11:54
  • Monthly or capital cost? I've been tripped up by the latter, simply because it's very hard to estimate ratios over the service lifespan. But it can be quite hard to convince accountants that you don't want to do capital expenditure models any more. – Sobrique Apr 28 '14 at 20:01
  • That's a whole other kettle of fish :) – Basil Apr 28 '14 at 23:15
  • Thing is - I'm not sure that it is. Thin provisioning is a way of allocating more than you've got. There's no good reason to do it, aside from to save cost to the business, because you're not buying capacity that you don't "need" - but with a risk attached that you don't have it to give it instantly. Therefore that risk/cost is very much a business process question - you don't want to be in a position where you (as a storage admin) are taking a risk on behalf of the business, but will be left carrying the can if it doesn't pay off. – Sobrique Apr 29 '14 at 06:37
  • We charge people for what they see, and since they get no discount for not using it all, we understand that they have no incentive to stay thin, other than it being the right thing to do. Since we charge people for what they see and hide the risk from them, all the money we save is on the "back end" IT budget. – Basil Apr 30 '14 at 10:46
9

The point of thin provisioning is similar to the reason for using consolidated storage in the first place: by consolidating, you get better peak capacity while needing less on average.

But be under no illusions - thin provisioning is pretending to allocate something without actually doing so. There are many reasons this is useful. Two key ones are:

  • Higher utilization - unless your volumes are completely full, the unused disk space is wasted. Most systems don't run at 100% full all the time (and are generally assumed to be 'in trouble' if they are).

  • Deferred spending - if I give you 10TB today, but you fill it at 2TB per year, I can probably pay less if I wait before buying the disks.
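
As a rough sketch of the deferred spending arithmetic, assuming a constant fill rate (real growth is rarely this tidy), the physical capacity that must be installed by the end of each year looks like this. The function name is my own, and the numbers are the ones from the example above.

```python
def physical_needed_by_year(allocated_tb, fill_tb_per_year, years):
    """Physical TB that must be installed by the end of each year.

    Assumes a constant fill rate; usage can never exceed the allocation.
    """
    return [min(allocated_tb, fill_tb_per_year * year)
            for year in range(1, years + 1)]


# 10TB promised today, filled at 2TB per year, looked at over six years.
print(physical_needed_by_year(10, 2, 6))  # -> [2, 4, 6, 8, 10, 10]
```

Only 2TB has to exist in the first year; the rest can be bought later, as it is needed.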

You have two gotchas arising from this though:

  • running out of disk too fast - someone who starts filling 'their' disks can run the rest of the enterprise out of space.

  • spindle counts - buying fewer disks means you've got fewer spindles and thus fewer IOPS, which means your disks will run hotter and your performance will be worse.

Things I would suggest as best practices for thin provisioning:

  • Get management 'buy in' to the risks involved.
  • Set an 'acceptable' oversubscription ratio. (This is a business risk decision, so hand it upwards.)
  • Also consider individual volume sizes. A 20TB volume is more likely to gobble up space than a lot of 100GB volumes.
  • Have capacity (or a purchase order) ready to go when you start running low (based on 'free space' or 'volume size'). You don't get as much warning that you're about to run out, and you probably can't wait until the next quarter/financial year to back fill - you're not buying new capacity any more, you're back filling stuff you've already 'sold'.
  • Consider the theoretical max capacity of your storage system. Think very carefully about what you'll do if you go past it.
  • Pay close attention to your performance, both IOPS and throughput. You probably won't get a good response to 'how much performance do you need' questions, but you may find you 'run out' of performance faster than you would otherwise. Set a threshold for this too.
  • Consider your charging accordingly. You save money by thin provisioning, but you will NEED some of it back to keep up with your thin provisioning model.

I can't stress that last point enough. You may well have customers who ask for storage and never use it. That's money you didn't spend, and it represents a saving. However, that's not the same as customers who take a while to use it (e.g. more than a financial year) - there, you save money by buying bigger/cheaper disks next year. But you DON'T get away with 'selling' the space up front and just hoping that no one ever uses it. You may well end up filling up the whole lot over time, and you need to be ready to back fill.
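
For the 'acceptable' oversubscription ratio in the list above, a minimal Python sketch of how you might track it. The function names and the 2.0 limit are assumptions for illustration; the actual limit is the business decision. The second function measures unused provision against free space, which is a stricter view of the same risk.

```python
def subscription_ratio(provisioned_tb, physical_tb):
    """Provisioned capacity divided by physical capacity (2.0 means 200%)."""
    return provisioned_tb / physical_tb


def remaining_subscription_ratio(provisioned_tb, physical_tb, used_tb):
    """Unused provisioned space divided by free physical space."""
    free_tb = physical_tb - used_tb
    if free_tb <= 0:
        raise ValueError("pool has no free physical space left")
    return (provisioned_tb - used_tb) / free_tb


def within_policy(provisioned_tb, physical_tb, max_ratio):
    """True if the pool is at or below the agreed oversubscription limit."""
    return subscription_ratio(provisioned_tb, physical_tb) <= max_ratio


# Hypothetical pool: 100TB physical, 200TB provisioned, 70TB used.
print(within_policy(200, 100, max_ratio=2.0))
print(remaining_subscription_ratio(200, 100, 70))
```

In this example the pool sits exactly on a 200% limit, yet the 130TB of unused provision is backed by only 30TB of free disk, a ratio of about 4.3. That second number is the one that tells you how soon you need to back fill.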

Sobrique
  • In my shop, oversubscription isn't visible to the data owners unless they ask for it. We make it a storage decision, but promise to never bust a pool. – Basil Apr 24 '14 at 17:23
  • That's an option - and probably a sensible one, provided 'storage' don't then have to fight for the 'more disks' capex. That's more a question of politics and finance though :) – Sobrique Apr 24 '14 at 19:48