
I am seeking suggestions and comments on how to recover from a 100% full vSAN (other than the obvious option of resetting to factory defaults). I have an 8-node ESXi cluster that runs entirely on vSAN storage. Due to circumstances with a vendor that I would prefer not to go into, the total disk capacity was undersized for the storage requirements. The end result was that the vSAN hit the 100% utilization wall hard and handled it about as well as an egg hitting a tile floor. Because the hosts themselves also boot from/live on the vSAN, they locked up when this condition occurred, and several of them crashed, dramatically cutting the available capacity on an already full vSAN. I have been able to regain access to some of the hosts, but with the vSAN thrashing the disks in a vain attempt to rebuild the array, they are dreadfully slow to respond. vCenter is unavailable, so I can only manage individual hosts using SSH and the vSphere thick client. This removes most of my control over the vSAN object, so my recovery options have been severely limited.

A few points:

  • I am well aware that filling any SAN technology to 100% capacity is a recipe for disaster, so let's skip those obvious and unhelpful observations.
  • I understand and accept that data loss is pretty much inevitable here but I would like to save as much as I can while deleting what I need to in order to recover the cluster to a functional state.
  • The manufacturer has already advised that the cluster has to be reset to factory defaults, but I've seen many cases where the community can provide better answers.
  • As the cluster is non-functional I am willing to take risks and try radical ideas that would normally be out of the question.
Mario Lenz
  • Do you have current backups of the VMs that live on the vSAN? If so, you could try deleting some of the less critical VMs to free up space in order to get the cluster back to a working state. Then restore the deleted VMs once you've addressed the capacity issue. – joeqwerty Jul 06 '17 at 15:21
  • I do have backups, so I can tolerate data loss. I've been attempting to do so, but with the array continuously trying to rebuild, it consumes the space almost as quickly as I make it. I haven't been able to find a way to tell vSAN to just chill for a while and stop trying to rebuild until I can get ahead of it. – NorthVandea Jul 06 '17 at 15:37
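To see why deletions and the rebuild race each other, here is a rough back-of-the-envelope sketch. It is a hypothetical helper, not a vSAN tool. It assumes every object uses a RAID-1 mirror policy with failures to tolerate = 1, so each VM's used data is stored twice (small witness components are ignored). `pending_resync_gb` is an assumed input for the raw space the rebuild still wants to consume. The numbers in the usage section are illustrative only.

```python
# Hypothetical sketch: estimate raw vSAN capacity reclaimed by deleting VMs.
# Assumption: all objects use an FTT=1 RAID-1 (mirror) policy, so each VM's
# used data occupies roughly twice its size in raw capacity. Witness
# components are ignored.

RAID1_FTT1_MULTIPLIER = 2.0  # assumption: two full replicas per object


def raw_reclaimed_gb(vm_sizes_gb, multiplier=RAID1_FTT1_MULTIPLIER):
    """Raw capacity freed by deleting VMs with the given used sizes (GB)."""
    return sum(size * multiplier for size in vm_sizes_gb)


def net_headroom_gb(free_raw_gb, vm_sizes_gb, pending_resync_gb):
    """Free raw space left after the deletions once the pending resync lands.

    pending_resync_gb is a hypothetical input: the raw space the rebuild
    still needs in order to recreate missing replicas.
    """
    return free_raw_gb + raw_reclaimed_gb(vm_sizes_gb) - pending_resync_gb


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the cluster described above.
    deleted = [100, 250, 40]  # used GB of the VMs chosen for deletion
    print(raw_reclaimed_gb(deleted))         # 780.0
    print(net_headroom_gb(0, deleted, 900))  # -120.0
```

A negative result means the rebuild would swallow everything the deletions free and still want more. This matches the behaviour described in the comment above, where space is consumed almost as quickly as it is made.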
