18

I understand that the VMware KB frowns upon long-running snapshots, mainly for two reasons (in my opinion):

  • Taking tons of snapshots can fill up the datastore. Snapshots are simply delta files. Let's say you have a 50 GB VMDK, nearly full, and you take a snapshot. In your snapshot you flip every single bit. Your delta file will also be about 50 GB. Snapshot again, flip the bits, and you get another 50 GB delta file. These can get out of control fast.

  • Committing large snapshots carries risk. When consolidating snapshots you are writing the delta changes to the original VMDK. This takes time and carries the risk that if something happens you just nuked your VMDK.

Their warnings seem to make logical sense.

With that being said, is it inherently bad to run my machine permanently off of a snapshot VMDK? I want to make my tree the following:

  • Base
    • Snap1
      • Snap 2
      • You are here

Snap 1 and 2 will be taken immediately after installing and provisioning the base system. These are machines I plan to refresh frequently so I will simply make my tree look like the following:

  • Base
    • Snap1
      • You are here
      • Snap 2

Delete Snap2 and recreate Snap2.

I cannot see how this could have any implications, for the following reasons:

  • Since I simply installed a base image and took my deltas immediately after there is no way I could possibly fill up the data store. Assuming my base image is only 10 GB (on a 50 GB thin provisioned disk), even if my delta flipped every single bit the max my total usage could be is 60 GB (10 GB base VMDK which is locked + 50 GB of delta in the snapshot VMDK file). This assumes I do not create any further snapshots.

  • Since my use case does not call for consolidating the snapshots I do not risk errors upon consolidating my deltas. When I move back to Snap1 and delete Snap2, all of the delta that resided in Snap2 simply gets deleted.

  • The storage load is exactly the same, so I should be getting the same IOPS. I understand that some files (mainly system files) will exist on the original VMDK and others (everything after the base) will reside in the delta, but I don't see why ESXi would care. All the files are on the same physical datastore, so the performance should be equivalent to referencing everything in the original VMDK without snapshots.
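The space arithmetic in the first bullet can be written down as a small upper-bound calculation. This is only a sketch: the function name is made up for illustration, and it assumes each delta file can grow to at most the provisioned size of the disk while ignoring any per-file metadata overhead.

```python
def worst_case_usage_gb(base_used_gb, provisioned_gb, growing_deltas):
    """Upper bound on datastore usage for one virtual disk.

    Assumption: every delta that receives writes can grow to at most the
    provisioned disk size (each block rewritten once). Metadata overhead
    of the delta files is ignored.
    """
    return base_used_gb + provisioned_gb * growing_deltas


# The plan from the question: 10 GB used on a 50 GB thin disk, one delta
# that keeps receiving writes, no further snapshots taken.
print(worst_case_usage_gb(10, 50, 1))

# The runaway case from the top of the question: a nearly full 50 GB disk
# where two successive deltas are each completely rewritten.
print(worst_case_usage_gb(50, 50, 2))
```

The bound stays fixed only as long as the number of deltas that have received writes stays fixed, which is the condition the bullet already states.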

Any thoughts? This is ESXi 5.5, with the datastore on RAID'd DAS.

I do not have a vCenter license so templating and cloning is off the table.

RESULTS OF TEST

I got in early today to run some tests. Here are the results. There is a performance penalty, but I'm not sure why.

Before snapshotting: [benchmark screenshot]

After snapshotting: [benchmark screenshot]

ewwhite
  • Not surely - as time goes on, the snapshots will diverge more and more. Eventually they will be essentially different copies. Once snapshotting no longer saves you much disk, convert the snapshot to a completely separate volume. How? Normally I use dd from a third VM, but I am nearly crucified here for such heretical opinions. :-) But it will _work_, and it will be _effective_. – peterh Dec 04 '14 at 22:51
  • @PeterHorvath - That's the stuff I love to hear. Smart, hacky, effective, bare-bones solutions. If you don't mind could you give me a write up on what you do in pastebin or something? Do you DD the VMDK and snapshot together? – VM_Storage_Inception Dec 05 '14 at 21:59
  • If I needed to do that more often, I would do it with a script. But that is not the case, and in most cases I don't even use snapshots, because they are slow. – peterh Dec 05 '14 at 22:55

3 Answers

17

Yes, there are performance implications for long-running snapshots. There are even greater implications for consolidating delta VMDKs back to the original disk file. This can cause unresponsiveness in your VM's operating system or other undesirable behavior.

VMware has templating and cloning functionality built into vCenter. You need a $600 vSphere Essentials license to enable this.

You can create a VM to your taste, then clone it to a template. That template can then be used to generate new virtual machines from a "Golden Master" image.


This allows you to have a "clean state" but also create long-running or permanent VMs from that master image. No snapshots needed.

ewwhite
  • Interesting, I'll look into that and see how it works. Unfortunately I do not have a vCenter license and would rather not have my org shell out the $600 if there are no performance implications to the snapshots being used in the way I describe. Also, the templating and cloning seems no different from taking an OVA and redeploying it. Deleting the snapshots seems a lot quicker, and I can't logically see how there would be performance implications even if it's not the "official VMware-approved method". – VM_Storage_Inception Dec 04 '14 at 21:36
  • To respond to your edit, could you point me to an article or explain what the performance implications would be? I can't see how there would be any assuming I use them how I describe. Also I would never be consolidating the snapshots back to the original VMDK. – VM_Storage_Inception Dec 04 '14 at 21:40
  • I guess I'm trying to understand why you're insisting on designing around a feature that's meant to be used for short-term access. – ewwhite Dec 04 '14 at 21:44
  • @VM_Storage_Inception - it almost sounds like you are wanting a poor man's approach to VMWare's defunct product Lab Manager. – TheCleaner Dec 04 '14 at 21:45
  • @ewwhite, One, these VMs will be refreshed to their base state weekly. Two, I don't see how there will be performance impacts, since the machine will be taken down, reverted, and run again. Three, to avoid paying for vCenter. – VM_Storage_Inception Dec 04 '14 at 21:46
  • @VM_Storage_Inception Okay. Go for it. Try it and see if it works for your use case. – ewwhite Dec 04 '14 at 21:48
  • 5
    Sometimes, *buying* the right solution makes sense. You've spent more [effort and man-hours inquiring about a workaround](http://serverfault.com/questions/647237/long-running-vmware-snapshots-are-bad-but-what-about-vmdks-on-a-zfs-lvm-snap) than just paying for a vSphere Essentials license ($600), which would give you a supported template/cloning option. – ewwhite Dec 04 '14 at 22:55
4

ewwhite's answer is correct, but just to expand a bit more on the performance penalty, consider the following scenario:

You create a VM. A virtual read from the vmdk takes one physical disk read of the same size. Fairly straightforward.

Now imagine you take a snapshot of the VM. Now, for every virtual read, you're going to incur 2 physical reads, one from the base vmdk and one from the delta vmdk, because you need information from both to get the current state. You're now at twice the physical disk reads.

For two snapshots, you're doing three times the reads, and so on. If you have a lot of snapshots, you can see how this can be a fairly significant performance penalty. It doesn't necessarily translate into n-times worse performance (due to caching, sections that haven't been changed, etc.), but it's not a good practice.
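One way to picture where the extra work comes from is a toy copy-on-write model. This is not VMware's on-disk format; the class and function names are hypothetical, and the model simply assumes each delta records which blocks it owns. Under that assumption, a read checks the chain newest-first, so the number of layers consulted grows with the chain, while the data itself comes from a single layer.

```python
class DiskLayer:
    """One file in the chain: the base disk or a snapshot delta (toy model)."""

    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})  # block number -> data


def read_block(chain, block_no):
    """Resolve a virtual read against a chain ordered newest-first.

    Returns (data, layers_checked). Only the layer that owns the block
    supplies data; the layers before it cost a lookup each.
    """
    for checked, layer in enumerate(chain, start=1):
        if block_no in layer.blocks:
            return layer.blocks[block_no], checked
    raise KeyError(f"block {block_no} not present in any layer")


base = DiskLayer({0: "base-0", 1: "base-1", 2: "base-2"})
snap1 = DiskLayer({1: "snap1-1"})      # block 1 rewritten after the first snapshot
current = DiskLayer({2: "current-2"})  # block 2 rewritten after the second snapshot
chain = [current, snap1, base]         # newest first

for n in range(3):
    data, checked = read_block(chain, n)
    print(f"block {n}: {data!r} after checking {checked} layer(s)")
```

In this model, blocks that were never rewritten are the ones that walk the whole chain. Whether each check costs a physical disk read depends on whether the ownership information is cached, which the model does not try to answer.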

tfrederick74656
  • I'm almost sure snapshots use a "which block is in which file" table. So reading one single block will only result in one block read from the appropriate file. Of course, reading several blocks may result in accessing several files, which means a penalty for moving disk heads if you're not running from an SSD, but the total number of disk block accesses shouldn't change. – Guntram Blohm Dec 05 '14 at 09:34
  • 1
    The way I understand it, snapshots only store changes from the original disk. If you store file A, then take a snapshot, then change file A again, only the changes to that file are written to the snapshot. Thus, you need to read both the original VMDK and the snapshot to get the entire file. Otherwise, each snapshot would simply be a full copy of the original disk, which they aren't. – tfrederick74656 Dec 05 '14 at 09:41
  • that may be correct, but the total amount of blocks you need to read stays the same (e.g. 10 blocks from the snapshot and 100 from the base disk). ESXi first checks the existing snapshots for the blocks needed until it ends up at the correct snapshot (or the base disk). There may be a minor penalty because the system will probably skip that snapshot traversing part completely when there is no snapshot at all. Additionally, a long running snapshot file will probably suffer from severe fragmentation. – Dirk Trilsbeek Dec 05 '14 at 11:51
  • A virtual disk snapshot system that does N reads for N snapshots would be a very stupid implementation. I doubt that's how it's implemented in VMware. A simple optimization would be to create an index file that stores which disk file each block of the emulated drive is in. Suppose you have a 512 GB virtual disk with a block size of 4 kB: you only need a 64 MB index to determine in constant time which of up to 16 virtual disk files contains a block. – Lie Ryan Dec 05 '14 at 12:07
  • 1
    Based on the answers in serverfault.com/questions/430138 I have to disagree. I've always thought of snapshots as the result of binary arithmetic, not just a collection of new data. So if you have bits 01010101 in your base VMDK, you snapshot, then change those bits to 10101010, your delta would contain 11111111 (indicating that every bit in the original file changed, NOT the new value of 10101010). As much as I agree with the above comment, VMDKs are supposedly raw files. Where would the index be stored? I've never seen this mentioned in any VMWare tech pubs. – tfrederick74656 Dec 05 '14 at 12:08
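The 64 MB index figure from the comments above can be checked with a short calculation. The flat index here is the hypothetical structure the commenter proposes, not a documented VMware one, and the function name is invented for illustration. It assumes binary units (512 GiB disk, 4 KiB blocks) and the minimum number of bits needed to name one of the files.

```python
def index_size_bytes(disk_bytes, block_bytes, max_files):
    """Size of a flat 'which file holds this block' index (hypothetical)."""
    blocks = disk_bytes // block_bytes
    # Bits needed to distinguish max_files different files, e.g. 4 bits for 16.
    bits_per_entry = (max_files - 1).bit_length()
    return blocks * bits_per_entry // 8


size = index_size_bytes(512 * 2**30, 4 * 2**10, 16)
print(size // 2**20, "MiB")
```

A 512 GiB disk has 2^27 blocks of 4 KiB, and 4 bits per block gives 2^26 bytes, which matches the commenter's number.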
0

VMware ESX snapshots are meant for short term usage.

Long-term use under heavy I/O can cause VM freezes. If write I/O arrives faster than snapshot consolidation can keep up, ESX will freeze the VM to protect data. Over time snapshots become fragmented, and when ESX performs internal consolidation you can experience periodic freezes.

You can perform VM templating manually through SSH. Copy the VM folder containing the vmdk, vmx, etc. to a new folder. In the vmx file of the newly copied VM, change the UUID and MAC address.
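The vmx edit can be scripted. The sketch below covers only that step, not the folder copy, and it swaps in a slightly different technique: instead of writing new values, it removes the `uuid.bios`, `uuid.location` and `ethernet*.generatedAddress*` lines, on the assumption that ESXi generates fresh ones at the next power-on when the address type is `generated`. That assumption should be verified on a test VM first. The function name and the sample values are made up for illustration.

```python
import re

# Keys that tie a .vmx file to one specific VM. Assumption: once removed,
# ESXi regenerates them on the next power-on; check this on a test VM.
IDENTITY_KEYS = re.compile(
    r"^\s*(uuid\.bios|uuid\.location"
    r"|ethernet\d+\.generatedAddress(Offset)?)\s*=",
    re.IGNORECASE,
)
DISPLAY_NAME = re.compile(r"^\s*displayName\s*=", re.IGNORECASE)


def strip_identity(vmx_text, new_name):
    """Return .vmx text without UUID/MAC lines and with a new displayName."""
    kept = []
    for line in vmx_text.splitlines():
        if IDENTITY_KEYS.match(line):
            continue  # drop the copied UUID and auto-generated MAC
        if DISPLAY_NAME.match(line):
            line = f'displayName = "{new_name}"'
        kept.append(line)
    return "\n".join(kept) + "\n"


# Made-up sample content, not taken from a real VM.
SAMPLE_VMX = """displayName = "golden"
memsize = "2048"
uuid.bios = "56 4d 11 22 33 44 55 66-77 88 99 aa bb cc dd ee"
uuid.location = "56 4d 11 22 33 44 55 66-77 88 99 aa bb cc dd ee"
ethernet0.addressType = "generated"
ethernet0.generatedAddress = "00:0c:29:aa:bb:cc"
ethernet0.generatedAddressOffset = "0"
"""

print(strip_identity(SAMPLE_VMX, "clone01"))
```

Run it against a copy of the vmx file and diff the result before powering on the clone, so that anything the pattern missed is visible.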

VMware has a feature, Linked Clone, which does the same thing you are trying to do, and they say it has potential performance problems. In practice you'll be remastering VMs after a while. https://www.vmware.com/support/ws5/doc/ws_clone_typeofclone.html

dario