8

I started a snapshot of a 1 TB volume holding 750 GB of data in AWS EC2 without shutting the instance down. When I checked after 10 hours, it showed status completed, progress 100%. I can see the start time, but how can I find out the exact completion time of the snapshot?

supercontra

2 Answers

13

The completion time is generally not important, because the snapshot captures the volume as it was at the moment you requested it, even if the volume changes while the snapshot is being made. A snapshot can take several hours, but it generally doesn't matter how long it takes.

If you need a completely consistent snapshot you shut the server down, trigger the snapshot, then start the server immediately. This primarily ensures all data is flushed to disk and in a consistent state, with nothing in RAM. It's similar to Windows shadow copy.
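The stop, snapshot, restart sequence described above could be sketched roughly as follows. This is only an illustration, assuming a boto3 EC2 client is passed in as `ec2`; the function name, the description text, and the IDs in the usage comment are hypothetical.

```python
def snapshot_while_stopped(ec2, instance_id, volume_id):
    """Stop the instance, request a snapshot, then restart the instance.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the ID of the snapshot that was requested.
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    # Wait until the instance is fully stopped so that nothing is left
    # unflushed in RAM when the snapshot is requested.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    snapshot = ec2.create_snapshot(
        VolumeId=volume_id,
        Description="snapshot taken while instance was stopped",  # hypothetical text
    )

    # The snapshot reflects the volume at request time, so the instance is
    # restarted right away without waiting for the snapshot to complete.
    ec2.start_instances(InstanceIds=[instance_id])
    return snapshot["SnapshotId"]


# Usage sketch (hypothetical IDs, requires boto3 and AWS credentials):
# import boto3
# snap_id = snapshot_while_stopped(boto3.client("ec2"),
#                                  "i-0123456789abcdef0",
#                                  "vol-0123456789abcdef0")
```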

In most cases a snapshot of a running system is fine, but sometimes it will be inconsistent. If an application is writing to the disk when the snapshot is requested that data may be corrupt.

EC2 Snapshot documentation is fairly good, like most AWS documentation.

Tip: the first snapshot of any volume is the slowest, as it needs to back up every block. Also, the longer the interval between snapshots, the slower they tend to be, as there's more changed data to snapshot. If you need a really fast snapshot at a particular time, take a snapshot a couple of hours earlier, as that will make the next snapshot faster.

Tim
  • 1
    *"If an application is writing to the disk when the snapshot is requested that data may be corrupt"* is not, in the strictest sense, an accurate description, since it implies that the problem lies in the snapshot. The data in the snapshot won't be inconsistent with the data that was on the volume, but the data on the volume may have been in an unusable state, depending on how well the filesystem/application can recover from any partial and unflushed writes. A snapshot is the same thing you'd have if you pulled the power cord on a physical server, then made an image of the disk. (also, +1) – Michael - sqlbot Jun 23 '18 at 14:24
  • 1
    Making a "throwaway" snapshot, first, is a useful strategy. Start the snapshot, then wait. Come back later. Once that is done, do whatever is necessary to make the "real" snapshot (e.g. `fsfreeze` until the snapshot begins) and it will typically complete much more quickly, because it can integrate the unchanged blocks captured by the throwaway. Then you can safely delete the throwaway, because EBS snapshots can share identical data blocks yet no snapshot depends on the continuing existence of the others. – Michael - sqlbot Jun 23 '18 at 14:32
  • This doesn't answer the question. – Crescent Fresh Jan 16 '22 at 02:55
  • 1
    @CrescentFresh you're right, but I'm not sure there's any way to find out the completion time, and it's usually not particularly important. The start time is the important part as that's the time the snapshot is taken. – Tim Jan 16 '22 at 08:01
  • 1
    @Tim: yes I get it, it's impossible to know currently. It would be useful in the case of trying to gauge how long a particular system might be impacted during the snapshot, in order to provide as much information to people. We ended up timing it on a staging environment (start snapshot, polling until done, measure how long it took). – Crescent Fresh Feb 01 '22 at 22:56
  • I'm not sure that EBS performance is affected by snapshots. The EBS FAQ and documentation both say this (one page has slightly different wording but says the same thing) "Another factor is taking a snapshot which will decrease expected write performance down to the baseline rate, until the snapshot completes. This is specific to st1 and sc1.". Did you test performance before / during / after taking a snapshot @CrescentFresh? – Tim Feb 01 '22 at 23:16
  • @Tim We stopped all writes and unmounted before beginning the snapshot. That's the driver here, we wanted to gauge how long that system would be down for. It was a multi-node cluster so we needed a consistent snapshot for all nodes. – Crescent Fresh Feb 03 '22 at 03:48
  • Ah, interesting. Yes that can be a good thing to do if you want to be 100% sure your snapshots are consistent. Application level backups (e.g. database export if it's a database) can be another way to achieve something similar, but may have a higher restore time. – Tim Feb 03 '22 at 06:45
  • TL;DR it was <= 23 min for 15 GiB, not sure exactly how long it took. I created my first snapshot from a 15 GiB gp2 volume (wanted to move to gp3 and make it bigger) with the instance shut down first. Not sure how much time it took exactly, the web UI is terrible (no progress, just Pending, no update on the page when the snapshot is ready; yo Amazon, have you heard about WebSockets and the observer pattern?). After 23 minutes I refreshed the page, and the snapshot was already Completed. Maybe I tried refreshing earlier? Don't remember exactly. – Attila123 Feb 05 '22 at 12:39
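The timing approach mentioned in the comments (start the snapshot, poll until done, measure how long it took) might look something like the sketch below. The helper name `time_snapshot` and its parameters are hypothetical; the state lookup is passed in as a function so that the polling loop does not depend on any particular client. The result is only accurate to within one polling interval, and it measures from the moment polling starts rather than from the snapshot's recorded start time.

```python
import time


def time_snapshot(get_state, poll_seconds=30, clock=time.monotonic, sleep=time.sleep):
    """Poll get_state() until it returns "completed"; return elapsed seconds.

    get_state is any zero-argument callable returning the snapshot state
    as a string ("pending", "completed" or "error").
    """
    started = clock()
    while True:
        state = get_state()
        if state == "completed":
            return clock() - started
        if state == "error":
            raise RuntimeError("snapshot ended in the error state")
        sleep(poll_seconds)


# Usage sketch with boto3 (hypothetical volume ID, requires AWS credentials):
# import boto3
# ec2 = boto3.client("ec2")
# snap_id = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0")["SnapshotId"]
# elapsed = time_snapshot(
#     lambda: ec2.describe_snapshots(SnapshotIds=[snap_id])["Snapshots"][0]["State"]
# )
# print(f"snapshot took roughly {elapsed / 60:.1f} minutes")
```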
2

There is no predefined completion time. The more data there is, the longer the backup takes. For data consistency, it is recommended to stop the instance, or at least to avoid any reads/writes while the backup is in progress.

  • Does the snapshot complete faster if I take it after shutting the instance down? If so, can I at least watch for the 100% progress manually? – supercontra Jun 23 '18 at 08:16
  • 1
    No, the snapshot isn't faster if the instance is shut down. See my answer for more details. – Tim Jun 23 '18 at 09:15