The snapshots of one of our AWS volumes are corrupt. We use these snapshots as backups, and in the past they have been a great help. (NB: This is not our only backup method!) A corrupt snapshot is useless, however.

I'd like to know how to handle this, how to detect it beforehand, etc.


We have an AWS web server with one large ext3 volume (DATA) that holds many images in a single folder. We make daily snapshots of all volumes and keep them for four weeks, which makes the snapshots of this volume too costly. I only need one snapshot of the images in case of emergency; for the rest of the volume I want the normal retention. This is what I planned to do:

  1. Create a snapshot of volume DATA
  2. Create a new ext4 volume, IMAGES, from the snapshot
  3. Mount volume IMAGES and remove all files and folders except the images folder
  4. Move the original folder to the root of the volume DATA
  5. Symlink from the original location on DATA to the new images folder on IMAGES
  6. Rsync all other data to a new, smaller ext4 volume: WEBSITE
  7. Replace the DATA volume with the WEBSITE volume, linking to the IMAGES volume

Step 3 didn't work. I got the following error:

sudo mount /dev/xvdf /images
mount: mount /dev/xvdf on /images failed: Structure needs cleaning

Searching for this error, I found advice to run xfs_check, but the filesystem is ext3, so I tried e2fsck. This produced an endless stream of errors and fixes that didn't seem to work.

sudo xfs_check /dev/xvdf
sudo e2fsck -f /dev/xvdf

I created a new volume, IMAGES, and used rsync to copy everything over, as cp resulted in a crash. I immediately created a snapshot of the new volume and restored it to check that it worked, which it did.

Then I proceeded with splitting the volume and replaced the old volume with the two new ones. Everything works, and the problems are solved.

Amazon Support

Still, I want to know what happened here and how to prevent it in the future, so I contacted Amazon Support. They told me that the snapshots were probably corrupt because they were taken while the volume was in use. We do that all the time and have done many restores from such snapshots (though not of this volume) without any problem. This volume was attached, but nothing was being written to it when the snapshot was taken.

I decided to take the advice: detach the volume, make a snapshot, and see what happened. After detaching, the original DATA volume could no longer be attached. As I had already replaced this volume, this has no consequences, so it's not a big problem, but clearly this doesn't work as adv(ert)ised.

The snapshot can be attached and mounted, and I can open folders etc. When I run e2fsck, I get the errors again. In hindsight, I forgot to run e2fsck on the original DATA volume, which is a pity. I suspect it would have reported the errors as well.

Amazon Support was below average this time, and that's a pity.


  1. How can I detect problems like these without having to test each volume/snapshot manually from time to time?
  2. Can I temporarily set a volume to read-only? How do I do that?
  3. I read about the badblocks command for problems like these ("Structure needs cleaning"). As I restore a snapshot to a new (virtual) volume, checking that volume seems useless because it's in a different physical location. Is badblocks useful in a case like this?
  4. Fsck seems to change the disk content. What is a safe method to test a problematic disk like this one?

Answer


The snapshot is not corrupted. The filesystem that the snapshot contains is corrupted. There's a difference.

The filesystem in a snapshot can be corrupted if you take the snapshot while the filesystem is in the middle of writing data. This can happen when only some blocks of an all-or-nothing group had been written at the moment the snapshot was initiated.
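A toy model may make the all-or-nothing point concrete. This is not how ext3 lays out data: it is a minimal bash sketch, assuming a "volume" is just a directory whose two files, blockA and blockB, must always hold the same value, and a "snapshot" is a plain copy of that directory taken between the two writes of a single update. The function names and the consistency rule are invented for illustration.

```bash
#!/usr/bin/env bash
# Toy model only. Real filesystems use inodes, bitmaps and journals, but the
# failure mode is the same: one logical update spans several block writes.

# Apply one logical update ($2) to a toy volume directory ($1) as two
# separate block writes. If a third argument is given, copy the volume to
# that path between the two writes, i.e. a snapshot initiated mid-write.
update_volume() {
  local vol=$1 value=$2 snap=${3:-}
  printf '%s' "$value" > "$vol/blockA"
  if [[ -n $snap ]]; then
    cp -r "$vol" "$snap"   # the copy sees the new blockA but the old blockB
  fi
  printf '%s' "$value" > "$vol/blockB"
}

# Toy consistency rule: a volume is clean only if both blocks agree.
check_volume() {
  local vol=$1
  if [[ "$(cat "$vol/blockA")" == "$(cat "$vol/blockB")" ]]; then
    echo clean
  else
    echo corrupt
  fi
}
```

For example, with vol=$(mktemp -d), running update_volume "$vol" v1 and then update_volume "$vol" v2 "$vol.snap" leaves the live volume clean (both blocks hold v2), while the copy is corrupt (blockA holds v2, blockB still holds v1). The live volume becomes consistent again as soon as the second write lands; the copy never does.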

If your old snapshots were taken while the volume was in use and restores from them were fine, that was simply luck: the filesystem was not being written to at the moment those snapshots were initiated. Your luck has now run out, and you have run into the probable consequences of that scenario.

1. Preventing the Problem

The simplest way to deal with this issue is to prevent it from happening in the first place. To avoid such issues, AWS recommends:

  • pausing the filesystem (e.g. fsfreeze),
  • unmounting the filesystem (e.g. umount), or
  • stopping the EC2 instance (e.g. aws ec2 stop-instances).

See: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-creating-snapshot.html
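The first option can be scripted. Below is a minimal bash sketch, assuming root privileges, util-linux's fsfreeze, and a configured AWS CLI; the function name snapshot_frozen is hypothetical, and the mount point and volume ID are placeholders you supply. It freezes the filesystem, starts the snapshot, and thaws the filesystem again even if the snapshot call fails.

```bash
#!/usr/bin/env bash
# Sketch: start an EBS snapshot while the filesystem is frozen, so no write
# can be half-finished at the moment the snapshot is initiated.
# Assumes root (for fsfreeze) and a configured AWS CLI.

snapshot_frozen() {
  local mountpoint=$1 volume_id=$2 snapshot_id rc

  # Flush dirty data and block new writes to the filesystem.
  fsfreeze --freeze "$mountpoint" || return 1

  # Start the snapshot; --query/--output print just the new snapshot ID.
  snapshot_id=$(aws ec2 create-snapshot \
      --volume-id "$volume_id" \
      --description "frozen snapshot of $mountpoint" \
      --query SnapshotId --output text)
  rc=$?

  # Always thaw, even if the snapshot call failed: a frozen
  # filesystem blocks every process that tries to write to it.
  fsfreeze --unfreeze "$mountpoint" || return 1

  (( rc == 0 )) || return "$rc"
  echo "$snapshot_id"
}
```

Usage would look like snapshot_frozen /data vol-0123456789abcdef0 (placeholder mount point and ID). Thawing right after create-snapshot returns relies on the behaviour described on the AWS page above: once the snapshot has been started you can resume using the volume while its status is still pending. Two cautions: the shell and the AWS CLI must not need to write to the frozen filesystem, so this is meant for a data volume such as DATA rather than the root volume; and if the script is killed between freeze and thaw, run fsfreeze --unfreeze by hand.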

2. Fixing the Problem

Since you've found yourself with a corrupted filesystem, your best course of action is to fix the filesystem before doing anything else.

  • Use the filesystem's repair tool, e.g. e2fsck for ext2/3/4 or xfs_repair for XFS (xfs_check only reports problems, it does not fix them), to repair the corrupted filesystem.
  • Create a new EBS volume and try to copy the files to it.
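Before repairing anything, a read-only check tells you how bad the damage is without changing the disk, which also addresses the question's item 4. Here is a minimal bash sketch, assuming an ext2/3/4 filesystem on a device that is not mounted: e2fsck -f forces a full check, and -n opens the device read-only and answers "no" to every repair prompt. The helper names check_readonly and classify_fsck_status are hypothetical; the exit-code bits (0 no errors, 1 and 2 errors corrected, 4 errors left uncorrected, 8 operational error, 16 usage error, 32 cancelled, 128 shared-library error) are the ones documented in the e2fsck man page.

```bash
#!/usr/bin/env bash
# Map e2fsck's bit-mask exit status to a short verdict.
classify_fsck_status() {
  local status=$1
  if (( status == 0 )); then
    echo clean            # no errors found
  elif (( status & (8 | 16 | 32 | 128) )); then
    echo check-failed     # operational/usage error, cancelled, or library error
  elif (( status & 4 )); then
    echo corrupt          # errors found but not fixed (expected for damage under -n)
  else
    echo repaired         # 1 or 2: errors were corrected (not expected with -n)
  fi
}

# Check an ext2/3/4 filesystem without writing to it.
# The device should not be mounted while this runs.
check_readonly() {
  local device=$1 status
  e2fsck -f -n "$device" > /dev/null 2>&1
  status=$?
  classify_fsck_status "$status"
}
```

For XFS, the comparable read-only scan is xfs_repair -n (no-modify mode). Once the check confirms damage, running the actual repair (e2fsck -f -y) against a volume restored from a snapshot rather than against the original means a failed repair costs you nothing.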

Once your filesystem is fixed, then put measures in place to prevent the problem (see section 1).

Additional Notes

  • An active EBS volume's filesystem cannot become corrupted by taking a snapshot. Only when you restore a volume from a snapshot that was initiated mid-write will you get a corrupted filesystem.
  • The filesystem could have been corrupted when you created your snapshot in your step 1. Or it could have been corrupted already if your volume was restored from an older snapshot.
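The notes above also suggest how to detect the problem early (the question's item 1): since a damaged filesystem only shows up once a snapshot is restored, restore each new snapshot to a scratch volume and check it. Here is a minimal bash sketch, assuming it runs on an EC2 instance in the given availability zone with AWS CLI permissions to create, attach, detach and delete volumes. verify_snapshot is a hypothetical name, and all IDs and device names are placeholders; the device name Linux exposes (xvdg, nvme1n1, ...) can differ from the one requested in attach-volume, which is why both are parameters.

```bash
#!/usr/bin/env bash
# Sketch: restore a snapshot to a scratch volume, attach it here, run a
# read-only e2fsck, then delete the scratch volume. Returns e2fsck's status.
verify_snapshot() {
  local snapshot_id=$1 az=$2 instance_id=$3 aws_device=$4 linux_device=$5
  local volume_id status

  # Restore the snapshot into a new, temporary volume.
  volume_id=$(aws ec2 create-volume --snapshot-id "$snapshot_id" \
      --availability-zone "$az" --query VolumeId --output text) || return 1
  aws ec2 wait volume-available --volume-ids "$volume_id" || return 1

  # Attach it to this instance.
  aws ec2 attach-volume --volume-id "$volume_id" \
      --instance-id "$instance_id" --device "$aws_device" > /dev/null || return 1
  aws ec2 wait volume-in-use --volume-ids "$volume_id" || return 1

  # Give the kernel up to about 60 s to expose the block device.
  for _ in $(seq 1 30); do
    [[ -b $linux_device ]] && break
    sleep 2
  done

  # Read-only full check: -f forces it, -n answers "no" to every repair.
  e2fsck -f -n "$linux_device" > /dev/null 2>&1
  status=$?

  # Remove the scratch volume whatever the result was.
  aws ec2 detach-volume --volume-id "$volume_id" > /dev/null
  aws ec2 wait volume-available --volume-ids "$volume_id"
  aws ec2 delete-volume --volume-id "$volume_id"

  return "$status"
}
```

A return value of 0 means the restored filesystem checked clean and 4 means errors were found, so this could run from cron after each nightly snapshot. For brevity it does not clean up when one of the early AWS calls fails; a real version should also delete any left-over scratch volumes.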
Matt Houser