
I just had to recover a vserver after a failed update. Because I didn't know exactly which files were affected, I restored most directories completely, except /home, which was not affected.

During the recovery, the server was in a special recovery mode with all services turned off.

After the recovery, everything initially looked good. However, to my surprise, an SVN repository and a Git repository were corrupted. We usually have only about 1-2 commits a day, so it is quite unlikely that two commits were taking place at exactly the moment of the backup. In this case it was not a big deal to reconstruct the repositories, but how could that happen? Is it because of cached data that had not been written to disk, or something like that?

Would snapshot backups eliminate such a risk?

didi_X8
  • What triggered the failed update? Was there an issue with the filesystem? Was the system in a state where it didn't write the files out to the filesystem for svn and git, so it was inconsistent? – Bart Silverstrim Feb 13 '12 at 19:55
  • Is the SVN repo in Berkeley or FSFS format? – sjbotha Feb 13 '12 at 20:33

1 Answer


Offhand, I'd guess there was an issue with the filesystem and the corruption was a result of that.

Or the system had files in memory, as you suspected, which weren't written to disk yet. Then when it went down, the files were in an inconsistent state.
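To illustrate what "not written to disk yet" means: a write normally lands in a buffer first and only reaches the device later. Below is a minimal Python sketch, assuming a POSIX-like system, of how a program can ask for its data to be pushed out explicitly. The helper name `durable_write` is made up for this example, and it says nothing about what SVN or Git actually do internally.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes to path and ask the OS to push them to stable storage.

    Hypothetical helper for illustration only.
    """
    with open(path, "wb") as f:
        f.write(data)
        # Drain Python's userspace buffer into the OS page cache.
        f.flush()
        # Ask the kernel to write the cached pages for this file to the device.
        os.fsync(f.fileno())


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
    durable_write(demo_path, b"commit data")
```

Without the `flush()` and `os.fsync()` calls, the data may sit in memory for a while, and a crash or an ill-timed copy in that window can leave a partial file behind.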

It would be hard to tell without knowing what the damage was to the filesystem.

As for snapshots, if you mean a snapshot of a virtual private server on a hosted service, not necessarily. A snapshot of a filesystem won't help you with files that are "in flight"; that is, being manipulated in memory. If you were to take snapshots of an EC2 instance running a database, the snapshot would get the state of the filesystem, but not the cached data the database holds in memory, so a restore could end up with an inconsistent machine. This is why Amazon recommends shutting down running instances or unmounting EBS volumes before taking snapshots even though it's possible to do a live snap.
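The general idea of quiescing before a snapshot can be sketched as: stop whatever writes, flush, snapshot, resume. This is a rough Python outline under stated assumptions, not a recipe for any particular provider. `stop_writers` and `start_writers` are hypothetical callables you would supply yourself (for example, to stop and start your database or repository services), and the snapshot step is left to whatever tool you use.

```python
import os
from contextlib import contextmanager


@contextmanager
def quiesced(stop_writers, start_writers):
    """Run the body while writers are stopped and OS buffers are flushed.

    stop_writers / start_writers are hypothetical callables supplied by
    the caller; this helper does not know how to stop any real service.
    """
    stop_writers()
    try:
        # Ask the kernel to flush dirty buffers to disk (Unix, Python 3.3+).
        if hasattr(os, "sync"):
            os.sync()
        yield
    finally:
        # Always resume the writers, even if the snapshot step failed.
        start_writers()


if __name__ == "__main__":
    events = []
    with quiesced(lambda: events.append("stop"),
                  lambda: events.append("start")):
        events.append("snapshot")  # placeholder for the real snapshot call
    print(events)
```

The `try`/`finally` is there so that the services come back up even when the snapshot step raises an error.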

Bart Silverstrim
  • Just to add to the snapshot explanation: If you enable journaling (aka transaction log) on your database (as any production database should have) then after restoring the snapshot you can first repair the database and then 'play forward' the journal entries to get the latest updates. – sjbotha Feb 13 '12 at 20:32
  • Where would the journal get updates that were in memory at the time of the snapshot? I thought the journal just ensured that the database would be in a consistent state, not necessarily that it would save your data. – Bart Silverstrim Feb 13 '12 at 20:40
  • Right, anything that wasn't written to disk would be lost. Not saying it changes that. However (depending on the database design) even things that were written already could be lost if a snapshot is taken and journaling is not enabled. – sjbotha Feb 13 '12 at 20:47