I'm looking to back up various directories and files from a Linux server to AWS Glacier. I'm trying to work out the details of how to manage this.
Incremental Backups
I want to upload files incrementally: if a file hasn't changed and already exists in Glacier, I don't want to upload it again. I think I have this part figured out. Because you can't get an instant listing of the archives in a Glacier vault, I'll keep a local database of uploaded files so I can tell what exists in the vault and what doesn't. That lets me do incremental backups (uploading only missing or changed files).
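To make the idea concrete, here is a minimal sketch of the local inventory, written in Python only to keep it short (the same logic should port to PHP). The `uploads` table, its columns, and the helper names are all hypothetical, and the actual Glacier upload call is deliberately left out: `archive_id` stands in for whatever ID the upload returns.

```python
import hashlib
import sqlite3


def open_inventory(db_path=":memory:"):
    """Open (or create) the local inventory; the schema is an assumption."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS uploads ("
        " path TEXT PRIMARY KEY,"
        " sha256 TEXT NOT NULL,"
        " archive_id TEXT NOT NULL,"
        " uploaded_at TEXT NOT NULL)"
    )
    return conn


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_upload(conn, path):
    """True if the file was never recorded or its content hash has changed."""
    row = conn.execute(
        "SELECT sha256 FROM uploads WHERE path = ?", (path,)
    ).fetchone()
    return row is None or row[0] != file_sha256(path)


def record_upload(conn, path, archive_id, uploaded_at):
    """Remember what was uploaded; archive_id comes from the upload response."""
    conn.execute(
        "INSERT OR REPLACE INTO uploads (path, sha256, archive_id, uploaded_at)"
        " VALUES (?, ?, ?, ?)",
        (path, file_sha256(path), archive_id, uploaded_at),
    )
    conn.commit()
```

Note that replacing the row forgets the previous archive ID for that path, so a real version would probably need to keep old IDs around somewhere in order to delete the superseded archives later.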
Can't Overwrite Files?
According to (http://aws.amazon.com/glacier/faqs/):
Archives stored in Amazon Glacier are immutable, i.e. archives can be uploaded and deleted but cannot be edited or overwritten.
So what happens if I upload a file as an archive, the file later changes locally, and I then run another backup? How does Glacier deal with this, since it can't overwrite the archive with a new version?
Deleting Old Data
AWS charges $0.03 per GB to delete archives that are less than 3 months old. Since I'm backing up a local server, I want to delete archives whose files no longer exist locally. What is the best way to organize this? Use the locally stored archive inventory to determine which data no longer exists and, if the archive is more than 3 months old, delete it from Glacier? That seems straightforward, but is there a better approach?
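The selection step I have in mind would look roughly like this. It is a Python sketch under a few assumptions: the inventory is a list of hypothetical `(path, archive_id, uploaded_at)` tuples, "3 months" is approximated as 90 days, and the actual delete request to Glacier is not shown.

```python
import os
from datetime import datetime, timedelta

# Assumption: "3 months" is approximated as 90 days.
MIN_AGE = timedelta(days=90)


def deletable_archives(inventory, now, exists=os.path.exists):
    """Return archive IDs whose source file is gone and which are old enough.

    inventory: iterable of (path, archive_id, uploaded_at) tuples, where
    uploaded_at is a datetime. `exists` is injectable so the check can be
    tested without touching the real filesystem.
    """
    return [
        archive_id
        for path, archive_id, uploaded_at in inventory
        if not exists(path) and now - uploaded_at >= MIN_AGE
    ]
```

Archives whose files are gone but which are still younger than the cutoff would simply be picked up by a later run.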
Individual files vs. TAR/ZIP files
You can upload individual files as archives, or be more efficient by grouping files into TAR or ZIP files before uploading. TAR/ZIP files are appealing because they are simpler to manage and incur smaller storage fees, but I'm wondering how I would handle incremental uploads. If I upload a 20 MB ZIP file containing 10,000 files and one of those files changes locally, do I need to upload another 20 MB ZIP file? Then I'd have to eat the cost of storing two copies of almost everything in it. Also, how would I handle files inside a ZIP that no longer exist locally? Since I don't want to delete the whole ZIP file, I'd be paying to store files that don't exist anymore.
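To put a number on that waste, I could at least measure how much of an uploaded bundle is stale. This Python sketch assumes I keep a hypothetical per-bundle manifest mapping each path to its `(size_in_bytes, sha256)` at upload time, plus a dict of current hashes for the files that still exist locally. It only measures the problem; it doesn't decide when re-uploading is worth it.

```python
def stale_fraction(manifest, current_hashes):
    """Fraction of a bundle's bytes that belong to changed or deleted files.

    manifest: {path: (size_in_bytes, sha256_at_upload)} for one bundle.
    current_hashes: {path: sha256} for files that still exist locally.
    """
    total = sum(size for size, _ in manifest.values())
    if total == 0:
        return 0.0
    stale = sum(
        size
        for path, (size, digest) in manifest.items()
        # A missing path yields None, which never equals a recorded digest.
        if current_hashes.get(path) != digest
    )
    return stale / total
```

With a manifest like this, one changed file in a 10,000-file ZIP shows up as a tiny stale fraction, which is exactly the case where re-uploading the whole bundle feels wasteful.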
Maybe I'm overthinking all of this. What are the most straightforward ways to approach these questions?
I don't know if it matters, but I'm using the PHP SDK for this backup script. Also, I don't want to upload to an S3 bucket first and then back up the bucket to Glacier, since I would then have to pay for S3 storage and transfer fees as well.