Is Amazon Glacier appropriate for archiving Digital Media content?

1

Background: A content production team shoots and records content in digital media formats. These can be a mix of raw footage, converted videos and images.

This content is stored in a shared folder (Linux Samba). It is 21 TB of storage, which is almost fully used. I would prefer having the content team re-organize and clear the data. Overlooking the need for discipline, I am asked to simply archive. It makes sense--as years pile on, the disk space will be thin, no matter how much discipline is maintained.

We had carried out archival using Tape drives under the older leadership. New leadership has discontinued that process. They have recommended archiving older content to Amazon Glacier.

Now, the content size could be around 2 TB as an archive. There may be a need to pull out old content. How frequently? That we do not know as of now.

No matter how much bandwidth Amazon can offer, the link I have can do a maximum of 40 Mbit/s. Moreover, I am asked to throttle the transfer so that others on the same Internet connection are not affected by it.
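For a rough sense of scale, the upload time can be estimated from the archive size and the link speed. This is only a minimal sketch, assuming decimal units (1 TB = 10^12 bytes) and no protocol overhead or retries; `upload_days` and its `cap_fraction` parameter (the share of the link the transfer is allowed to use) are hypothetical names chosen for illustration.

```python
def upload_days(size_tb: float, link_mbit_s: float, cap_fraction: float = 1.0) -> float:
    """Estimate upload time in days, ignoring protocol overhead and retries."""
    size_bits = size_tb * 1e12 * 8                   # decimal terabytes -> bits
    rate_bits_s = link_mbit_s * 1e6 * cap_fraction   # usable share of the link
    return size_bits / rate_bits_s / 86400           # seconds -> days


if __name__ == "__main__":
    # 2 TB over a 40 Mbit/s link, unthrottled and capped at half the link
    for cap in (1.0, 0.5):
        print(f"cap {cap:.0%}: {upload_days(2, 40, cap):.1f} days")
```

Under these assumptions, 2 TB takes roughly 4.6 days at the full 40 Mbit/s and about 9.3 days if the transfer is capped at half the link.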

What considerations should I take into account to decide whether Glacier fits the bill for such a task?

Also, is there any Bash command-line tool that can push 2 TB+ archives to a Glacier vault?

Anup Nair

Posted 2015-11-03T13:48:55.153

Reputation: 144

Question was closed 2015-11-04T20:51:07.827

"Therefore, is Glacier an ideal option?" - This is a question seeking our opinion, which isn't on topic here. You do realize you pay to transfer the data both ways, right? There are cheaper solutions than Glacier. – Ramhound – 2015-11-03T14:11:10.890

1

Also, the negative side to Glacier and other similar “cloud” solutions? You really have no idea how slow a “fast” Internet connection is until you need to transfer tons of large media files to a remote server like this. – JakeGould – 2015-11-04T02:33:06.310

1

I agree with @Ramhound's assessment in light of the use of the word "ideal" in the question. My answer was premised on a looser interpretation of the question as "is Glacier appropriate," which I do see as not quite such a subjective, opinion-based question, and, in context, appears to be the real gist of what's being asked. And, you can load up to 50TB using the "bandwidth" of a FedEx truck using Amazon Snowball.

– Michael - sqlbot – 2015-11-05T12:43:36.940

To expand on my initial comment: there are companies that specialize in allowing you to upload and download your archives. You pay a little more, but Glacier isn't an ideal solution for data you are going to access; they also don't offer any way to send you 50 TB of data like these other services do. – Ramhound – 2015-11-05T12:52:10.087

Thank you @Michael-sqlbot. Your answer has given me some direction to study feasibility. It will help in presenting a picture of cost and time. – Anup Nair – 2015-11-18T11:30:45.927

@JakeGould and others who find it opinion driven. I have edited the question. Please review and restore it, if suitable. – Anup Nair – 2015-11-18T11:33:14.857

Answers

5

Glacier is designed and priced for data you don't expect you are likely to need.

Glacier is designed with the expectation that retrievals are infrequent and unusual, and data will be stored for extended periods of time.

https://aws.amazon.com/glacier/pricing/

I have several dozen terabytes stored there at the moment, and I highly recommend it -- where appropriate -- so my observations should not be taken as negative, only as emphasizing the point that you need to be sure you understand the product and its intended application.

The native Glacier interface is very low level. It behaves quite a bit like a backup tape or a big tarball. You put an "archive" into a "vault" and it's sort of a black box. You have to maintain records of what you put in each archive, because Glacier can't tell you, any more than physically looking at a backup tape can tell you.

The alternate -- and, I would assert, far better -- way of using Glacier is through S3. Upload your files into an S3 bucket, and set the bucket's lifecycle policy to archive the files to Glacier after a few days. With this model, S3 hides the complexity of the raw Glacier API, and the individual files and their metadata remain visible through the S3 console and API. The cost is the same.
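As an illustration of the lifecycle approach, here is a minimal sketch using the boto3 SDK for Python. The bucket name, the rule ID, the 3-day delay, and the helper names `glacier_lifecycle_rule` and `apply_rule` are all assumptions made up for this example; it also assumes AWS credentials are already configured and that the rule should apply to every object in the bucket.

```python
def glacier_lifecycle_rule(days: int = 3, prefix: str = "") -> dict:
    """Build a lifecycle configuration that moves objects to Glacier after `days` days."""
    return {
        "Rules": [
            {
                "ID": f"archive-to-glacier-after-{days}-days",  # hypothetical rule name
                "Filter": {"Prefix": prefix},                   # empty prefix = whole bucket
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_rule(bucket: str, days: int = 3) -> None:
    """Attach the rule to a bucket. Needs boto3 and configured AWS credentials."""
    import boto3  # third-party AWS SDK; imported here so the rule builder works without it

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=glacier_lifecycle_rule(days),
    )


if __name__ == "__main__":
    # Print the rule only; calling apply_rule("my-media-archive") would change a real bucket.
    print(glacier_lifecycle_rule(3))
```

Note that `put_bucket_lifecycle_configuration` replaces any lifecycle configuration already on the bucket, so existing rules would need to be included in the same call.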

Understand, though, that with Glacier (whether through S3 or not) you pay a charge for recovering more than a small amount of data at a time.

Crunch the numbers and you will find that the free allowance for restores is small, and going beyond it is potentially expensive, until you have a lot of data stored.

Say I have 180 TB/180000 GB stored. I can only restore 50 GB in any 4 hour window if I don't want to pay additional charges for data retrieval.

180000 × 0.05 ÷ 30 ÷ 6 = 50

180000 GB, 5% monthly allowance, 30 days/mo, 6 periods of 4 hours in each day. This works great for me, since my files are typically < 20 GB and it is very rare that I need them. When I do, it's usually for research that isn't pressing, so I can spread out the recovery. With a smaller total storage, say 18 TB, my no-charge restoration allowance would be a small 5 GB every 4 hours. So, as I say, consider the restore pricing model carefully.
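The same arithmetic can be wrapped in a small function to try other storage sizes. This is just a sketch of the calculation above, with the 5% monthly allowance, 30-day month, and six 4-hour windows per day as default parameters; the function name is made up for illustration.

```python
def free_restore_gb_per_window(
    stored_gb: float,
    monthly_fraction: float = 0.05,  # 5% of stored data per month
    days_per_month: int = 30,
    windows_per_day: int = 6,        # six 4-hour windows in a day
) -> float:
    """Free restore allowance, in GB, for each 4-hour window."""
    return stored_gb * monthly_fraction / days_per_month / windows_per_day


if __name__ == "__main__":
    for stored in (180000, 18000, 2000):
        allowance = free_restore_gb_per_window(stored)
        print(f"{stored} GB stored: {allowance:.2f} GB per 4-hour window")
```

With 180000 GB stored this gives the 50 GB figure, and with 18000 GB it gives 5 GB; a 2 TB archive would get well under 1 GB per window.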

Possibly a better fit is the relatively new "Infrequent Access" storage class offered by S3. $0.0125/GB/mo is still pretty reasonable and although there is a $0.01/GB charge for downloads, there's no sharp increase in cost if you need to restore a lot of data, and there's no 4 hour wait time, as there is for Glacier restores.

https://aws.amazon.com/blogs/aws/aws-storage-update-new-lower-cost-s3-storage-option-glacier-price-reduction/
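To compare the two storage classes for a given archive size, a rough sketch follows. It uses only the per-GB prices quoted in this thread ($0.007/GB/month for Glacier storage, $0.0125/GB/month and $0.01/GB retrieval for Infrequent Access) and deliberately leaves out Glacier's retrieval charges beyond the free allowance, request fees, and bandwidth charges, so treat the output as a lower bound rather than a quote. The constant and function names are made up for this example.

```python
GLACIER_STORAGE_PER_GB_MONTH = 0.007   # USD, as quoted in this thread
IA_STORAGE_PER_GB_MONTH = 0.0125       # USD, S3 Infrequent Access
IA_RETRIEVAL_PER_GB = 0.01             # USD, charged on download from IA


def monthly_storage_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Storage-only cost per month, in USD."""
    return size_gb * price_per_gb_month


def ia_retrieval_cost(size_gb: float) -> float:
    """Cost in USD of retrieving `size_gb` from Infrequent Access."""
    return size_gb * IA_RETRIEVAL_PER_GB


if __name__ == "__main__":
    size_gb = 2000  # roughly the 2 TB archive from the question
    glacier = monthly_storage_cost(size_gb, GLACIER_STORAGE_PER_GB_MONTH)
    ia = monthly_storage_cost(size_gb, IA_STORAGE_PER_GB_MONTH)
    print(f"Glacier storage: ${glacier:.2f}/month")
    print(f"IA storage:      ${ia:.2f}/month")
    print(f"IA full restore: ${ia_retrieval_cost(size_gb):.2f}")
```

For a 2000 GB archive this works out to about $14 per month in Glacier versus $25 per month in Infrequent Access, plus about $20 to pull the whole archive back out of Infrequent Access.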

Michael - sqlbot

Posted 2015-11-03T13:48:55.153

Reputation: 1 103

2

This is a good answer that avoids the subjective nature of the question itself. It brings up what Glacier is used for, provides a fair assessment of the service, and offers a way around the restoration dilemma. – Ramhound – 2015-11-05T12:49:27.377

0

I'd start with this first, to get an estimate of what your pricing will be. The base rate is 0.007 dollars/GB/month, not including transfer fees.

Then look at how you get your data back from Glacier. Job requests can take several hours and then data is only available for a certain time.

AWS Glacier FAQ

Here is something I found while searching for "glacier data bash."

Example Script for Uploading to Glacier/S3

I use S3 for my clients' (over 100) off-site backups. I had looked into Glacier as it was cheaper, but I couldn't deal with the time for data retrieval. If one of my sites has a problem and I need to grab a file from S3, I need it now, not in 4 hours.

N. Greene

Posted 2015-11-03T13:48:55.153

Reputation: 585

Note that the storage rate per gigabyte per month is actually $0.007 -- which is 0.007 dollars per month, not 0.007 cents (the Glacier price is 0.7 cents... 0.007 cents would actually be $0.00007). – Michael - sqlbot – 2015-11-04T18:47:00.527

My bad..edited my answer. Thanks for noticing that. – N. Greene – 2015-11-04T19:04:08.193