
I have a client uploading multiple TB of data to Glacier. They used a Snowball to load 65 TB of data, and they are going to upload the remaining ~25 TB over the network. Currently they are uploading directly to Glacier with FastGlacier, but that tool runs on their sole Windows machine (they are otherwise an all-Mac shop) and constantly crashes from queuing so much data. Additionally, the program leaves a lot to be desired for searching and browsing the store: to view the files in a Glacier vault you need to download the inventory, which has a 4-6 hour lead time.

For consistency, we'd like to upload to the S3 bucket we used for the Snowball, with the same 0-day lifecycle rule that transitions objects to Glacier, but we don't want to incur massive S3 storage costs along the way. I know S3 cost is based on average usage over the month, but I'm not sure how to estimate it.

    Be sure you thoroughly understand that "Glacier" != "the S3 Glacier Storage Class." The two are very different. I would not use Glacier itself without a rock solid storage management and indexing system in front of it. Glacier is raw, low-level, expert mode, and almost completely opaque. The S3 Glacier storage class is entirely different and much more usable, unless you need something accessible only by using Glacier itself, like vault locks. Importantly, as documented, you cannot access data loaded into one service via the other service. – Michael - sqlbot Apr 01 '17 at 02:00



The AWS Storage Services Overview whitepaper says "You can specify an absolute or relative time period (including 0 days) after which the specified Amazon S3 objects should be transitioned to Amazon Glacier".

According to the S3 lifecycle rules, you can't transition S3 data to the Infrequent Access (STANDARD_IA) storage class until 30 days after upload. However, you can transition to Glacier immediately: "0 days" appears to be a valid setting.
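For reference, the same rule can be created programmatically. This is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; `glacier_rule` and `apply_rule` are hypothetical helper names, and the bucket name is a placeholder. `Days: 0` is what the console accepted in the test described here; note the comments below about the API describing `Days` as a positive integer.

```python
# Sketch: build (and optionally apply) an S3 lifecycle rule that moves
# objects to the GLACIER storage class a given number of days after creation.

# Minimum transition age per target class, per the 30-day rule quoted above.
# GLACIER is assumed to have no such minimum (0 days was accepted in testing).
MIN_DAYS = {"STANDARD_IA": 30, "ONEZONE_IA": 30}


def glacier_rule(rule_id: str, days: int, prefix: str = "",
                 storage_class: str = "GLACIER") -> dict:
    """Return a LifecycleConfiguration dict in the shape boto3 expects."""
    minimum = MIN_DAYS.get(storage_class, 0)
    if days < minimum:
        raise ValueError(f"{storage_class} transitions require at least {minimum} days")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # empty prefix = whole bucket
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": storage_class}],
            }
        ]
    }


def apply_rule(bucket: str, config: dict) -> None:
    """Replace the bucket's lifecycle configuration (needs boto3 + credentials)."""
    import boto3  # imported lazily so glacier_rule works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=config)


# Usage (placeholder bucket name):
# apply_rule("my-archive-bucket", glacier_rule("to-glacier", 0))
```

Be aware that `put_bucket_lifecycle_configuration` replaces any existing rules on the bucket rather than adding to them.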

I tried this myself. I created a new bucket with a lifecycle rule to transition objects to Glacier after 0 days, and uploaded a small file using the S3 Standard storage class. The file changed to the Glacier storage class between 5 and 8 hours after it was uploaded. I can't be more precise because I didn't see any logs of the transition, and I only checked occasionally.
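Instead of checking by hand, the storage class can be polled. This is a minimal sketch, assuming boto3 and configured credentials; `storage_class_of` and `wait_for_glacier` are hypothetical helpers, and the polling interval is an arbitrary choice. It relies on the S3 HeadObject response omitting `StorageClass` for objects in the STANDARD class.

```python
import time


def storage_class_of(head_response: dict) -> str:
    """HeadObject omits StorageClass for STANDARD objects, so default to that."""
    return head_response.get("StorageClass", "STANDARD")


def wait_for_glacier(bucket: str, key: str,
                     poll_seconds: int = 1800, max_polls: int = 48) -> bool:
    """Poll an object until it reports GLACIER; return False if it never does."""
    import boto3  # imported lazily so storage_class_of works without boto3

    s3 = boto3.client("s3")
    for _ in range(max_polls):
        head = s3.head_object(Bucket=bucket, Key=key)
        if storage_class_of(head) == "GLACIER":
            return True
        time.sleep(poll_seconds)  # lifecycle transitions run asynchronously
    return False
```

Logging a timestamp on each poll would narrow down when the transition actually happened.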

You could consider using AWS Storage Gateway, but that relies on running a virtual machine on premises. It stores data in S3, so you'd still have to transition it to Glacier using lifecycle rules. Given the time setting up a gateway would take, an upload client may be easier.

There are Glacier clients that run on the Mac, such as Freeze, Glacier Uploader, and others.

Tim
  • I don't see where it says you can't transition to Glacier before 30 days. The documentation says: "Objects must be stored at least 30 days in the current storage class before you can transition them to STANDARD_IA. For example, you cannot create a lifecycle rule to transition objects to the STANDARD_IA storage class one day after creation." So it looks like that rule applies to transferring to IA rather than Glacier, unless I missed something. – Michael Lubert Apr 03 '17 at 16:44
  • You're right, and I just tried it to be sure. The user interface let me transition to Glacier after 0 days. I hope that 0 doesn't mean "never". The API suggests that 0 is a valid transition time http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTlifecycle.html – Tim Apr 03 '17 at 19:05
  • That's what I did as well (though I set it to 1 day, as I was also paranoid that 0 meant never). I just checked their invoice, and with a 1-day transition for ~68 TB of data it looks like it was counted as 2.3 TB-month, so it looks like it didn't prorate fully. I think this is something I'm going to pitch: have them move to S3->Glacier. If you find out that 0 = immediate, that would be even better! – Michael Lubert Apr 04 '17 at 19:32
  • I tried it out, see my update above (paragraph two). – Tim Apr 05 '17 at 04:26
  • It looks like the transition time needs to be a "positive integer", which would imply that 1 day is the minimum time you can set. https://docs.aws.amazon.com/AmazonS3/latest/API/API_Transition.html – dmohr Feb 05 '20 at 01:13
  • I read in the lifecycle documentation that the work is done at midnight UTC, in case that info is useful to anyone. The document dmohr linked also says: "The time is always midnight UTC." – AndyD273 Dec 13 '20 at 06:59
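The midnight-UTC detail and the TB-month billing can be combined into a rough estimate of how long data sits in S3 Standard. This is a minimal sketch, assuming the rounding rule from the S3 lifecycle documentation (creation time plus the rule's days, rounded up to the next midnight UTC) and a 30-day month. `earliest_transition` and `tb_months` are hypothetical helpers; the first gives only the earliest time an object becomes eligible, not when the asynchronous transition actually runs, and the per-GB price is left out because it varies by region.

```python
from datetime import datetime, timedelta, timezone


def earliest_transition(created: datetime, days: int) -> datetime:
    """Creation time + rule days, rounded up to the next midnight UTC."""
    if created.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    t = (created + timedelta(days=days)).astimezone(timezone.utc)
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    # Already exactly at midnight: keep it; otherwise move to the next midnight.
    return midnight if midnight == t else midnight + timedelta(days=1)


def tb_months(tb: float, hours_in_standard: float, days_in_month: int = 30) -> float:
    """Prorated storage: TB held for part of a month, expressed in TB-months."""
    return tb * hours_in_standard / (days_in_month * 24)


created = datetime(2014, 1, 15, 10, 30, tzinfo=timezone.utc)
eligible = earliest_transition(created, 0)
hours = (eligible - created).total_seconds() / 3600  # lower bound on time in Standard
print(eligible.isoformat(), hours)
```

For example, 68 TB held in Standard for one day of a 30-day month works out to `tb_months(68, 24)`, about 2.27 TB-month, in the same range as the 2.3 TB-month on the invoice mentioned above.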