I've been having trouble uploading a large (800MB) file to S3 using the AWS command-line tool. The first attempt completed (after many hours) but the file was not visible, and I was advised (here) that it had been eaten by goblins and I needed to start again.
I did a test with a 16MB file; it did a 3-part upload and completed with no problems. I can see the file there with aws s3 ls s3://mybucket.
So then I tried aws s3 cp bigfile.tgz s3://mybucket. But 26 minutes in, I noticed I had three upload failures, each looking like this:
upload failed: ./bigfile.tgz to s3://mybucket/bigfile.tgz
HTTPSConnectionPool(host='s3-eu-west-1.amazonaws.com', port=443): Max retries exceeded with url: /mybucket/bigfile.tgz?partNumber=8&uploadId=m_jMF.[elided]UPz (Caused by <class 'ConnectionResetError'>: [Errno 104] Connection reset by peer)
Actually the 3rd message says: "Caused by : [Errno 32] Broken pipe)", rather than "Caused by : [Errno 104] Connection reset by peer".
At this point it is still running, and says:
Completed 16 of 120 part(s) with -2 file(s) remaining
This happened before, and I just ignored it, assuming that if it were a fatal error the command would have stopped. Now I'm wondering: is it going to spend another 3 hours chugging away and then give me an invisible file again, because some parts failed to upload?
If this is the case, my question is: how do I upload a large file to S3 over an internet connection that sometimes has these issues? Is there a way to tell the CLI not to give up so easily?
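For what it's worth, the only knobs I've found so far are the CLI's S3 transfer settings. I'm guessing at values here, and I'm not sure the two retry-related keys are even honoured by my CLI version, so treat this as a sketch rather than something I've verified:

# larger parts mean fewer requests; fewer concurrent uploads gives a flaky link less to juggle
aws configure set default.s3.multipart_chunksize 16MB
aws configure set default.s3.max_concurrent_requests 2
# these two may only be recognised by newer CLI/botocore versions
aws configure set default.max_attempts 10
aws configure set default.retry_mode standard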
UPDATE: I tried the free Wi-Fi at a different location, and the upload completed quickly, with none of those failure messages. So there's nothing wrong with the file or my S3 setup. I'm still hoping to find some configuration option that tells the CLI to keep retrying each part forever.
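In the meantime, the crude workaround I'm considering is just re-running the copy in a shell loop until it exits cleanly, though I assume each failed attempt restarts the whole transfer rather than resuming the parts that already went up:

# keep retrying the whole upload until the CLI exits with status 0
until aws s3 cp bigfile.tgz s3://mybucket; do
    echo "upload failed, retrying in 30 seconds..." >&2
    sleep 30
done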