
I have a 9.2GB file that I want to transfer to my AWS t2.small instance for backup purposes. When I start scp, it copies the file at around 3.4MB per second, which gives an expected transfer time of about 45 minutes.

Some time into the transfer, the instance always locks up: I can no longer type anything in terminal windows, websites stall (it's a web server), and I can't connect to it. Rebooting the instance solves the problem.

I investigated EBS limits: I have two 200GB gp2 volumes attached in RAID10. From the documentation I cannot see that I exceed IOPS or throughput for these disks. I also checked network bandwidth, but cannot find any information on t2 instances there. Finally I looked at CPU credits, but presumably running out of those should not completely stall the instance?

This is a one-off transfer, so I'm looking to get an idea of how much I have to slow down the transfer to make it complete safely. At the same time I'd like to understand which limits matter for managing this web server.
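For reference, the only throttle I know of is scp's `-l` flag, which caps bandwidth in Kbit/s, so mechanically slowing down would look like the sketch below (file name and hostname are placeholders). What I don't know is what rate is actually safe.

```
# scp -l takes Kbit/s, so 8192 Kbit/s is roughly 1 MB/s
# "backup.tar" and "my-instance" are placeholders
scp -l 8192 backup.tar ubuntu@my-instance:/backup/
```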

jdog
  • Look into EBS IOPS credits; you're likely running out of them. You get 5.4M credits to start with (at a 256KB block size), which is 30 minutes' worth at 3000 IOPS according to the documentation, but you're not doing anywhere near 3000 IOPS unless you're using a tiny block size. With that much storage provisioned you should still be able to copy quite quickly once they're exhausted, and a reboot wouldn't refresh them. A complete lockup wouldn't happen either. Have you checked disk space? Anything in the system logs? What OS? You could set up provisioned IOPS and see if it helps. – Tim Aug 31 '16 at 09:22
  • Hi, disk space is at 4%, and the operating system is Ubuntu 16.04. I calculated about 36,000 I/O operations for my file (9,200,000KB / 256KB), so I can't see how I'd run out of credits? – jdog Aug 31 '16 at 17:45
  • If SCP uses a small IO size it could reduce performance and run you out of IO credits early; an I/O operation counts as any request up to 256KB. However, that wouldn't freeze the instance, it would just go slowly, so I don't think it's that. What if you try sftp? Does anything else IO-intensive lock up the system? – Tim Aug 31 '16 at 18:49
  • Shouldn't be IO credits, but it's distinctly possible that the CPU credit balance is being depleted. Check that in the instance metrics. SCP uses encryption and sometimes compression, both of which together could chew through the credits on a t2.small in as little as 5 hours (rough estimate). – Michael - sqlbot Aug 31 '16 at 23:38
  • ...that shouldn't completely stall the instance, but depending on what's competing for the CPU, it could become extremely sluggish to the point of being unable to service anything. Have you verified the credit balance? (sketched below, after these comments) – Michael - sqlbot Aug 31 '16 at 23:55
  • Try rsync instead of scp. rsync has a bandwidth-throttling option which you can use, and it will also resume a stalled transfer instead of starting over from scratch (see the sketch after these comments). Also, you can run `iostat -mx 1` on your EC2 instance while the transfer is going to see what's happening with your disks. It might help give you clues about where the bottleneck is happening. – Michael Martinez Sep 01 '16 at 05:56
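Concretely, the two suggestions above might look something like the following; the instance ID, dates, file name, and hostname are all placeholders. The CPU credit balance is exposed as the CPUCreditBalance CloudWatch metric:

```
# Pull the CPUCreditBalance metric for the instance (placeholders throughout)
aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 --metric-name CPUCreditBalance \
    --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
    --start-time 2016-08-31T00:00:00Z --end-time 2016-09-01T00:00:00Z \
    --period 300 --statistics Average
```

And a throttled, resumable rsync along the lines Michael Martinez suggests:

```
# --bwlimit is in KB/s (1000 is roughly 1 MB/s); -P (--partial --progress)
# keeps partially transferred files, so an interrupted copy can pick up
# where it left off instead of starting the 9.2GB again
rsync -avP --bwlimit=1000 backup.tar ubuntu@my-instance:/backup/
```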

2 Answers

4

If you want to find out what the problem is, then you should install some monitoring, or you can make several connections to the system and run utilities like top, vmstat, iostat, free etc. (use watch(1) if needed) to get a view of what is happening to the system's resources. Gather data and then apply the scientific method - it's the only way to be sure.
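A sketch of what that looks like in practice (iostat comes from the sysstat package on Ubuntu):

```
# Run each of these in its own SSH session while the transfer is going:
vmstat 1              # memory, swap in/out (si/so), iowait and CPU steal
iostat -mx 1          # per-device MB/s, requests/s and %util
watch -n 5 free -m    # free memory vs. cache over time
top                   # which processes are consuming the CPU
```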

If you just want to transfer the file, then try using split to chunk the file up and transfer each chunk separately. You can then use cat to assemble the chunks back into the whole file again.
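A minimal sketch, assuming the file is called backup.tar (file names, chunk size, and the pause are all arbitrary):

```
# On the source machine: cut the file into 512MB pieces
split -b 512M backup.tar backup.part.

# Send the pieces one at a time with a pause between them, so the instance
# gets breathing room; a failed piece can be re-sent without redoing 9.2GB
for part in backup.part.*; do
    scp "$part" ubuntu@my-instance:/backup/ && sleep 60
done

# On the instance: reassemble and verify against the original's checksum
cat backup.part.* > backup.tar
md5sum backup.tar
```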

user9517
  • I want to +2 this (one for the real answer and one for the workaround), but can't, so I +1'ed your longer post on the scientific method instead. Can't believe I hadn't read that before, and it's so true. Are you familiar with Pirsig's writing on the use of the scientific method in the context of motorcycle maintenance? – MadHatter Sep 01 '16 at 06:51
  • Xen and the art of System Administration ... our hero waxes lyrical on various philosophical topics whilst navigating a river of shit ;) – user9517 Sep 01 '16 at 10:13
  • All that needs is a link to the Kindle edition, and my day will be complete ;-) – MadHatter Sep 01 '16 at 10:23
-3

One possibility is the file system cache. When copying large amounts of data, the file system cache can use up all available memory (a t2.small only has 2GB), resulting in swapping, which might cause the system to become unresponsive. I'm not sure whether there is a way to bypass the file system cache with scp, though.
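One way to test this hypothesis while a transfer runs (a sketch; drop_caches discards only clean cache pages, but needs root):

```
# Nonzero si/so columns here mean the instance really is swapping
vmstat 1

# Cache is reclaimable, so a low "free" figure alone does not
# mean the machine is under memory pressure
free -m

# Take the page cache out of the picture entirely (root required)
sync && echo 1 | sudo tee /proc/sys/vm/drop_caches
```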

  • The filesystem cache is designed to use RAM that is not being used for anything else. It is given back for normal program use in preference to anything else being evicted. – user9517 Sep 01 '16 at 05:39