
Yesterday I set up my first Auto Scaling group in AWS. I wrote a cloud-init/userdata script to install my application and tested it ~40 times without any errors. Just before I went home it suddenly stopped working: new instances never become healthy and are eventually terminated once their grace period expires.

This morning I came in and found that the issue persists. I SSH'd into an instance, looked at the cloud-init-output.log file, and found the following:

Err:1 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu xenial InRelease
  Could not connect to ap-southeast-2.ec2.archive.ubuntu.com:80 (54.253.131.141), connection timed out [IP: 54.253.131.141 80]
Err:2 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu xenial-updates InRelease
  Unable to connect to ap-southeast-2.ec2.archive.ubuntu.com:http: [IP: 54.253.131.141 80]
Err:3 http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu xenial-backports InRelease
  Unable to connect to ap-southeast-2.ec2.archive.ubuntu.com:http: [IP: 54.253.131.141 80]
Err:4 http://security.ubuntu.com/ubuntu xenial-security InRelease
  Cannot initiate the connection to security.ubuntu.com:80 (2001:67c:1360:8001::21). - connect (101: Network is unreachable) [IP: 2001:67c:1360:8001::21 80]
Reading package lists...
W: Failed to fetch http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu/dists/xenial/InRelease  Could not connect to ap-southeast-2.ec2.archive.ubuntu.com:80 (54.253.131.141), connection timed out [IP: 54.253.131.141 80]
W: Failed to fetch http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu/dists/xenial-updates/InRelease  Unable to connect to ap-southeast-2.ec2.archive.ubuntu.com:http: [IP: 54.253.131.141 80]
W: Failed to fetch http://ap-southeast-2.ec2.archive.ubuntu.com/ubuntu/dists/xenial-backports/InRelease  Unable to connect to ap-southeast-2.ec2.archive.ubuntu.com:http: [IP: 54.253.131.141 80]
W: Failed to fetch http://security.ubuntu.com/ubuntu/dists/xenial-security/InRelease  Cannot initiate the connection to security.ubuntu.com:80 (2001:67c:1360:8001::21). - connect (101: Network is unreachable) [IP: 2001:67c:1360:8001::21 80]
W: Some index files failed to download. They have been ignored, or old ones used instead.

This is caused by the sudo apt-get update command at the top of my script. Because it fails, several packages in my sudo apt-get -y install command fail to install, which then prevents my application from working.

The weird thing is that if I run sudo apt-get update via SSH it works without any errors; it only fails in the cloud-init script. My hunch is that maybe the instance hasn't yet connected to the network when the script executes. If this is the case, how can I work around it?

EDIT: I can no longer reproduce this issue. I've added this to the top of my script to try to prevent it from recurring:

until ping -c1 ap-southeast-2.ec2.archive.ubuntu.com &>/dev/null; do echo "waiting for networking to initialise"; done

But the "waiting for networking to initialise" message isn't present in cloud-init-output.log, so it seems this code isn't doing anything and the issue may have been temporary. If anyone knows what causes this issue, or a more reliable way of mitigating it, please let me know.
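A ping loop tests ICMP, while apt-get fetches over TCP port 80, so the two can disagree. As a rough sketch only (the function names `tcp_open` and `retry` are made up for illustration, and the attempt counts are arbitrary), a bounded retry on a TCP connect would at least stop the script with a clear failure instead of looping forever or carrying on without packages:

```shell
#!/bin/bash

# tcp_open HOST PORT: succeed if a TCP connection can be opened within 5 seconds.
# Relies on bash's /dev/tcp pseudo-device and coreutils' timeout.
tcp_open() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds, sleeping
# DELAY seconds after each failed try; return 1 once ATTEMPTS are used up.
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    echo "attempt $i/$attempts failed: $*" >&2
    sleep "$delay"
  done
  return 1
}

# Possible use at the top of a userdata script:
#   retry 30 2 tcp_open ap-southeast-2.ec2.archive.ubuntu.com 80 || exit 1
```

This does not fix a missing route to the internet; it only makes the failure visible in cloud-init-output.log.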

Joshua Walsh

1 Answer


I figured out what the issue was and I feel a bit silly. It turns out that an instance needs a public IP in order to access servers outside the VPC. I guess I assumed that there would be some kind of NAT allowing the servers to dial out without a public IP, but I see now that if I want that I have to set it up myself with a NAT Gateway.
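For anyone wanting the NAT Gateway route, here is a minimal sketch using the AWS CLI. It assumes a VPC that already has a public subnet (one routed to an internet gateway) and a separate route table for the private subnets; the function name `setup_nat_gateway` and its two arguments are placeholders, not anything from my actual setup:

```shell
#!/bin/bash

# setup_nat_gateway PUBLIC_SUBNET_ID PRIVATE_ROUTE_TABLE_ID
# Creates a NAT gateway in the public subnet and points the private route
# table's default route at it. Prints the new NAT gateway ID.
setup_nat_gateway() {
  local public_subnet=$1 private_rtb=$2 alloc_id nat_id

  # A public NAT gateway needs an Elastic IP allocation.
  alloc_id=$(aws ec2 allocate-address --domain vpc \
    --query AllocationId --output text) || return 1

  nat_id=$(aws ec2 create-nat-gateway \
    --subnet-id "$public_subnet" --allocation-id "$alloc_id" \
    --query NatGateway.NatGatewayId --output text) || return 1

  # Wait until the gateway is available before routing traffic through it.
  aws ec2 wait nat-gateway-available --nat-gateway-ids "$nat_id" || return 1

  # Send all outbound IPv4 traffic from the private subnets via the gateway.
  aws ec2 create-route --route-table-id "$private_rtb" \
    --destination-cidr-block 0.0.0.0/0 --nat-gateway-id "$nat_id" \
    >/dev/null || return 1

  echo "$nat_id"
}

# Possible use (IDs are placeholders):
#   setup_nat_gateway subnet-EXAMPLE rtb-EXAMPLE
```

Note that a NAT gateway is billed hourly and per GB, so it is worth checking the pricing before leaving one running.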

The reason this issue was hard to troubleshoot is that in order to SSH in and view the logs I was assigning an Elastic IP to the instance, which then caused the script to succeed.
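One way to catch this without attaching an Elastic IP is to have the userdata script log whether the instance has a public address at boot. The sketch below asks the instance metadata service; it assumes the `public-ipv4` entry is simply absent (HTTP 404) when no public address is assigned, and the function name `public_ipv4` is made up for illustration:

```shell
#!/bin/bash

IMDS=http://169.254.169.254/latest

# public_ipv4: print this instance's public IPv4 address, or return 1 if the
# metadata service does not report one.
public_ipv4() {
  local token ip
  # IMDSv2 wants a session token; fall back to a plain request if none is issued.
  if token=$(curl -sf --max-time 2 -X PUT \
      -H 'X-aws-ec2-metadata-token-ttl-seconds: 60' "$IMDS/api/token"); then
    ip=$(curl -sf --max-time 2 -H "X-aws-ec2-metadata-token: $token" \
      "$IMDS/meta-data/public-ipv4") || return 1
  else
    ip=$(curl -sf --max-time 2 "$IMDS/meta-data/public-ipv4") || return 1
  fi
  # -f above turns a 404 for a missing entry into a failure; also reject empty output.
  [ -n "$ip" ] && echo "$ip"
}

# Possible use near the top of a userdata script:
#   public_ipv4 || echo "no public IPv4 at boot; outbound access needs a NAT" >&2
```

A missing public address is only a hint: an instance behind a NAT can still reach the internet without one.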

Joshua Walsh
  • Anyone using Amazon Linux could possibly use an [S3 endpoint for VPC](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-endpoints.html), since the AWS yum repository is hosted on S3. That won't work for you on Ubuntu though. – Tim Jul 21 '17 at 00:01
  • That's good to know, thanks! Does that apply to RHEL as well? – Joshua Walsh Jul 21 '17 at 00:12
  • I don't think so, just Amazon Linux. I'm not sure it will work, it's a theory. – Tim Jul 21 '17 at 00:44
  • Instead of a NAT Gateway, you can just run a $5 t2.nano NAT Instance. You'll "only" have a max throughput to the Internet of ~250 Mbit/sec, which is lower than the performance of a NAT Gateway, but I find this solution entirely adequate. Or, enable IPv6. Note that VPC has magic handling of DNS, so lookups, even to the Internet, still work with no other Internet access. This trips people up, sometimes. – Michael - sqlbot Jul 21 '17 at 03:25
  • Hmm, that is quite a bit cheaper. I'll consider that, thanks. – Joshua Walsh Jul 23 '17 at 10:09