A few days ago, a Rails app I've had running on AWS suddenly went down. On investigation, it turned out that, for whatever reason, the app could no longer connect to the RDS database via its domain endpoint.

Following the troubleshooting step here, I ran netcat against the endpoint and, sure enough, the connection timed out.

The site also uses SES via its API to send email, and runs into the same problem, with netcat timing out trying to access the email endpoint of email.us-east-1.amazonaws.com.

As a stop-gap to get the site up, I was able to "solve" this problem by reconfiguring the app to connect directly to the IP of the database endpoint and skip DNS lookup. But the underlying problem still remains, and the solution unfortunately does not work for connecting to SES (it times out whether using the IP or domain endpoint).

The instance and the RDS database are not in a VPC, and as such have no outbound security group rules. The app has been running for years with no similar incident, and I'm certain nothing changed configuration-wise on our end. It just suddenly stopped working.

I initially thought DNS lookup might be the problem, but nslookup and dig seem to show no issue.

Can anyone shed some light on what might have happened here? Or what I might do to fix it?

Edit: More info

Experimenting with both SES US email endpoints I found that, from the instance, I can connect to us-west-2, but not to us-east-1 (which is my region). However I can connect to both from my work machine with no problems. Is this a clue? It seems like perhaps the request is failing when AWS wants to route over the internal network (this might not even make sense)? Note this behavior occurs whether using the domain name or public IP of the mail servers.

> nc -zv email.us-west-2.amazonaws.com 443
Connection to email.us-west-2.amazonaws.com 443 port [tcp/https] succeeded!
(Works on instance and home machine)
> nc -zv email.us-east-1.amazonaws.com 443
(Times out when attempting from the AWS instance, but is fine from home)
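
The same reachability test can be scripted so that it is easy to repeat from both the instance and an outside machine. The following is a minimal sketch, not a definitive diagnostic: the function name `can_connect` and the 5-second timeout are assumptions made for illustration, and it only checks that a TCP handshake completes, which is roughly what `nc -zv` reports.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    `can_connect` is a hypothetical helper name; the timeout value is an assumption.
    """
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution failures.
        return False


# Example usage with the two SES endpoints from the question:
# for endpoint in ("email.us-west-2.amazonaws.com", "email.us-east-1.amazonaws.com"):
#     print(endpoint, "reachable" if can_connect(endpoint, 443) else "unreachable")
```

Running it in both places and comparing the results reproduces the asymmetry described above without relying on a particular netcat build.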
numbers1311407

1 Answer

I suggest running dig against your DNS entry and making sure the value is set correctly. It doesn't make sense that the connection times out only when using DNS; that would only happen if the DNS record were not being served. So run dig yourdb.yourdomain.com and make sure it actually resolves to your RDS instance. Other than that, did you try restoring your database or make any database change? Also make sure your RDS instance is set to be publicly accessible.

  • dig, using Amazon's DNS lookup, is actually finding the internal IP of the database (10.x.x.x), and that's what I can't connect to. If I dig using `8.8.8.8` it comes up with the public IP, which is what I reconfigured the app to use to get the site working again. However digging the email server comes up with a public IP, which also hangs. – numbers1311407 Aug 03 '16 at 19:54
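
The distinction in the comment above (the resolver on the instance returns a 10.x.x.x address, while `8.8.8.8` returns the public one) can be checked programmatically. This is a rough sketch under stated assumptions: `resolve_ipv4` and `describe` are hypothetical helper names, and the lookup uses whatever resolver the machine is configured with, so it shows what the app itself would see rather than what a specific DNS server returns.

```python
import ipaddress
import socket
from typing import List


def resolve_ipv4(hostname: str) -> List[str]:
    """Resolve `hostname` with the system resolver and return its unique IPv4 addresses."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def describe(address: str) -> str:
    """Label an address 'private' (for example 10.x.x.x) or 'public'."""
    return "private" if ipaddress.ip_address(address).is_private else "public"


# Example usage (yourdb.yourdomain.com is the placeholder name from the answer):
# for ip in resolve_ipv4("yourdb.yourdomain.com"):
#     print(ip, describe(ip))
```

If the database endpoint comes back as private on the instance while the stop-gap public IP works, that points at the internal route rather than at DNS itself.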