A few days ago, a Rails app I've had running on AWS suddenly went down. On investigation, it turned out that, for whatever reason, the app could no longer connect to the RDS database via its domain endpoint.
Following the troubleshooting step here, I ran netcat against the endpoint and, sure enough, the connection times out.
The site also uses SES via its API to send email, and it runs into the same problem: netcat times out when trying to reach the SES endpoint, email.us-east-1.amazonaws.com.
As a stopgap to get the site back up, I was able to "solve" this by reconfiguring the app to connect directly to the database endpoint's IP address, skipping the DNS lookup. The underlying problem remains, though, and this workaround doesn't help with SES: the connection times out whether I use the IP or the domain endpoint.
The instance and the RDS database are not in a VPC, and as such have no outbound security group rules. The app has been running for years with no similar incident, and I'm certain nothing changed configuration-wise on our end. It just suddenly stopped working.
I initially thought DNS lookup might be the problem, but nslookup and dig seem to show no issue.
Can anyone shed some light on what might have happened here? Or what I might do to fix it?
Edit: More info
Experimenting with both US SES endpoints, I found that from the instance I can connect to us-west-2 but not to us-east-1 (which is my region). From my work machine, however, I can connect to both without problems. Is this a clue? Perhaps the request fails when AWS tries to route it over its internal network (this might not even make sense)? Note that this happens whether I use the domain name or the public IP of the mail servers.
> nc -zv email.us-west-2.amazonaws.com 443
Connection to email.us-west-2.amazonaws.com 443 port [tcp/https] succeeded!
(Works from both the instance and my home machine)
> nc -zv email.us-east-1.amazonaws.com 443
(Times out when attempting from the AWS instance, but is fine from home)
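One way to separate the two stages that nslookup/dig and nc -zv test independently is a small script that first resolves the name and then attempts a TCP handshake to each resolved address. This is only a sketch, assuming Python 3 is available on the instance; the function name check_endpoint, the result layout, and the 5-second default timeout are my own illustrative choices, not anything AWS provides.

```python
import socket


def check_endpoint(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Check DNS resolution and TCP connectivity for host:port as separate stages.

    Hypothetical helper: returns the resolved IPs and, for each IP, whether a
    TCP handshake completed within `timeout` seconds.
    """
    result = {"host": host, "port": port, "addresses": [], "connect": {}, "error": None}

    # Stage 1: DNS lookup (roughly what nslookup/dig exercise).
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["error"] = f"dns: {exc}"
        return result

    # getaddrinfo can list the same IP more than once; keep each IP once, in order.
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in result["addresses"]:
            result["addresses"].append(ip)

    # Stage 2: TCP handshake to each resolved IP (roughly what `nc -zv` checks).
    for ip in result["addresses"]:
        try:
            with socket.create_connection((ip, port), timeout=timeout):
                result["connect"][ip] = "ok"
        except socket.timeout:
            # No reply at all within the timeout, like the nc behaviour for us-east-1.
            result["connect"][ip] = "timeout"
        except OSError as exc:
            # Any other socket error, e.g. connection refused or network unreachable.
            result["connect"][ip] = f"error: {exc}"
    return result
```

Calling check_endpoint("email.us-east-1.amazonaws.com") and check_endpoint("email.us-west-2.amazonaws.com") from the instance and from an outside machine should show whether resolution succeeds everywhere while only the handshake to the us-east-1 addresses times out, which is what the nc results suggest. If the resolved addresses differ between the two machines, that would point back toward DNS.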