
I have an existing AWS infrastructure integrated with DataDog, which monitors various metrics, e.g. SQS queues, ELB, etc.

I'd like to set up health checks for some websites and APIs. As far as I know, this is possible with AWS ELB health checks. However, I'd like to emulate the end-user experience, so that the request is sent from the outside world and passes through the ELB to the application. Also, not all of the applications currently have ELBs. I've decided to use DataDog's HTTP checks. The question is: should I have a separate EC2 instance just to install the agent on? I certainly don't want to install the agent on the same machine as the website and ping it from there, since that would miss various network issues.
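To make the "outside-in" idea concrete, here is a minimal sketch of such a probe in plain Python. It is not DataDog's HTTP check, only an illustration of what an external checker measures; the function name `probe` and the `max_latency` threshold are assumptions made up for this example. The result is only meaningful when the script runs outside the network that hosts the application.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0, max_latency=2.0):
    """Send one GET from wherever this script runs and report what a client would see.

    `max_latency` is a hypothetical threshold: slower responses count as unhealthy.
    """
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        status = err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, TLS error, etc.
        status = None
    elapsed = time.monotonic() - started
    healthy = status is not None and 200 <= status < 400 and elapsed <= max_latency
    return {"url": url, "status": status, "seconds": elapsed, "healthy": healthy}
```

Calling `probe("https://example.com/health")` (a placeholder URL) returns a small dict that can be logged or forwarded to whatever monitoring is already in place.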

I've also considered Route 53 health checks, which would be monitored by DataDog, but I don't think that will be fast enough, since data coming from AWS usually reaches DataDog with a delay compared to what DataDog's agent reports directly.


Update: for now I've decided to go with Route 53 Health Checks and CloudWatch alarms. DataDog is responsible for sending notifications when a certain alarm is triggered. As expected, there is some delay between the alarm and DataDog's reaction, but it turned out to be acceptable.
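For reference, a rough sketch of that setup with boto3 could look like the following. It is not the exact configuration I used: the domain, interval, thresholds, alarm name and the SNS topic used for notification are all placeholder assumptions. The two builder functions only assemble request parameters, so they can be inspected without AWS credentials; `create` performs the actual calls.

```python
import uuid


def health_check_request(domain, path="/", port=443):
    """Build keyword arguments for Route 53's create_health_check call."""
    return {
        "CallerReference": str(uuid.uuid4()),  # must be unique per request
        "HealthCheckConfig": {
            "Type": "HTTPS",
            "FullyQualifiedDomainName": domain,
            "Port": port,
            "ResourcePath": path,
            "RequestInterval": 30,  # seconds; 10 is the faster option
            "FailureThreshold": 3,  # assumed value, tune as needed
        },
    }


def alarm_request(health_check_id, sns_topic_arn):
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"route53-health-{health_check_id}",  # hypothetical naming
        "Namespace": "AWS/Route53",
        "MetricName": "HealthCheckStatus",  # 1 = healthy, 0 = unhealthy
        "Dimensions": [{"Name": "HealthCheckId", "Value": health_check_id}],
        "Statistic": "Minimum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],  # assumed notification target
    }


def create(domain, sns_topic_arn):
    """Create the health check and its alarm; needs boto3 and AWS credentials."""
    import boto3  # third-party; imported here so the builders work without it

    route53 = boto3.client("route53")
    # Route 53 health check metrics are published in us-east-1.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    check = route53.create_health_check(**health_check_request(domain))
    check_id = check["HealthCheck"]["Id"]
    cloudwatch.put_metric_alarm(**alarm_request(check_id, sns_topic_arn))
    return check_id
```

The alarm fires when the minimum of `HealthCheckStatus` over one minute drops below 1, i.e. when the check reports unhealthy at any point in that period.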

For deeper and more serious analysis I've also considered New Relic and Application Insights. Both of them seem to provide the needed health checks, though New Relic is quite expensive and Application Insights integrates better with Azure.

  • Having a separate instance for health checks would seem to defeat the purpose. I would avoid checks that require an agent, personally, as that's not how customers are seeing it. Just use a regular uptime checker. – Tim Feb 17 '17 at 00:23

1 Answer


I would suggest using a third-party service for this. As we already had New Relic in place, we used its included availability monitoring to ping some of our API endpoints. It should integrate well with DataDog as well. New Relic is probably too expensive for just that ping check, but there are other options available, like https://www.host-tracker.com/, which could maybe be integrated using their API.

dirkaholic
  • Thanks for the answer. I forgot to update the question - we've already decided to use Route 53 health checks + CloudWatch alarms, which are in turn monitored by DataDog. As expected, there is some delay between the failing health check and DataDog's reaction, but it's quite OK for now. – Vlad Stryapko Feb 17 '17 at 12:48