What is a correct monitoring strategy for network services?

Question

Take, for example, a host that is monitored with Nagios and check_mk and runs HTTP and SSH servers. Which of these is the best monitoring strategy?

Monitor that the sshd and apache processes are running
Attempt to connect to the service ports (e.g. 22 and 80) from the monitoring hosts
Attempt to connect to port 80 from an external network

I don't want to get loads of alerts:

when the host is down (for example, check_mk can't connect)
when there is a problem with my own network (while the services are actually available from other networks)

So I want to learn about monitoring strategies and the theory behind them, because I don't want multiple redundant checks that simply generate lots of useless alerts. What should monitoring strive to achieve, and how?

I already have Nagios deployed with check_mk, performing more than 500 checks. This is a general question about how to plan your checks and achieve good coverage (monitoring-solution agnostic, if you like).

possible duplicate of [What tool do you use to monitor your servers?](http://serverfault.com/questions/44/what-tool-do-you-use-to-monitor-your-servers) — Shane Madden, Aug 20 '11 at 15:57

score 4 · Answer 1 · answered Aug 20 '11 at 17:07

Pick whatever monitoring solution you want from the above question that Shane linked to. Then while adding all of your hosts and services, make sure to include host/service dependencies. For instance, if hosts A, B, and C are connected to switch D, make sure that A, B, and C are set as dependent on D. That way if switch D goes down, you won't get notifications on all the dependent objects.
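The parent/child idea can be sketched in a few lines of Python. This is a simplified model of Nagios-style reachability logic, not its actual implementation: a failing host whose parents are all failing too is classed as UNREACHABLE rather than DOWN, and only DOWN hosts trigger a notification. The host names and the `parents` mapping are hypothetical.

```python
from typing import Dict, List, Set


def classify(parents: Dict[str, List[str]], failing: Set[str]) -> Dict[str, str]:
    """Map each host to UP, DOWN or UNREACHABLE.

    parents: host -> list of parent hosts (empty list for hosts with no parent,
             e.g. a switch directly connected to the monitoring server).
    failing: hosts whose own check currently fails.
    """
    states: Dict[str, str] = {}
    for host, host_parents in parents.items():
        if host not in failing:
            states[host] = "UP"
        elif not host_parents or any(p not in failing for p in host_parents):
            # No parent, or at least one parent answers: the host itself is at fault.
            states[host] = "DOWN"
        else:
            # Every parent is failing too: the host is merely hidden behind them.
            states[host] = "UNREACHABLE"
    return states


def hosts_to_notify(parents: Dict[str, List[str]], failing: Set[str]) -> List[str]:
    """Only DOWN hosts generate notifications; UNREACHABLE ones are suppressed."""
    return sorted(h for h, s in classify(parents, failing).items() if s == "DOWN")


# Hosts A, B and C hang off switch D, as in the answer.
topology = {"D": [], "A": ["D"], "B": ["D"], "C": ["D"]}
print(hosts_to_notify(topology, {"A", "B", "C", "D"}))  # ['D']
print(hosts_to_notify(topology, {"B"}))  # ['B']
```

When switch D fails, four checks fail but only one notification (for D) is sent; when only B fails, B itself is reported.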

There are pros and cons to this, though. In the above example, you'll only get a single alert (as opposed to a flood of alerts), so you need to be very intentional about reading and responding to every single alert, and not rely on the sheer number of alerts to give you a clue about the severity of an issue.

To go along with a flood of alerts, don't let it become an alerting problem http://blog.serverfault.com/post/we-have-an-alerting-problem/ — Nixphoe, Aug 20 '11 at 17:11

score 0 · Answer 2 · answered Aug 20 '11 at 21:15

To check whether httpd is running correctly you need a different approach: request an important URL of your application and check whether the response from your web server contains content typical of that URL (you could write your own Nagios script using curl for that).
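As a rough sketch of such a content check (in Python's standard library instead of curl), the functions below fetch a URL and report CRITICAL unless the response is HTTP 200 and contains an expected string. They follow the Nagios plugin convention of printing one status line and exiting with 0 (OK), 2 (CRITICAL) or 3 (UNKNOWN). The file name `check_content.py` and all function names are hypothetical; the stock `check_http` plugin can do something similar with its `-s` (expected string) option.

```python
import urllib.error
import urllib.request
from typing import List, Tuple

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def evaluate(status: int, body: str, expected: str) -> Tuple[int, str]:
    """Decide the check result from an HTTP status code and response body."""
    if status != 200:
        return CRITICAL, f"CRITICAL - HTTP status {status}"
    if expected not in body:
        return CRITICAL, f"CRITICAL - expected text {expected!r} not found"
    return OK, f"OK - HTTP {status}, found {expected!r}"


def check_url(url: str, expected: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Fetch url and check that the page contains the expected text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate(resp.status, body, expected)
    except urllib.error.HTTPError as exc:  # 4xx/5xx responses are raised as errors
        return CRITICAL, f"CRITICAL - HTTP status {exc.code}"
    except (OSError, ValueError) as exc:  # refused, timed out, DNS failure, bad URL
        return CRITICAL, f"CRITICAL - request failed: {exc}"


def main(argv: List[str]) -> int:
    """Plugin entry point: argv is [URL, EXPECTED_TEXT]; returns the exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_content.py URL EXPECTED_TEXT")
        return UNKNOWN
    code, message = check_url(argv[0], argv[1])
    print(message)
    return code
```

To use it as a plugin, add `import sys` and end the file with `raise SystemExit(main(sys.argv[1:]))`, then point a Nagios command definition at the script with a URL and a string that only appears on a correctly rendered page.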

sshd is pretty reliable, so there is probably no need to check it. httpd will keep running, but sometimes it stops responding entirely (a simple port 80 check will catch that); more often, though, httpd is running but the content is no longer being delivered.
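A "simple port 80 check" amounts to opening a TCP connection. A minimal Python sketch (hypothetical function name, same Nagios exit-code convention) shows how little it proves: a successful connect means something is listening on the port, not that the right content is served, which is why this answer recommends checking page content as well.

```python
import socket
from typing import Tuple

OK, CRITICAL = 0, 2  # Nagios plugin exit codes


def check_tcp(host: str, port: int, timeout: float = 5.0) -> Tuple[int, str]:
    """Return OK if host:port accepts a TCP connection within timeout seconds."""
    try:
        # The connection is closed again immediately; no data is exchanged.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} accepts connections"
    except OSError as exc:  # refused, unreachable, timed out, DNS failure
        return CRITICAL, f"CRITICAL - {host}:{port}: {exc}"
```

For example, `check_tcp("www.example.org", 80)` returns OK even if Apache only serves error pages.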

Apart from that, you should model (network) dependencies: if your proxy is down, every httpd check will fail.

There is a nice article about monitoring on the Server Fault Blog.

score 0 · Answer 3 · answered Dec 11 '11 at 20:23

You can combine legacy Nagios checks with Apache process monitoring for every host that has the "webserver" tag. If you add service dependencies, you get end-to-end monitoring and still receive only one notification (exclude "u" (unreachable) notifications for your contact, of course).
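For the process half of that mix, here is a Linux-only sketch: count the processes whose command name (read from `/proc/<pid>/comm`) matches, and go CRITICAL below a minimum. The function names and the threshold are hypothetical. Note that the Apache binary is usually called `apache2` on Debian-style systems and `httpd` on Red Hat-style ones, and that `comm` holds at most 15 characters of the name.

```python
import os
from typing import Tuple

OK, CRITICAL = 0, 2  # Nagios plugin exit codes


def count_processes(name: str, proc_root: str = "/proc") -> int:
    """Count processes whose command name (/proc/<pid>/comm) equals name."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # skip non-PID entries such as /proc/self
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    count += 1
        except OSError:  # the process exited between listdir() and open()
            continue
    return count


def check_process(name: str, minimum: int = 1, proc_root: str = "/proc") -> Tuple[int, str]:
    """CRITICAL if fewer than `minimum` processes called `name` are running."""
    n = count_processes(name, proc_root)
    if n < minimum:
        return CRITICAL, f"CRITICAL - {n} {name} process(es), expected at least {minimum}"
    return OK, f"OK - {n} {name} process(es)"
```

The `proc_root` parameter exists only so the logic can be exercised against a fake directory tree; on a real host the default `/proc` is used.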

Alternatively, focus on "user experience" monitoring: notify only when the simulated end user (check_http) cannot reach the site, and use Check_MK BI to drill down into the issue.
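The "user experience" approach can be sketched as follows: page only when the end-to-end check fails, and treat the lower-level checks purely as drill-down hints, summarised as the worst component state. This is a loose, hypothetical illustration of the idea behind Check_MK BI, not its API; the component names and the severity order are assumptions.

```python
from typing import Dict, List, Tuple

# Assumed severity order used to pick the "worst" component state.
SEVERITY = {"OK": 0, "WARNING": 1, "UNKNOWN": 2, "CRITICAL": 3}


def worst(states: List[str]) -> str:
    """Most severe state in the list; OK if there are no components."""
    return max(states, key=SEVERITY.__getitem__, default="OK")


def triage(end_to_end: str, components: Dict[str, str]) -> Tuple[bool, str, List[str]]:
    """Return (notify?, worst component state, failing components to inspect)."""
    notify = end_to_end != "OK"  # only the simulated user decides who gets paged
    suspects = sorted(name for name, state in components.items() if state != "OK")
    return notify, worst(list(components.values())), suspects


print(triage("CRITICAL", {"apache process": "OK", "port 80": "CRITICAL", "switch D": "OK"}))
# (True, 'CRITICAL', ['port 80'])
```

A failing component that does not affect the site therefore shows up only in the drill-down and pages nobody.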
