I just started using Nagios and I like that my team can acknowledge problems, but I haven't yet found a way to log the solutions that are used to correct the problems. Is there a tool that logs Nagios alerts and provides a way to complete post-mortems and log solutions so that when someone encounters similar problems, they can reference the logged data?
3 Answers
Honestly, I don't think trying to capture this information at fault time is useful. You're stressed, possibly still sleepy, at the very least you'll be in a "fight or flight" mode that isn't conducive to writing good documentation. Nagios already has the ability to record quick notes in the service (either as part of the ack, or as a separate note you attach to the service/host); these could be used as part of the post-mortem you should be doing at leisure after the emergency, and then incorporated into a more structured, useful, and better-written piece of documentation that's captured in a wiki and linked to from the service itself in Nagios (via the `notes_url` field).
I agree with your statement about doing post-mortems the next day and I like the idea of using the notes_url field to tie a service to another URL for extra info. Do you have any recommendations for tools that handle this? I don't think a Wiki is the right choice as I need tools for developer discussions and eventual resolutions. A basic forum might work. – GregB Aug 22 '11 at 21:36
You don't want to link to discussions from a `notes_url` -- you want to link to **the right answer**. That is best recorded in a wiki. That discussions may occur in whatever way you feel is appropriate is orthogonal to the information you want available when you're trying to fix a problem. – womble Aug 22 '11 at 22:53
Take a look at event handlers. All you have to do is write a script that handles the event and logs your solution in an issue tracking system (I like Redmine).
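As a rough illustration, an event handler script along these lines could turn a confirmed problem into an issue body. This is only a sketch under assumptions: the argument order, the `ops` project identifier, and the function names are made up for the example, and it only builds the JSON that Redmine's `POST /issues.json` endpoint accepts rather than sending it.

```python
import json
import sys

# States worth recording; OK/recovery events are skipped in this sketch.
PROBLEM_STATES = {"WARNING", "CRITICAL", "UNKNOWN"}


def should_log(state, state_type):
    """Act only on confirmed (HARD) problems, not on SOFT retries."""
    return state_type == "HARD" and state in PROBLEM_STATES


def build_issue(host, service, state, output, project_id="ops"):
    """Build a JSON-ready body for a new Redmine issue."""
    return {
        "issue": {
            "project_id": project_id,  # hypothetical project identifier
            "subject": f"{state}: {service} on {host}",
            "description": (
                f"Plugin output:\n{output}\n\n"
                "Resolution: (fill in after the post-mortem)"
            ),
        }
    }


def handle(args):
    """Return the issue JSON for a loggable event, or None otherwise.

    Assumed argument order (set in your own command definition):
    $HOSTNAME$ $SERVICEDESC$ $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEOUTPUT$
    """
    if len(args) != 5:
        return None
    host, service, state, state_type, output = args
    if not should_log(state, state_type):
        return None
    return json.dumps(build_issue(host, service, state, output))


if __name__ == "__main__":
    body = handle(sys.argv[1:])
    if body is not None:
        # Printing keeps the sketch side-effect free; a real handler would POST this.
        print(body)
```

The issue then gives you a place to write up the fix once the post-mortem is done, with the original plugin output already attached.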
Where I work we do it the other way around.
We use a ticketing system called 'TopDesk' (which one doesn't really matter). Whenever there is an alert in Icinga (a Nagios fork), a ticket is created via an HTTP request to the TopDesk server.
So, I think it's easier to let Nagios send out warnings/errors via mail, SMS and a ticketing system than to use Nagios itself to keep track of the actions taken.
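To give an idea of what the HTTP side of such a setup might look like, here is a minimal sketch. The endpoint URL and the `summary`/`details` field names are placeholders I made up, not TopDesk's actual API, so check your ticketing system's documentation for the real ones.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute whatever your ticketing system exposes.
TICKET_URL = "https://tickets.example.com/api/incidents"


def build_ticket_request(host, service, state, output, url=TICKET_URL):
    """Return a POST request describing the alert; the caller decides when to send it."""
    payload = {
        "summary": f"{state}: {service} on {host}",  # assumed field name
        "details": output,                           # assumed field name
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_ticket(request, timeout=10):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A notification command could call `send_ticket(build_ticket_request(...))` with the host, service, state and plugin output passed in as arguments.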