My team is trying to solve a monitoring challenge for our backups.
The backups themselves run fine. Our challenge is monitoring them to make sure they actually happen.
We can send a mail on both failure and success. Now we want to watch for these mails and:
- alert if a mail reports a failure
- alert if no success mail has been received within a configurable period, say one day
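The two alert rules above can be sketched in Python as follows. This is a minimal sketch that uses only the standard `email` package. It assumes the backup job puts `SUCCESS` or `FAILURE` in the Subject line and that the raw messages have already been fetched, for example over IMAP. The names `BackupReport`, `parse_report` and `evaluate` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from email import message_from_bytes
from email.utils import parsedate_to_datetime


@dataclass
class BackupReport:
    received: datetime  # taken from the mail's Date header
    status: str         # "success", "failure" or "unknown"


def parse_report(raw: bytes) -> BackupReport:
    """Turn one raw mail into a BackupReport.

    Assumption: the backup job marks its mails with SUCCESS or FAILURE
    in the Subject line (hypothetical convention).
    """
    msg = message_from_bytes(raw)
    subject = (msg.get("Subject") or "").upper()
    if "FAILURE" in subject:
        status = "failure"
    elif "SUCCESS" in subject:
        status = "success"
    else:
        status = "unknown"
    return BackupReport(parsedate_to_datetime(msg["Date"]), status)


def evaluate(reports, now=None, max_silence=timedelta(days=1)):
    """Apply both alert rules and return a list of alert messages."""
    if now is None:
        now = datetime.now(timezone.utc)
    alerts = []
    # Rule 1: every failure mail raises an alert.
    for r in reports:
        if r.status == "failure":
            alerts.append(f"backup failure reported at {r.received.isoformat()}")
    # Rule 2: silence (no success mail inside the window) raises an alert.
    last_success = max((r.received for r in reports if r.status == "success"),
                       default=None)
    if last_success is None or now - last_success > max_silence:
        alerts.append("no success mail within the configured window")
    return alerts
```

`evaluate` alerts on every failure report it is given, so a real checker would either pass in only mails it has not processed yet or remember which ones it already alerted on.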
This way we know both when the backup failed and when the mail could not be sent at all. That is why we also send the success mail: it proves that mail delivery actually works.
I picture this as a heartbeat that is actively checked, rather than passively waiting for failures to show up.
Which tool can help us?
I suspect such a tool would let us define expectations that must be met, for example: a mail, success or failure, must have been received within the last day.
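This kind of expectation is what is often called a dead man's switch: the alert fires when an expected signal is missing, not when an error is reported. A minimal sketch of the idea follows. The names `Event`, `Expectation` and `unmet` are hypothetical, and it assumes mails (or other signals) have already been normalized into `Event` records with a `kind` such as `backup-success` or `backup-failure`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, Sequence


@dataclass(frozen=True)
class Event:
    source: str   # where the signal came from, e.g. "mail" (hypothetical label)
    kind: str     # e.g. "backup-success" or "backup-failure" (hypothetical labels)
    at: datetime  # timezone-aware time the signal was observed


@dataclass(frozen=True)
class Expectation:
    name: str
    matches: Callable[[Event], bool]  # which events satisfy this expectation
    within: timedelta                 # how recent a matching event must be


def unmet(expectations: Iterable[Expectation], events: Sequence[Event],
          now: datetime) -> list[str]:
    """Return the names of expectations with no matching event in their window."""
    missing = []
    for exp in expectations:
        if not any(exp.matches(e) and now - e.at <= exp.within for e in events):
            missing.append(exp.name)
    return missing


# Example expectations for the backup case described above.
EXPECTATIONS = [
    Expectation("a backup mail was received",
                lambda e: e.kind.startswith("backup-"), timedelta(days=1)),
    Expectation("a backup succeeded",
                lambda e: e.kind == "backup-success", timedelta(days=1)),
]
```

With only a failure mail from the last day, the first expectation is met (mail delivery works) and the second is reported as unmet.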
The tool would be even better if it could check the disk directly for the presence of the backup files, but we also want to support the mail case, since other systems currently report this way.
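Checking the disk directly could look like the sketch below. It assumes the backups land in a single directory as files matching a glob pattern (the `*.bak` default is hypothetical) and treats the newest file's modification time as the time of the last backup. It does not verify that the files are complete or restorable. The function names are hypothetical.

```python
import time
from pathlib import Path


def newest_backup_age(directory, pattern="*.bak", now=None):
    """Return the age in seconds of the newest matching file, or None if there is none."""
    now = time.time() if now is None else now
    files = [p for p in Path(directory).glob(pattern) if p.is_file()]
    if not files:
        return None
    newest_mtime = max(p.stat().st_mtime for p in files)
    return now - newest_mtime


def check_backup_dir(directory, pattern="*.bak", max_age_seconds=24 * 3600, now=None):
    """Return an alert message, or None if a recent enough backup file exists."""
    age = newest_backup_age(directory, pattern, now)
    if age is None:
        return f"no files matching {pattern!r} in {directory}"
    if age > max_age_seconds:
        return f"newest backup is {age / 3600:.1f} h old"
    return None
```

The same rule as for the mails applies here: a missing or stale file is treated as a failure, so silence is never mistaken for success.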