14

I have a server with a faulty power button that likes to reboot itself. Usually there are warning signs: the acpid log file in /var/log starts spamming garbage for about 10 hours or so.

Is there an easy way I can have something monitor the acpid log and email me when it has new activity?

I wouldn't consider myself extremely advanced so any "guides" you may have for accomplishing something like this would be very helpful and much appreciated. Thank you!

chmeee
Physikal

8 Answers

19

You could use something like LogWatch. Or even a simple script like this (it's pseudocode; you'll need to modify it for your environment):

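 #!/bin/bash
 # Placeholders: fill these in for your environment.
 PATTERN='<error string>'
 LOG='<acpid log location>'
 RECIPIENT='<your email address>'
 # grep -c prints the number of matching lines.
 MATCHES=$(grep -c -- "$PATTERN" "$LOG")
 if [ "${MATCHES:-0}" -ne 0 ]
 then
    echo "$MATCHES matching lines in $LOG" | mail -s "acpid log alert" "$RECIPIENT"
 fi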

Put that in cron to run every hour or so and you should get an email letting you know when it's getting weird.
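Since the question asks about new activity, one variation is to remember how many lines the log had on the previous run and report only what has been appended since. This is a minimal sketch, not a tested production script: `check_new_lines`, the state file path, and `/var/log/acpid` are illustrative names I've assumed, so adjust them to your system.

```shell
#!/bin/bash
# Sketch: print only the lines appended to a log since the previous run.
# check_new_lines and both paths used below are hypothetical names.

# Usage: check_new_lines LOG STATE
# STATE is a small file holding the line count seen on the last run.
check_new_lines() {
    local log="$1" state="$2"
    local last=0 current
    [ -f "$state" ] && last=$(cat "$state")
    # Arithmetic expansion strips any padding wc may add.
    current=$(( $(wc -l < "$log") ))
    # A shrinking file usually means it was rotated; start over from the top.
    [ "$current" -lt "$last" ] && last=0
    if [ "$current" -gt "$last" ]; then
        tail -n "+$((last + 1))" "$log"
    fi
    echo "$current" > "$state"
}

# Example cron-driven use (paths and address are placeholders):
#   new=$(check_new_lines /var/log/acpid /var/tmp/acpid.lines)
#   [ -n "$new" ] && echo "$new" | mail -s "acpid log activity" user@domain.com
```

Because it compares line counts rather than searching for a string, this mails you on any new activity at all, which may or may not be what you want on a chatty log.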

Mark Lopez
Zypher
7

You can use OSSEC HIDS to set up rules on log files and, at the same time, get security information from your host.

Setting it up is very easy:

  • Download the source
  • Uncompress it and run ./install.sh
  • Choose local install
  • Answer the questions (email, checks, etc.)
  • Edit /var/ossec/rules/local_rules.xml as specified below
  • Start OSSEC with /var/ossec/bin/ossec-control start

local_rules.xml

<group name="local,syslog,">
  <rule id="100001" level="13">
    <regex>^.*Your string.*$</regex>
    <description>I've just picked up a fault in the AE35 unit. It's going to go 100% failure in 72 hours</description>
  </rule>
</group>

Rules can be very flexible and complex. See this table to get an idea of the parameters involved in a rule.

If you don't want or need the other security features, you can deactivate them by removing the include lines under the rules tag.

chmeee
5

I would suggest Nagios. It's what we run where I work to monitor multiple machines on our network. It's very good. I've not used it specifically for what you're doing, but you can certainly set it up to email you when errors occur.

There is a guide for installing it on Ubuntu at http://beginlinux.com/blog/2008/11/install-nagios-3-on-ubuntu-810/ and one for installing it on Debian at http://www.debianhelp.co.uk/nagiosinstall.htm.

Mark Davidson
3

And you can send it with something like this:

# Temporary file for the message body ($$ is this shell's PID)
EMAILMSG="/tmp/logreport.$$"
echo "Something to put in the email" >> "$EMAILMSG"

mail -s "Whatever Subject You Like" user@domain.com < "$EMAILMSG"
rm -f "$EMAILMSG"
Shoe
3

Download and install Splunk on the server. It's similar to LogWatch, but provides you with a search engine for your logs.

You can configure it to index your logs. You can then search them to find patterns and errors, and look at what the other logs were doing at that specific point of failure.

It can also be set to send alerts or execute scripts at certain thresholds. So if a particular error starts being spammed to your log, you can script it to automatically restart the offending service.

We use Splunk in our server cluster and it has been a lifesaver!
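To illustrate the threshold idea in general terms (this is plain shell, not Splunk's own alerting or configuration), here is a rough sketch that runs a command once a pattern shows up often enough in the tail of a log. The function name `alert_on_threshold` and its argument order are made up for the example.

```shell
# Sketch: run a command when PATTERN appears at least THRESHOLD times
# in the last WINDOW lines of LOG. All names here are hypothetical.
# Usage: alert_on_threshold LOG PATTERN WINDOW THRESHOLD COMMAND [ARGS...]
alert_on_threshold() {
    local log="$1" pattern="$2" window="$3" threshold="$4"
    shift 4
    local count
    # grep -c prints the number of matching lines (0 when there are none).
    count=$(tail -n "$window" "$log" | grep -Ec -- "$pattern")
    if [ "$count" -ge "$threshold" ]; then
        # The match count is passed to the command as its last argument.
        "$@" "$count"
    fi
}
```

For instance, `alert_on_threshold /var/log/acpid '<error string>' 200 50 some_restart_script` would call `some_restart_script 50` (or a higher count) only when at least 50 of the last 200 lines match; both the path and the script name are placeholders.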

cheffe
Amish Geek
3

I'm using Zabbix with IPMI tools to restart faulty servers on demand. I think OSSEC is a good choice too, but you really need to experiment and debug before putting it into production.

edomaur
1

At a previous employer, we used logsurfer+ to monitor logs in real time and send email alerts. It does take a lot of time and configuration to tune for false positives, but we had a ruleset that worked quite well for a variety of findings and alerting, far more valuable than Nagios was for similar purposes.

Unfortunately I don't have access to the config file anymore to provide samples of what we filtered, but the site should provide more information and examples.
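For a feel of what real-time monitoring looks like without any extra software (this is not logsurfer+ syntax, just a generic shell sketch), you can feed `tail -F` into a loop that runs a handler for each matching line. `watch_stream` and `send_alert` are hypothetical names, and the log path is an assumption.

```shell
# Sketch: read log lines on stdin and run a handler command for every line
# that matches an extended regex. watch_stream is a hypothetical name.
# Usage: some_log_source | watch_stream REGEX COMMAND [ARGS...]
watch_stream() {
    local pattern="$1"
    shift
    local line
    while IFS= read -r line; do
        if printf '%s\n' "$line" | grep -Eq -- "$pattern"; then
            # Keep the handler from swallowing the rest of the log stream.
            "$@" "$line" < /dev/null
        fi
    done
}

# Example (path, pattern and address are placeholders):
#   send_alert() { printf '%s\n' "$1" | mail -s "acpid log alert" user@domain.com; }
#   tail -F /var/log/acpid | watch_stream '<error string>' send_alert
```

Note that this sends one message per matching line, so on a log that spams for hours you would want some rate limiting in the handler. That kind of tuning is what the dedicated tools spend their configuration on.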

jtimberman
0

You can also take a look at my Octopussy project.

sebthebert