I have a website that fails at random. It runs on OpenSolaris, hosted at Joyent.

I have a monitoring service that alerts me when the site is down, but I want an "insider" tool that tells me why it happened.

Is it because CPU usage is too high? Not enough memory? Which process failed? Is it possible to get a backtrace of that?

Everything runs under the Solaris Service Management Facility (SMF). The web server is Cherokee, the database is MySQL, and the application is Python/Django.

I want the simplest possible setup to monitor this and respond automatically, e.g. restart the web server or the Django process when it fails.

I prefer a low-overhead tool. I don't need the fancy features some monitoring tools have: no graphs, no SMS alerts. I only need to know what failed, restart it if possible (maybe up to n times), and have a log somewhere that I can check later.

MDMarra
mamcx
  • If everything is running under SMF, as you wrote, you already have the logging, monitoring and restart facilities. Or am I missing something? – jlliagre Jan 17 '11 at 00:40
  • Well, is there any way to see that info? I have no expertise in Solaris administration... – mamcx Jan 17 '11 at 17:00

3 Answers

All of your needs can be met by the logs in /var/svc/log.

Those are the logs for everything SMF is doing to your system, behind the scenes.

Extracting the 'interesting' data is left as an exercise for the reader.
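
As a starting point for that exercise, here is a minimal Python sketch that pulls the likely-relevant lines out of the SMF logs. The phrases it searches for ("Stopping because", "exited with status", "Executing start method") are assumptions about what typical SMF log entries look like, and the `*.log` glob is likewise an assumption; check both against your own files and adjust the pattern. Running `svcs -x` first is also worth a try, since it explains why a service is not running and points to the relevant log file.

```python
import glob
import re

# Assumed phrases; adjust them to what your own SMF logs actually contain.
INTERESTING = re.compile(
    r"Stopping because|exited with status|Executing (start|stop) method",
    re.IGNORECASE,
)


def interesting_lines(lines):
    """Return the lines that mention a service stop, a method run, or a method exit."""
    return [line.rstrip("\n") for line in lines if INTERESTING.search(line)]


if __name__ == "__main__":
    # Print the last five matching lines of every SMF log file.
    for path in sorted(glob.glob("/var/svc/log/*.log")):
        with open(path, errors="replace") as fh:
            hits = interesting_lines(fh)
        for hit in hits[-5:]:
            print(f"{path}: {hit}")
```

Run it from cron or by hand after an outage; the timestamps in the matching lines show when SMF stopped or restarted each service.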

You might also choose to implement additional monitoring with NodeFly, New Relic, PagerDuty, Pingdom, or any of Nagios, Munin, or Zabbix.

You have a lot of choices available.
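
If all of those are more than you need, a small homegrown watchdog is another option. The sketch below is only an illustration of the "restart up to n times and log it" idea from the question, not a replacement for SMF, which already restarts a service whose processes exit. It covers the case where the process is still alive but the site no longer answers. The function names, the URL `http://localhost/`, the log file name, and the service name `cherokee` are all hypothetical; `svcadm restart` is the standard SMF command.

```python
import logging
import subprocess
import time
import urllib.request


def site_is_up(url, timeout=5):
    """True if the URL can be fetched; any error (including HTTP 4xx/5xx) counts as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except Exception:
        return False


def restart_service(fmri):
    """Ask SMF to restart the service; True if svcadm exits with status 0."""
    return subprocess.call(["svcadm", "restart", fmri]) == 0


def watch(check, restart, max_restarts=3, interval=30,
          sleep=time.sleep, log=logging.getLogger("watchdog")):
    """Poll check(); on each failure log it and call restart(). Give up after max_restarts."""
    restarts = 0
    while restarts < max_restarts:
        if not check():
            restarts += 1
            log.warning("check failed, restart %d of %d", restarts, max_restarts)
            restart()
        sleep(interval)
    log.error("gave up after %d restarts", restarts)
    return restarts


def main():
    # Hypothetical values: adjust the URL, log path, and service name to your setup.
    logging.basicConfig(filename="watchdog.log", level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    watch(lambda: site_is_up("http://localhost/"),
          lambda: restart_service("cherokee"))
```

Calling `main()` starts the loop. The check and restart actions are passed in as functions, so the same loop could watch the Django process with a different pair of callables.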

Look into collectd. I've gotten it to compile on illumos/SmartOS. Also:

https://github.com/gflarity/nervous and https://github.com/gflarity/response

gflarity