I'm trying to work out why Monit (https://mmonit.com/monit/) won't monitor my Solr service. Monitoring works for all of my other services, but for some reason Solr is not being monitored properly.

I followed the example here:

https://www.webfoobar.com/node/61

For my server I tweaked it a little so the services were correct, and also some of the paths:

## Solr monitoring.

## Test the solr service.
check process solr with pidfile /var/solr/solr-8983.pid
  group solr
  start program = "/etc/init.d/solr start"
  stop  program = "/etc/init.d/solr stop"
  restart program  = "/etc/init.d/solr restart"
  if failed port 8983 then restart
  if 3 restarts within 5 cycles then timeout
  depends on solr_bin
  depends on solr_init

## Test the process binary.
check file solr_bin with path /opt/solr/bin/solr
  group root
  if failed checksum then unmonitor
  if failed permission 755 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

## Test the init scripts.
check file solr_init with path /etc/init.d/solr
  group root
  if failed checksum then unmonitor
  if failed permission 744 then unmonitor
  if failed uid root then unmonitor
  if failed gid root then unmonitor

Checking the syntax, it all looks OK:

  monit -t
/etc/monit/monitrc:295: Include failed -- Success '/etc/monit/conf.d/*'
Control file syntax OK

Any other suggestions as to what I can try?

UPDATE: I really don't understand why this isn't working. All the permissions and files seem to exist, and are set correctly:

root@admin:/etc/init.d# ls -l /var/solr/solr-8983.pid
-rw-rw-r-- 1 solr solr 6 Jul 28 05:41 /var/solr/solr-8983.pid


root@admin:/etc/init.d# ls -l /etc/init.d | grep solr
-rwxr--r-- 1 root root 2711 Jul 25 13:25 solr

root@admin:/etc/init.d# ls -l /opt/solr/bin/ | grep solr
-rwxr-xr-x 1 root root 12694 May 29 22:36 install_solr_service.sh
-rwxr-xr-x 1 root root  1255 Mar  9 20:00 oom_solr.sh
-rwxr-xr-x 1 root root 72389 May 30 00:25 solr
-rwxr-xr-x 1 root root 66010 May 30 00:25 solr.cmd
-rwxr-xr-x 1 root root  6204 May 30 00:25 solr.in.cmd.orig
-rwxr-xr-x 1 root root  6950 May 30 00:25 solr.in.sh.orig

UPDATE 2: When restarting Monit, I get this in monit.log:

[UTC Jul 28 10:22:45] info     : Shutting down Monit HTTP server
[UTC Jul 28 10:22:45] info     : Monit HTTP server stopped
[UTC Jul 28 10:22:45] info     : Monit daemon with pid [26662] stopped
[UTC Jul 28 10:22:45] info     : 'admin.steampunkjunkies.com' Monit 5.16 stopped
[UTC Jul 28 10:22:45] info     : Starting Monit 5.16 daemon with http interface at [213.219.38.44]:2812
[UTC Jul 28 10:22:45] info     : Starting Monit HTTP server at [213.219.38.44]:2812
[UTC Jul 28 10:22:45] info     : Monit HTTP server started
[UTC Jul 28 10:22:45] info     : 'admin.steampunkjunkies.com' Monit 5.16 started
Andrew Newby

1 Answer

There was an issue with your /opt/solr/bin/solr file (the solr_bin check) that caused Monit to unmonitor it. Because the solr process depends on solr_bin, the process was unmonitored as well. Check the permissions, ownership, etc. of the file behind solr_bin.

At some point there was an issue with solr_bin that caused it to be unmonitored, and because of the dependency the solr process was unmonitored too. After the checksum has been refreshed with a monit reload or service monit restart, you have to re-enable monitoring of solr_bin and the solr process manually, either via the UI or with the Monit commands. Once something is unmonitored, it does not return to the monitored state automatically; you have to request it explicitly.
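From the command line, that manual step could look roughly like the sketch below. It assumes the Monit daemon is running, that you run it as root, and that the check names are the ones from the question's config (solr_bin, solr_init, solr). The function name remonitor_solr is only a hypothetical wrapper for illustration; monit monitor and monit summary are the actual Monit commands.

```shell
# Hypothetical helper: re-enable monitoring for the Solr checks.
# The file checks are enabled before the process that depends on them.
remonitor_solr() {
  for svc in solr_bin solr_init solr; do
    # "monit monitor <name>" asks the running daemon to resume monitoring.
    monit monitor "$svc" || return 1
  done
  # Print the service list so any check still "Not monitored" is visible.
  monit summary
}
```

Call remonitor_solr once after the reload or restart. If a check drops back to "Not monitored", monit.log should show which test failed.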

DevOps
  • thanks. I have updated my question above with a bit more info. Also, I don't seem to have a solr_bin file/directory on the server? – Andrew Newby Jul 27 '17 at 05:30
  • Yes, everything seems fine now for the solr_bin file. Try to "monitor" solr_bin and the solr process again, using either the UI or the command line. Check the Monit log to see whether Monit unmonitors it again. It could also have been a checksum issue due to an update. In that case a Monit reload is needed (or maybe a Monit restart) – DevOps Jul 28 '17 at 07:28
  • thanks. How would I "purge" the checksum? I've done a full restart with : `service monit restart` , but that didn't help. – Andrew Newby Jul 28 '17 at 07:53
  • A Monit process reload or a restart should refresh the checksum. What are the Monit log entries when you try to monitor the file? – DevOps Jul 28 '17 at 09:33
  • mmm ok. I've posted the contents of monit.log in my question, but I can't see anything? – Andrew Newby Jul 28 '17 at 10:24
  • OK, I'll summarize. At some point there was an issue with solr_bin that caused it to be unmonitored, and because of the dependency the solr process was unmonitored too. After the checksum has been refreshed by a Monit reload or Monit restart (or for some other reason), you have to re-enable monitoring of solr_bin and the solr process manually, either via the UI or with the Monit commands. Once something is unmonitored, it does not return to the monitored state automatically; you have to request it explicitly. – DevOps Jul 28 '17 at 11:53
  • ah ok. So for anyone reading this, the solution was to go to the service page on the Monit web-based panel, and then at the bottom of the page there is a "Monitor" button - click that, and voila :) Thanks for that DevOps - maybe update your answer, and I'll then accept it? – Andrew Newby Jul 28 '17 at 14:14