
So as the title states, I have a problem where monit won't start on boot. I have one CentOS 7 box where it does start and another CentOS 7 box where it doesn't, so I know it's not an OS issue and must be a configuration issue somewhere. Both boxes are built with Vagrant and are nearly identical. I have no idea where to start.

I'll be watching this question for a while, so please feel free to ask me to clarify anything; I know this isn't much to go on. Any help is appreciated.

EDIT: It's worth noting that I've already tried systemctl enable monit, but it's already enabled.

EDIT 2: (Irrelevant)

EDIT 3:

[root@stage-web-1 vagrant]# systemctl status monit
monit.service - Pro-active monitoring utility for unix systems
Loaded: loaded (/usr/lib/systemd/system/monit.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2017-06-01 16:37:00 UTC; 6min ago
Process: 1131 ExecStop=/usr/bin/monit quit (code=exited, status=1/FAILURE)
Process: 1079 ExecStart=/usr/bin/monit -I (code=exited, status=1/FAILURE)
Main PID: 1079 (code=exited, status=1/FAILURE)

Jun 01 16:37:00 stage-web-1 systemd[1]: Started Pro-active monitoring utility for unix systems.
Jun 01 16:37:00 stage-web-1 systemd[1]: Starting Pro-active monitoring utility for unix systems...
Jun 01 16:37:00 stage-web-1 monit[1079]: Error opening the log file '/var/www/html/nfs/monit/stage-web-1.log' for writing -- No such file or directory
Jun 01 16:37:00 stage-web-1 systemd[1]: monit.service: main process exited, code=exited, status=1/FAILURE
Jun 01 16:37:00 stage-web-1 monit[1131]: Error opening the log file '/var/www/html/nfs/monit/stage-web-1.log' for writing -- No such file or directory
Jun 01 16:37:00 stage-web-1 systemd[1]: monit.service: control process exited, code=exited status=1
Jun 01 16:37:00 stage-web-1 systemd[1]: Unit monit.service entered failed state.
Jun 01 16:37:00 stage-web-1 systemd[1]: monit.service failed.
Nathan Robb
  • `systemctl enable monit`? – jordanm May 30 '17 at 18:56
  • Ah yes, that was my original idea; however, it's already enabled. I will edit the original post. – Nathan Robb May 30 '17 at 18:57
  • What happens if you use systemctl to stop and then restart it? Do you get any errors? Have you used the monit options that set a startup delay? Do you have another script that is unmonitoring things at startup? – Zoredache May 30 '17 at 18:59
  • I don't know of any scripts that are unmonitoring things, but the monit service itself won't start. `systemctl start monit` starts it just fine, with no errors. I hadn't thought of starting with a delay. – Nathan Robb May 30 '17 at 19:05
  • Would you have any idea how I might go about this? – Nathan Robb May 30 '17 at 19:14
  • I would rather find the root cause of the issue though – Nathan Robb May 30 '17 at 19:22
  • Can you paste the monit unit file? – duenni May 30 '17 at 19:30
  • Hm, is this what you're looking for? `check process nginx with pidfile /var/run/nginx/nginx.pid start program = "/usr/bin/systemctl start nginx" stop program = "/usr/bin/systemctl stop nginx" restart program = "/usr/bin/systemctl restart nginx" {% if monit_nfs is defined %} check file blacklist with path {{ app_forex_factory_path }}/nfs/nginx/ip_blacklist.conf if CHANGED SIZE then exec "/usr/bin/systemctl reload nginx" {% endif %}` – Nathan Robb May 30 '17 at 19:43
  • No, there should be a file named `/lib/systemd/system/monit.service` or similar. Paste the content of this file. – duenni May 30 '17 at 19:54
  • I apologize, my mistake: [Unit] Description=Pro-active monitoring utility for unix systems After=network.target [Service] Type=simple ExecStart=/usr/bin/monit -I ExecStop=/usr/bin/monit quit ExecReload=/usr/bin/monit reload [Install] WantedBy=multi-user.target – Nathan Robb May 30 '17 at 19:55
  • Did you execute `systemctl status monit` as root? There seems to be some log lines missing. Also try `journalctl -u monit` or `journalctl -f` after boot. I'll guess this is a timing problem, `monit` probably can't start because a service it depends on is not started yet. – duenni May 31 '17 at 06:16
  • Try running Monit as _root_ from the command line using the start command `/usr/bin/monit -I` to check for any relevant output. Did you also check the content of the log file, usually located at _/var/log/monit.log_? – DevOps May 31 '17 at 14:38
  • Thank you @duenni. I ran as root and indeed saw the logs. It's complaining about a log file that doesn't exist (but it does). @DevOps `/usr/bin/monit -I` doesn't render any useful output, and `/var/log/monit.log` doesn't exist. The config file in `/etc/monitrc` points the log to `/var/www/html/nfs/monit/stage-web-1.log`, which, again, monit complains doesn't exist (but it does). – Nathan Robb Jun 01 '17 at 16:51
  • Who is the owner of `/usr/bin/monit` and `/var/www/html/nfs/monit/stage-web-1.log`? Do they differ? – duenni Jun 01 '17 at 18:14
  • @duenni: Thanks to you (primarily), I have found the root cause of my problem! It does seem to be a timing issue after all. The directory /var/www/html/nfs is a shared directory that is mounted on boot. Apparently though, it gets mounted after monit tries to start. To test, I `umount /var/www/html/nfs` and then `systemctl start monit` & `systemctl status monit` which gave the same errors as it does from the boot attempt. When I did `mount -a` & `systemctl start monit`, it ran as expected. YAY, progress! So, any idea how to work around this issue? – Nathan Robb Jun 01 '17 at 19:01
  • Does this share have an entry in `/etc/fstab`, and is it handled by `systemd` too? Or how is it mounted at boot? – duenni Jun 01 '17 at 20:36
  • @duenni, I was able to add `nfs.service` to the `After` line in the `[Unit]` section of `/lib/systemd/system/monit.service`, so that during boot monit is always started after nfs, which makes sure the folder that monit tries to log to exists. If you would like to post the answer with our debugging steps, I'll mark it as correct :) – Nathan Robb Jun 01 '17 at 20:45
  • Oh well, in the end you solved it yourself. :) You can add an answer and accept it yourself, that's fine for me. – duenni Jun 02 '17 at 06:09
  • @duenni Thank you for your help, I'd upvote you if I could haha – Nathan Robb Jun 02 '17 at 14:35

1 Answer


As it turns out, /var/www/html/nfs is a directory mounted from a network drive over NFS. Monit was being started before the NFS share was mounted, so the directory didn't exist yet, and monit failed with: Error opening the log file '/var/www/html/nfs/monit/stage-web-1.log' for writing -- No such file or directory.
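A quick way to confirm this kind of failure is to check whether the directory that should hold the log file exists at the moment monit is started. The following is only a minimal sketch: the function name check_log_dir is hypothetical (it is not part of monit or systemd), and it checks only that the directory exists, not that the share is mounted or writable.

```shell
#!/bin/sh
# Hypothetical helper (not part of monit): report whether the directory
# that should contain the given log file currently exists.
check_log_dir() {
    log_file=$1
    log_dir=$(dirname "$log_file")
    if [ -d "$log_dir" ]; then
        echo "present: $log_dir"
    else
        echo "missing: $log_dir"
        return 1
    fi
}

# Usage, with the path from the error message:
#   check_log_dir /var/www/html/nfs/monit/stage-web-1.log
```

Running it once with the share unmounted and once after mount -a should mirror the two outcomes seen with systemctl start monit.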

The solution was to edit /lib/systemd/system/monit.service:

[Unit]
Description=Pro-active monitoring utility for unix systems
After=network.target

[Service]
Type=simple
ExecStart=/usr/bin/monit -I
ExecStop=/usr/bin/monit quit
ExecReload=/usr/bin/monit reload

[Install]
WantedBy=multi-user.target

and add nfs.service to the After= line in the [Unit] section. The end result looks like this:

[Unit]
Description=Pro-active monitoring utility for unix systems
After=network.target nfs.service

[Service]
Type=simple
ExecStart=/usr/bin/monit -I
ExecStop=/usr/bin/monit quit
ExecReload=/usr/bin/monit reload

[Install]
WantedBy=multi-user.target

Monit now starts correctly on boot :)
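One caveat: files under /lib/systemd/system belong to the package and can be overwritten by an update. A possible alternative, sketched here rather than tested on these boxes, is a drop-in override that uses systemd's RequiresMountsFor= directive to tie monit to the mount itself instead of to a service name. This assumes the share is mounted through systemd (for example via an /etc/fstab entry), which the thread does not confirm. The function name write_monit_dropin and the file name nfs-mount.conf are hypothetical choices for illustration.

```shell
#!/bin/sh
# Sketch: write a drop-in override instead of editing the packaged unit.
# Arguments (both optional):
#   $1  drop-in directory (default: standard override location for monit.service)
#   $2  path whose mount monit needs (default: the NFS directory from the question)
write_monit_dropin() {
    dropin_dir=${1:-/etc/systemd/system/monit.service.d}
    mount_path=${2:-/var/www/html/nfs}
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/nfs-mount.conf" <<EOF
[Unit]
# Require, and order monit after, the mount unit(s) for this path.
RequiresMountsFor=$mount_path
EOF
}

# Usage (as root), then reload so systemd picks up the new file:
#   write_monit_dropin
#   systemctl daemon-reload
```

The drop-in is merged with the packaged unit, so the original After=network.target line stays in effect and the override survives package updates.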

Thanks to everyone who helped steer me in the right direction.

Nathan Robb