Problem 1

I want to monitor a LibreOffice process running in headless mode with Monit version 5.25.1.

Here is my monit config for this approach:

cat /etc/monit/conf.d/libreoffice

check program lo-check-8101 with path "/bin/bash /opt/libreoffice/chkloproc.sh TestLOPort8101 8101"
        with timeout 10 seconds
        if status != 0 then exec "/bin/bash /opt/libreoffice/loproc_is_down.sh"
        if status = 0 then exec "/bin/bash /opt/libreoffice/loproc_is_up.sh"

This LibreOffice Instance is listening on port 8101.

The check script returns 0 if everything is OK and 101 if there is an error with that LibreOffice instance. It tests the text conversion of the running LibreOffice process by sending HTML, requesting plain text back, and checking the response.

The action scripts (loproc_is_down.sh / loproc_is_up.sh) add or delete an iptables rule to signal the status to a running HAProxy, which port-checks that LibreOffice instance. If this sounds a bit complicated, I'm sorry, but that is not the problem I want to discuss here.

The problem is that I don't understand why Monit logs the following entries:

monit log after restart

[CET Oct 29 16:58:18] info     : Starting Monit 5.25.1 daemon with http interface at [localhost]:2812
[CET Oct 29 16:58:18] info     : Monit start delay set to 10s
[CET Oct 29 16:58:28] info     : 'host1' Monit 5.25.1 started
[CET Oct 29 16:58:58] error    : 'lo-check-8101' status failed (0) -- no output
[CET Oct 29 16:58:58] info     : 'lo-check-8101' exec: '/bin/bash /opt/libreoffice/loproc_is_up.sh'
[CET Oct 29 16:59:28] error    : 'lo-check-8101' status failed (0) -- no output

... and the following status screen from 'monit status':

monit status
Monit 5.25.1 uptime: 0m

Program 'lo-check-8101'
  status                       Status failed
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  last exit value              0
  last output                  -
  data collected               Tue, 29 Oct 2019 16:58:58

System 'host1'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  load average                 [0.03] [0.02] [0.01]
  cpu                          0.6%us 0.6%sy 0.0%wa
  memory usage                 543.9 MB [7.8%]
  swap usage                   0 B [0.0%]
  uptime                       20d 1h 11m
  boot time                    Wed, 09 Oct 2019 16:47:51
  data collected               Tue, 29 Oct 2019 16:58:58

To me it seems that the check script returns exit value 0, but the status is reported as "Status failed".

I don't understand why Monit reports "error: ... status failed (0)" in its log file.

What does "status" mean, other than the last exit code of the given check script?

Problem 2

There is another reaction from Monit that I can't understand. Perhaps somebody can explain it to me.

When I fake a broken LibreOffice process by stopping it, Monit recognizes this after one cycle, starts the configured action script 'loproc_is_down.sh', and correctly reports the last exit code as 101, but with the log line

"info: status succeeded (101)"

for the first cycle and then again with

"error: status failed (101)"

monit log with faked failure

[CET Oct 29 17:14:28] info     : 'lo-check-8101' status succeeded (101) -- Error: Existing listener not found. Unable start listener by parameters. Aborting.
[CET Oct 29 17:14:28] error    : 'lo-check-8101' status failed (101) -- Error: Existing listener not found. Unable start listener by parameters. Aborting.
[CET Oct 29 17:14:28] info     : 'lo-check-8101' exec: '/bin/bash /opt/libreoffice/loproc_is_down.sh'
[CET Oct 29 17:14:58] error    : 'lo-check-8101' status failed (101) -- Error: Existing listener not found. Unable start listener by parameters. Aborting.
[CET Oct 29 17:15:28] error    : 'lo-check-8101' status failed (101) -- Error: Existing listener not found. Unable start listener by parameters. Aborting.

The opposite happens when I start that LibreOffice process again:

monit log when service is running again

[CET Oct 29 17:15:58] error    : 'lo-check-8101' status failed (0) -- no output
[CET Oct 29 17:15:58] info     : 'lo-check-8101' exec: '/bin/bash /opt/libreoffice/loproc_is_up.sh'
[CET Oct 29 17:15:58] info     : 'lo-check-8101' status succeeded (0) -- no output
[CET Oct 29 17:16:28] error    : 'lo-check-8101' status failed (0) -- no output
[CET Oct 29 17:16:58] error    : 'lo-check-8101' status failed (0) -- no output

It looks like Monit runs the check script, which returns exit code 0, then starts the action script "loproc_is_up.sh" and reports this with "status succeeded (0)"

... but then logs "error: status failed (0)" again in the following cycles.

I don't understand the meaning of "status" in the Monit concept / documentation. Can somebody explain it to me?

Thank you for reading this long post. I hope you can help me with an answer.

Monit is there to catch problems on a monitored entity: every "if ... then" test in a check describes a failure condition, not a success.

So - line by line - your config tells Monit:

check program lo-check-8101 with path "/bin/bash /opt/libreoffice/chkloproc.sh TestLOPort8101 8101" with timeout 10 seconds

Execute a binary. Store the exit code and some additional info.

        if status != 0 then exec "/bin/bash /opt/libreoffice/loproc_is_down.sh"

A problem occurs if status is not 0. Now execute a binary.

        if status = 0 then exec "/bin/bash /opt/libreoffice/loproc_is_up.sh"

A problem occurs if status is 0. Now execute a binary. - I don't even get what the result of this call should be. Everything's okay here, so why execute something?

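In other words: with this config there is no "success" (= everything is fine) case. Exit code 0 matches the second test and every other exit code matches the first, so the check is always in a failed state.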

To optimize it, you should only catch problems with Monit:

check program lo-check-8101 with path "/opt/libreoffice/chkloproc.sh TestLOPort8101 8101"
    with timeout 10 seconds
    if status != 0 then exec "/opt/libreoffice/loproc_is_down.sh"
    if 2 restarts within 3 cycles then unmonitor

This means nothing is done by Monit if status is 0.

Some more words on the config:

  1. If I get it correctly (see this question), the headless server will create a PID file. So you might also check with check process and perhaps some send/expect magic to verify the service is running.
  2. If you make your .sh files executable (+x; i.e. chmod +x /opt/libreoffice/*.sh) and they have a correct shebang, you can omit /bin/bash in your exec lines for better readability.
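For point 2, a small sketch of how one could verify the shebang before dropping /bin/bash from the config. The helper name has_shebang is made up for illustration; the paths in the usage comment are the ones from the question.

```shell
#!/bin/bash
# has_shebang FILE: succeeds if the first line of FILE starts with "#!".
# (Hypothetical helper, not part of Monit or LibreOffice.)
has_shebang() {
    local first=""
    # Read only the first line; a missing or empty file leaves $first empty.
    IFS= read -r first < "$1" || true
    case $first in
        '#!'*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage (paths taken from the question):
#   chmod +x /opt/libreoffice/*.sh
#   for f in /opt/libreoffice/*.sh; do
#       has_shebang "$f" || echo "no shebang: $f"
#   done
```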

My config on this (not knowing what protocol is used by :8101, assuming http) would be more like this:

check process libre-local with pidfile "/var/run/libreoffice-server.pid"
    start program = "/usr/bin/systemctl start libreoffice-server" # Unit name is an assumption!
    stop program = "/usr/bin/systemctl stop libreoffice-server" # Unit name is an assumption!

    if failed
        port 8101
        protocol http
        request "/any_valid_entrypoint"
        for 2 cycles
    then restart

    # if loadavg (5min) per core > 1 for 5 cycles then restart
    if loadavg (5min) > 4 for 5 cycles then restart
    if totalmem > 2 GB for 5 cycles then restart
    if 3 restarts within 5 cycles then unmonitor

Using loadavg with per core requires a recent Monit version, which might not be available in your distro, so I commented that line out ;)

Edit after response from OP (I hope you get notified):

(it's really a pain that we cannot comment < 50 Rep...)

If I get it right, you have to convert something to get the state of the application, and if the conversion fails, the app should be restarted. Translated to Monit:

check program lo-check-8101 with path "CONVERT_HERE"
    with timeout 10 seconds
    if status != 0 then exec "/usr/bin/systemctl restart libreoffice-server"
    if 2 restarts within 3 cycles then unmonitor

... where the CONVERT_HERE executable exits with 0 if the conversion goes well and with a non-zero code if it fails. I still feel I missed something here. ;)
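A minimal sketch of what such a CONVERT_HERE wrapper could look like. The function name run_check and the command in the usage comment are placeholders, since the real conversion test is not shown in the question; only the exit codes 0 and 101 and the "conversion ok" output are taken from it.

```shell
#!/bin/bash
# Hypothetical sketch of a health-check wrapper for Monit's "check program".
# The real conversion command is unknown here and is passed in by the caller.

# run_check CMD [ARGS...]
# Runs CMD, prints one line (Monit shows it as "last output") and
# returns 0 if CMD succeeded, 101 otherwise.
run_check() {
    if "$@" >/dev/null 2>&1; then
        echo "conversion ok"
        return 0
    fi
    echo "conversion failed"
    return 101
}

# Usage inside the script that Monit calls (command is a placeholder):
#   run_check /path/to/your/convert-test
#   exit $?
```

Monit only needs the exit code for the "if status != 0" test; the printed line is just there to make the log easier to read.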

Could you perhaps drop all three executables to a gist or something?

  • Thanks for your update. The missing thing could be that I want to use monit to do something when status is 0 ... I explained it in my second answer here ... (monit should add iptables rules to REJECT or ACCEPT connections). – Heino Rötten Oct 31 '19 at 16:36
  • But why not include all this in the script that is running with `check program`? If it succeeds you remove the iptables rule, if it fails you add one. You'll still only get notified if it all fails. But also it will restart itself automagically... (EDIT: The errors you get now are in fact weird...) – boppy Oct 31 '19 at 16:41
  • You are right. I will change the script to do the iptables stuff. I will give feedback on whether the output in the log behaves differently. Thx. – Heino Rötten Nov 03 '19 at 00:32
  • I changed the check-script to add / remove a rule to iptables. monit gives a correct summary now. monit logging seems to be correct too. That means, that you were right: monit interprets a config line "IF STATUS = 0 ..." as an error / failure of the check-script, no matter if exit code is 0 or 1. – Heino Rötten Nov 04 '19 at 16:28
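
For readers who want to try the approach from the comments, here is a rough sketch of a check script that handles the iptables rule itself. It is not the script from the question: the function names, the INPUT chain, the exact REJECT rule and the IPTABLES variable are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical sketch: the check script adds or removes the iptables rule
# itself, so Monit only needs the "if status != 0" test.
# Assumptions: rule lives in the INPUT chain, script runs as root.

IPTABLES=${IPTABLES:-iptables}

# block_port PORT: insert the REJECT rule unless it is already present.
block_port() {
    "$IPTABLES" -C INPUT -p tcp --dport "$1" -j REJECT 2>/dev/null ||
        "$IPTABLES" -I INPUT -p tcp --dport "$1" -j REJECT
}

# unblock_port PORT: delete the REJECT rule if it is present.
unblock_port() {
    if "$IPTABLES" -C INPUT -p tcp --dport "$1" -j REJECT 2>/dev/null; then
        "$IPTABLES" -D INPUT -p tcp --dport "$1" -j REJECT
    fi
}

# check_and_toggle PORT CMD [ARGS...]
# Runs CMD as the health check. On success the port is unblocked and 0 is
# returned; on failure the port is blocked and 101 is returned.
check_and_toggle() {
    local port=$1
    shift
    if "$@" >/dev/null 2>&1; then
        unblock_port "$port"
        echo "conversion ok"
        return 0
    fi
    block_port "$port"
    echo "conversion failed"
    return 101
}

# Usage (conversion command is a placeholder):
#   check_and_toggle 8101 /path/to/your/convert-test
#   exit $?
```

Checking with -C before inserting or deleting keeps the script idempotent, so running it every cycle does not pile up duplicate rules.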

@boppy: Thank you for your answer.

You are right, I need to handle "Headless Libre Office" Processes.

LibreOffice is a little bit nasty and keeps accepting connections even when it has already hung, so you can only know the health of a running lo process if you are able to convert something (that happens in the check script).

Because of this I cannot rely on PIDs or port checking, so I try to work around it with my check script plus Monit to REJECT connections if conversion is not working.

The idea behind this is:

If Monit adds a rule to iptables to REJECT connections to lo processes, it should remove that rule again once the lo process is back, healthy and converting again.

Perhaps Monit is the wrong tool for this, or I am just thinking too complicated ... but Monit looks like a much better fit than using cron to execute those checks and iptables ... perhaps I will give cron a try.

I learned an important thing from your answer, namely that there is no success state if I use "IF STATUS = 0 THEN EXEC ..." lines in the Monit config.

So Monit does not interpret the exit value 0 as a "success" because of this "IF ... EXEC" line.

And thank you for your monit-config ... it seems a good idea to restart lo processes if they are going wild.

But there is still something wrong with Monit. I started Monit with debugging turned on by adding "-v" as an option in /etc/defaults/monit and see the following log lines (Monit's cycles are configured to be 30 seconds long):

[CET Oct 31 16:49:00] error    : 'lo-check-8101' status failed (0) -- conversion ok
[CET Oct 31 16:49:00] debug    : 'lo-check-8101' status succeeded (0) -- conversion ok
[CET Oct 31 16:49:00] debug    : 'lo-check-8101' program started
[CET Oct 31 16:49:30] error    : 'lo-check-8101' status failed (0) -- conversion ok
[CET Oct 31 16:49:30] debug    : 'lo-check-8101' status succeeded (0) -- conversion ok
[CET Oct 31 16:49:30] debug    : 'lo-check-8101' program started

Is this a monit bug? Perhaps I need a newer version of monit.