How to determine which resource is exhausted

Question

On a server under very high load, many of my daily cron jobs have stopped working. I have a postfix server running that only delivers locally, so that I can read the output of the cron jobs with mutt.

I grepped for cron in the logs and I saw this:

Feb 23 22:44:16 server10 cron[2276]: /usr/sbin/sendmail: Resource temporarily unavailable

In /var/log/mail I see this:

Feb 23 22:05:15 server10 postfix/sendmail[1113]: warning: fork: Resource temporarily unavailable

systemctl status cron and systemctl status postfix both show that the services are running.

So I added this test cronjob, which runs every minute:

#!/bin/bash

date
sleep 1
date

date >> ~/cron.log

echo "bye"

It took almost 5 minutes for the ~/cron.log file to appear, and the log shows that the job is not executed every minute, which explains why my daily cron jobs are not being executed either.

$ cat cron.log
Tue Feb 23 22:52:02 CET 2021
Tue Feb 23 22:53:02 CET 2021
Tue Feb 23 22:56:02 CET 2021
Tue Feb 23 22:58:02 CET 2021
Tue Feb 23 23:01:02 CET 2021
Tue Feb 23 23:02:02 CET 2021
Tue Feb 23 23:07:02 CET 2021
Tue Feb 23 23:08:02 CET 2021
Tue Feb 23 23:10:03 CET 2021
Tue Feb 23 23:11:02 CET 2021
Tue Feb 23 23:13:02 CET 2021

When I run tail -f on /var/log/messages, I see this:

$ tail -f messages | grep cron
Feb 23 23:17:03 server10 cron[2276]: /usr/sbin/sendmail: Resource temporarily unavailable
Feb 23 23:18:01 server10 cron[2276]: /usr/sbin/sendmail: Resource temporarily unavailable
Feb 23 23:18:01 server10 cron[2276]: /usr/sbin/sendmail: Resource temporarily unavailable
Feb 23 23:18:01 server10 cron[2276]: /usr/sbin/sendmail: Resource temporarily unavailable
Feb 23 23:19:16 server10 cron[2276]: /usr/sbin/sendmail: Resource temporarily unavailable
Feb 23 23:19:16 server10 cron[2276]: /usr/sbin/sendmail: Resource temporarily unavailable
Feb 23 23:20:02 server10 cron[2276]: /usr/sbin/sendmail: Resource temporarily unavailable

I searched for that message and found the question "Cron jobs not working anymore", which describes a very similar issue, but it didn't help me. I don't know which resource is hitting its limit. sysctl kernel.pid_max shows 32768, which seems rather low for an x86_64 system, so I raised the value to 4194303, but that didn't help either: the Resource temporarily unavailable messages keep appearing.

So how can I determine which resource is hitting the limit? Sadly the log files don't tell me that much.
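"Resource temporarily unavailable" is the text for the EAGAIN error, and fork(2) documents several limits that can trigger it: the per-user process limit (RLIMIT_NPROC, shown by ulimit -u), kernel.threads-max, kernel.pid_max, and a pids cgroup limit. A minimal sketch for comparing the first three against current usage follows. It assumes a Linux /proc filesystem, and the helper names `threads_of_uid` and `total_threads` are made up for illustration.

```shell
#!/bin/bash
# Compare limits that can make fork() fail with EAGAIN against current usage.
# Reads only /proc, so it is Linux-specific.

# Threads currently owned by a numeric (real) UID.
# RLIMIT_NPROC counts threads, not just processes.
threads_of_uid() {
    cat /proc/[0-9]*/status 2>/dev/null |
        awk -v uid="$1" '/^Uid:/ { u = $2 }
                         /^Threads:/ { if (u == uid) n += $2 }
                         END { print n + 0 }'
}

# Scheduling entities that exist system-wide: the number after the
# slash in the fourth field of /proc/loadavg.
total_threads() {
    awk '{ split($4, a, "/"); print a[2] }' /proc/loadavg
}

uid=$(id -u)
echo "per-user limit (ulimit -u): $(ulimit -u)"
echo "threads owned by UID $uid: $(threads_of_uid "$uid")"
echo "kernel.threads-max: $(cat /proc/sys/kernel/threads-max)"
echo "kernel.pid_max: $(cat /proc/sys/kernel/pid_max)"
echo "threads on the system: $(total_threads)"
```

Note that ulimit -u reports the limit of the shell that runs the script, which may differ from the limit of the daemon. For the cron daemon itself (PID 2276 in the log above), `grep 'Max processes' /proc/2276/limits` shows the value that applies to it, and the script should be run as the user whose jobs are failing.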

"resource temporarily not available" refers to system error EAGAIN. `fork` fails with this error under several circumstances (see the manual page), among them an overflow of the process table. Try increasing `nproc`. — berndbausch, Feb 24 '21 at 01:05
@berndbausch I increased `nproc` and restarted postfix and cron. Now I don't see the `Resource temporarily unavailable` message, but my test cronjob that should run every minute is still being skipped very often. — Pablo, Feb 24 '21 at 09:36
It may be that a one minute frequency is more than cron can handle. You need your cronjob once a day - that should not be a problem. To test this, try something more realistic like 5 minutes. — berndbausch, Feb 24 '21 at 09:43
@berndbausch do you really think a one-minute frequency is too much for cron? That seems strange, even on my desktop cron handles every-minute cronjobs just fine. I'll try that to see if that's really a problem. — Pablo, Feb 24 '21 at 10:12
@berndbausch I changed the frequency to 5 minutes and I see the same behaviour: sometimes the job doesn't get executed on time. I installed the same example cronjob (1 minute frequency) on my desktop and on another server, and there it really is executed every minute. So now I have a bigger problem, because after increasing `nproc` I don't see any error message anymore, so I have even less information than before. — Pablo, Feb 24 '21 at 10:47
You mention heavy load on your server, which made me think cron might have problems running once a minute. I suggest you ask a different question, since the problem stated by this question (resource exhausted) has been solved. — berndbausch, Feb 24 '21 at 10:57
@berndbausch yes, it seems that this is just a symptom of a problem that lies somewhere else. I noticed on the "good" server that `systemctl status cron.service` shows just one running process that consumes 400M of memory. On my bad server, the same command shows a list of 1000 processes (all from the same user) that consume 22G of memory. I think this might explain why I'm running out of resources. I need to investigate why these processes seem to linger after being executed. — Pablo, Feb 24 '21 at 11:13
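
The process list in `systemctl status cron.service` comes from the unit's control group, and a pids cgroup limit is another documented cause of `fork` failing with EAGAIN. The following is a rough sketch for reading a unit's task count and limit directly. It assumes a cgroup v2 layout mounted at /sys/fs/cgroup with the pids controller enabled. The function name `pids_usage` is hypothetical and the unit path is a guess for this system.

```shell
#!/bin/bash
# Report task usage for a cgroup directory with the pids controller enabled.
# Usage: pids_usage /sys/fs/cgroup/system.slice/cron.service
pids_usage() {
    local dir=$1 current max
    current=$(cat "$dir/pids.current") || return 1
    max=$(cat "$dir/pids.max") || return 1   # the string "max" means unlimited
    echo "$current of $max tasks"
}

# Assumed cgroup v2 path; on a cgroup v1 host look under /sys/fs/cgroup/pids/.
unit_dir=/sys/fs/cgroup/system.slice/cron.service
if [ -r "$unit_dir/pids.current" ]; then
    pids_usage "$unit_dir"
else
    echo "no pids controller files under $unit_dir" >&2
fi
```

On a systemd host, `systemctl show -p TasksCurrent -p TasksMax cron.service` should report the same two numbers. A count that sits close to the limit would point at lingering child processes rather than at cron itself.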
