
I suspect that one of our server applications has hit its max open file limit.

The application runs in user space under its own account. The init script starts a large number of processes, which in turn start a number of sub-processes and a large number of threads.

Going by the book, I set the following in /etc/security/limits.conf:

USERNAME - nofile 2048
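
The entry format is domain type item value; the "-" in the type column sets both the soft and the hard limit at once, so written out explicitly the same setting would look something like:

USERNAME soft nofile 2048
USERNAME hard nofile 2048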

I suspect that the application has hit the limit: looking at the temporary file directory, I found more than 2000 files there.

After raising the limit to 4096 and restarting the application, I found more than 2100 files there.

Now the question: if the application hit the 2048 limit, why was this not logged in /var/log/messages?

syslog-ng is the syslog daemon in use.

/etc/syslog-ng/syslog-ng.conf

options { long_hostnames(off); sync(0); perm(0640); stats(3600); };
source src {
    internal();
    unix-dgram("/dev/log");
    unix-dgram("/var/lib/ntp/dev/log");
};
filter f_iptables   { facility(kern) and match("IN=") and match("OUT="); };
filter f_console    { level(warn) and facility(kern) and not filter(f_iptables)
                      or level(err) and not facility(authpriv); };
filter f_newsnotice { level(notice) and facility(news); };
filter f_newscrit   { level(crit)   and facility(news); };
filter f_newserr    { level(err)    and facility(news); };
filter f_news       { facility(news); };
filter f_mailinfo   { level(info)      and facility(mail); };
filter f_mailwarn   { level(warn)      and facility(mail); };
filter f_mailerr    { level(err, crit) and facility(mail); };
filter f_mail       { facility(mail); };
filter f_cron       { facility(cron); };
filter f_local      { facility(local0, local1, local2, local3,
                           local4, local5, local6, local7); };
filter f_messages   { not facility(news, mail, cron, authpriv, auth) and not filter(f_iptables); };
filter f_warn       { level(warn, err, crit) and not filter(f_iptables); };
filter f_alert      { level(alert); };
filter f_auth       { facility(authpriv, auth); };
destination console  { pipe("/dev/tty10"    group(tty) perm(0620)); };
  log { source(src); filter(f_console); destination(console); };
destination xconsole { pipe("/dev/xconsole" group(tty) perm(0400)); };
  log { source(src); filter(f_console); destination(xconsole); };
destination auth { file("/var/log/auth"); };
  log { source(src); filter(f_auth); destination(auth); };
destination newscrit { file("/var/log/news/news.crit"); };
  log { source(src); filter(f_newscrit); destination(newscrit); };
destination newserr { file("/var/log/news/news.err"); };
  log { source(src); filter(f_newserr); destination(newserr); };
destination newsnotice { file("/var/log/news/news.notice"); };
  log { source(src); filter(f_newsnotice); destination(newsnotice); };
destination mailinfo { file("/var/log/mail.info"); };
  log { source(src); filter(f_mailinfo); destination(mailinfo); };
destination mailwarn { file("/var/log/mail.warn"); };
   log { source(src); filter(f_mailwarn); destination(mailwarn); };
destination mailerr  { file("/var/log/mail.err" fsync(yes)); };
  log { source(src); filter(f_mailerr);  destination(mailerr); };
destination mail { file("/var/log/mail"); };
  log { source(src); filter(f_mail); destination(mail); };
destination cron { file("/var/log/cron"); };
  log { source(src); filter(f_cron); destination(cron); };
destination localmessages { file("/var/log/localmessages"); };
  log { source(src); filter(f_local); destination(localmessages); };
destination messages { file("/var/log/messages"); };
  log { source(src); filter(f_messages); destination(messages); };
destination firewall { file("/var/log/firewall"); };
  log { source(src); filter(f_iptables); destination(firewall); };
destination warn { file("/var/log/warn" fsync(yes)); };
  log { source(src); filter(f_warn); destination(warn); };
Nils
  • It's up to the daemon to log when it can't open more files – Petter H Feb 10 '14 at 16:54
  • Because you didn't reach max open files, you only SUSPECT it. Moreover, ulimits are reloaded when the user logs in. You need to re-login. – Marcel Feb 10 '14 at 18:46
  • @Marcel If I am now above 2048, that is a strong suspicion, don't you think? – Nils Feb 11 '14 at 21:48
  • @PetterH The syslog daemon runs as root, so what is your point here? – Nils Feb 11 '14 at 21:49
  • post your `syslog-ng.conf` - probably the correct facility isn't properly declared. – Marcel Feb 12 '14 at 17:35
  • @Marcel I updated my question with the config. – Nils Feb 17 '14 at 13:01
  • You need to actually know if this is the case. Run your process, then check `cat /proc/<pid>/limits` and see what its limit is, followed by `lsof -p <pid>` to see what the open file count is. Note that each process has its own limits. The open files maximum is not shared with all of the child processes; it is shared between the threads of a process, though. – Matthew Ife Feb 17 '14 at 17:54
  • @MatthewIfe Can you make that an answer? I think you are hitting the mark here. Any idea how to create threads with a bash command? – Nils Feb 18 '14 at 15:56
  • Made it an answer with a fuller explanation and a demonstration of how you can see it in action.

2 Answers


You need to actually know whether you are running out of file descriptors.

Run your process. Then check cat /proc/<pid>/limits and see what its limits say.

Then, you can get a file descriptor count by running ls -1 /proc/<pid>/fd | wc -l.

Note that each process has its own limits (each child of the parent, for example). Threads, however, share the file descriptor table of the process that created them, and as such the open file limit is shared between a process and its threads.
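
As a quick check against the original problem, something like this shell loop compares every process's current descriptor count with its own soft limit (a sketch; it assumes the application's processes all run as USERNAME and that you have permission to read their /proc entries):

for pid in $(pgrep -u USERNAME); do
    # open descriptors vs. the per-process soft "Max open files" limit
    fds=$(ls -1 /proc/$pid/fd 2>/dev/null | wc -l)
    limit=$(awk '/Max open files/ {print $4}' /proc/$pid/limits)
    echo "$pid: $fds of $limit"
done

Any process sitting at or just below its own limit is the one to look at.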

Whilst you cannot create threads in bash, this program can be used to demonstrate the effect.

/* Compiled with gcc -o upfilecnt upfilecnt.c -pthread */
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>

#include <err.h>
#include <sysexits.h>
#include <errno.h>

#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <sys/resource.h>

#define THREADS 3
#define NUMCHILD 3
#define DEF_OPEN_LIMIT 256

/* The gimmick in this program is to constantly dup an FD
 * until we run out of file handles */

void dup_fds(
  int basefd)
{
  int i;
  int *fds = calloc(1048576, sizeof(int));
  char etxt[256];
  unsigned int me = (unsigned int) pthread_self(); /* per-thread seed for rand_r() */

  for (i=0; i < 1048576; i++)
    fds[i] = -1;

  for (i=0; i < 1048576; i++) {
    fds[i] = dup(basefd);
    if (fds[i] < 0) {
      strerror_r(errno, etxt, 256);
      fprintf(stderr, "Cannot dup file: %s\n", etxt);
      return;
    }
    usleep(100000 + (rand_r(&me) % 400000));
  }
}

void * run_thread(
  void *data)
{
  /* This procedure should not be independent */
  struct rlimit ofiles;
  unsigned int seed;
  seed = (unsigned int) pthread_self();

  /* Obtain the open files limit */
  if (getrlimit(RLIMIT_NOFILE, &ofiles) < 0) {
    perror("cannot get limits");
    pthread_exit(NULL);
  }

  /* Assign a random value to current limit */
  seed = (unsigned int) getpid();
  ofiles.rlim_cur = 128 + (rand_r(&seed) % 896);

  /* Set the limit */
  if (setrlimit(RLIMIT_NOFILE, &ofiles) < 0) {
    perror("cannot set limits");
    pthread_exit(NULL);
  }


  dup_fds(1);

  return NULL;
}


void run_child(
  void)
{
  int i;
  unsigned int seed;
  struct rlimit ofiles;
  pthread_t threads[THREADS];

  /* Obtain the open files limit */
  if (getrlimit(RLIMIT_NOFILE, &ofiles) < 0)
    err(EX_OSERR, "Cannot obtain limits");

  /* Assign a random value to current limit */
  seed = (unsigned int) getpid();
  ofiles.rlim_cur = 128 + (rand_r(&seed) % 896);

  /* Set the limit */
  if (setrlimit(RLIMIT_NOFILE, &ofiles) < 0)
    err(EX_OSERR, "Canot set limits");

  /* Create threads */
  for (i=0; i < THREADS; i++) {
    if (pthread_create(&threads[i], NULL, run_thread, NULL))
      err(EX_OSERR, "Cannot spawn thread");
  }

  dup_fds(1);

  for (i=0; i < THREADS; i++)
    if (pthread_join(threads[i], NULL))
      err(EX_OSERR, "Cannot join thread");

  exit(0);
}


int main()
{
  int i, s;
  /* Spawn children */
  for (i=0; i < NUMCHILD; i++) {
    if (fork()) {
      continue;
    }
    run_child();
  }

  for (i=0; i < NUMCHILD; i++) {
    if (wait(&s) < 0)
      warn("wait failed");
  }

  return 0;
}

This program forks 3 children, each of which spawns 3 threads.

$ ./upfilecnt & pstree -p $!
upfilecnt(12662)─┬─upfilecnt(12663)─┬─{upfilecnt}(12666)
                 │                  ├─{upfilecnt}(12667)
                 │                  └─{upfilecnt}(12668)
                 ├─upfilecnt(12664)─┬─{upfilecnt}(12669)
                 │                  ├─{upfilecnt}(12670)
                 │                  └─{upfilecnt}(12671)
                 └─upfilecnt(12665)─┬─{upfilecnt}(12672)
                                    ├─{upfilecnt}(12673)
                                    └─{upfilecnt}(12674)

Each child and each thread continually duplicates a file descriptor, waiting a random 0.1 to 0.5 seconds between each one.

Looking at the child processes, you can see that each child has an independent file descriptor table.

$ for i in 1266{3,4,5}; do ls -1 /proc/$i/fd | wc -l; done
637
646
636

The threads of each child, however, share the same count as their parent process.

# .. another invocation
$ for i in 134{11,14,15,10,12,13,16,17,18}; do ls -1 /proc/$i/fd | wc -l; done
438
438
438
430
430
430
433
433
433

Also note that child processes can have independent limits. This program sets a random limit when each child starts.

$ grep -h "Max open" /proc/1420{3,4,5}/limits
Max open files            504                  4096                 files     
Max open files            502                  4096                 files     
Max open files            372                  4096                 files     

And for added fun, it also sets a random open files limit per thread. But this doesn't stick: the limit is shared between the child process and all of its threads.

grep -h "Max open" /proc/1420{3,4,5}/task/*/limits 
Max open files            1011                 4096                 files     
Max open files            1011                 4096                 files     
Max open files            1011                 4096                 files     
Max open files            1011                 4096                 files     
Max open files            1009                 4096                 files     
Max open files            1009                 4096                 files     
Max open files            1009                 4096                 files     
Max open files            1009                 4096                 files     
Max open files            750                  4096                 files     
Max open files            750                  4096                 files     
Max open files            750                  4096                 files     
Max open files            750                  4096                 files  
Matthew Ife

You lack this source definition:

  # messages from the kernel
  file("/proc/kmsg" program_override("kernel: "));

Then you're fine!
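
For concreteness, with that line added the source block from the config above would look roughly like this (only the file() line is new):

source src {
    internal();
    file("/proc/kmsg" program_override("kernel: "));
    unix-dgram("/dev/log");
    unix-dgram("/var/lib/ntp/dev/log");
};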

Marcel
  • If this were the case, why are there messages tagged with kernel in /var/log/messages? – Nils Feb 18 '14 at 16:08
  • Anything that uses the command `logger -f kernel 'msg'` would have that tag, and it doesn't need to be coming FROM the kernel itself. You lack `/proc/kmsg`; try adding it, then watch whether your program breaks the max file descriptor limit and TEST whether it worked out! ;) – Marcel Feb 18 '14 at 17:41
  • I will test it with the code Matthew provided... – Nils Feb 20 '14 at 10:18