Background:
I have recently been put in charge of finding the root of and/or fixing a memory leak with an ISC DHCPD (4.2.5-P1) installation on a Debian (Lenny) Unix server.

I have been researching the problem for over a week now and have gathered plenty of evidence that the server is indeed leaking, but I haven't found out why it leaks or how to stop it.

So far I have:

  • used valgrind in vgdb mode to detect memory leaks and allow for line inspection of the code
  • used valgrind to identify 2 possible starting points for the leak
  • compiled the DHCPD source with CFLAGS=-DDEBUG_MEMORY_LEAKAGE_ON_EXIT (this appears to stop memory leaks)
  • run the newly compiled DHCPD binary as dhcpd -6 -d -cf /etc/dhcpd6.conf
  • taken a snapshot of the VSZ and RSS values at 10-minute intervals over the course of 72 hours using the following script

Script:

#!/bin/bash
# Log the VSZ and RSS of the dhcpd process every 10 minutes.
# (probably could have used watch)
while true; do
    # The [d] keeps grep from matching its own command line.
    ps -eo vsz,rss,command | grep "[d]hcpd6.conf" >> memory-usage.txt
    sleep 600
done

I did a little research on VSZ and RSS. If RSS stays the same while VSZ increases, that appears to be a clear sign of a memory leak. In my situation, however, both VSZ and RSS are increasing. [Starting size on day 1: VSZ=8560, RSS=6292 => ending size on day 3: VSZ=67168, RSS=64860; ps reports both in KiB]
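To put those figures into perspective, the average growth rate can be worked out with a little shell arithmetic. This is only a sketch: `growth_rate` is a helper name made up for this illustration, and it assumes the values are in KiB and that the measurement ran for the full 72 hours.

```shell
#!/bin/bash
# growth_rate START_KIB END_KIB HOURS
# Prints the average growth in KiB per hour (integer division).
# Hypothetical helper, not part of dhcpd or ps.
growth_rate() {
    local start_kib=$1 end_kib=$2 hours=$3
    echo $(( (end_kib - start_kib) / hours ))
}

growth_rate 6292 64860 72   # RSS from the question, prints 813
growth_rate 8560 67168 72   # VSZ from the question, prints 814
```

VSZ and RSS grow by almost the same amount (58608 KiB versus 58568 KiB), which would be consistent with the process actually writing to the memory it allocates rather than merely reserving address space.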

I also looked at /proc/PID/maps to see if I could get any information there, but I was unable to find anything of use.

/proc/PID/maps information:

08048000-081e3000 r-xp 00000000 08:05 119382     /usr/sbin/dhcpd
081e3000-081e8000 rw-p 0019b000 08:05 119382     /usr/sbin/dhcpd
081e8000-08222000 rw-p 081e8000 00:00 0
09fea000-0a11c000 rw-p 09fea000 00:00 0          [heap]
b72b7000-b72c1000 r-xp 00000000 08:01 6184       /lib/i686/cmov/libnss_files-2.7.so
b72c1000-b72c3000 rw-p 00009000 08:01 6184       /lib/i686/cmov/libnss_files-2.7.so
b72c3000-b7673000 rw-p b72c3000 00:00 0
b7673000-b77c8000 r-xp 00000000 08:01 6192       /lib/i686/cmov/libc-2.7.so
b77c8000-b77c9000 r--p 00155000 08:01 6192       /lib/i686/cmov/libc-2.7.so
b77c9000-b77cb000 rw-p 00156000 08:01 6192       /lib/i686/cmov/libc-2.7.so
b77cb000-b77ce000 rw-p b77cb000 00:00 0
b77cf000-b77d0000 rw-p b77cf000 00:00 0
b77d1000-b77d4000 rw-p b77d1000 00:00 0
b77d4000-b77d5000 r-xp b77d4000 00:00 0          [vdso]
b77d5000-b77ef000 r-xp 00000000 08:01 2022       /lib/ld-2.7.so
b77ef000-b77f1000 rw-p 0001a000 08:01 2022       /lib/ld-2.7.so
bfe0d000-bfe30000 rw-p bffdc000 00:00 0          [stack]
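
One way to get more out of this listing is to convert each address range into a size and compare two snapshots taken some hours apart, to see which mapping is growing. The following is a minimal sketch; `writable_map_sizes` is a name invented for this example, and it only looks at private writable (`rw-p`) mappings, on the assumption that leaked heap memory ends up there.

```shell
#!/bin/bash
# Read /proc/PID/maps lines from stdin and print the size in KiB and the
# label of every private writable mapping. Mappings without a path are
# shown as [anon]. Hypothetical helper for illustration only.
writable_map_sizes() {
    local range perms offset dev inode path start end
    while read -r range perms offset dev inode path; do
        [[ $perms == rw-p ]] || continue
        start=${range%-*}   # hex start address
        end=${range#*-}     # hex end address
        printf '%d KiB %s\n' $(( (16#$end - 16#$start) / 1024 )) "${path:-[anon]}"
    done
}

# Two lines taken from the listing above:
writable_map_sizes <<'EOF'
09fea000-0a11c000 rw-p 09fea000 00:00 0          [heap]
b72c3000-b7673000 rw-p b72c3000 00:00 0
EOF
# prints:
# 1224 KiB [heap]
# 3776 KiB [anon]
```

Against a live process it would be fed with `writable_map_sizes < /proc/PID/maps`, with PID replaced by the dhcpd process ID.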

Question(s):
1. How should I go about debugging a memory leak like this?
2. ISC says that the solution is resetting the server every so often and that this is not a bug. If my client doesn't want to reset their server is there any middle ground? (They want hard proof that they have to follow through with a solution.)
3. Has anyone had experience with a dhcpd-related leak since January 2013?
4. Is there a solution or workaround available for this problem?

Related Link(s):
1. https://kb.isc.org/article/AA-00737 (ISC Report)
2. https://access.redhat.com/site/solutions/402713 (This bug report matches the start point of my memory leak [OMAPI FUNCTIONALITY])

If you need any additional information that may help in solving this problem, I am ready to provide what I can.

In the meantime I am going to see if I can compile the binary with OMAPI functionality disabled.

DHCP 4.3.0a1 was just released, so I'm going to see whether it changes anything (the change log doesn't mention a memory leak fix, but it doesn't hurt to try).

Thanks for your time.

Dodzi Dzakuma
  • We really need to know what you found in `/proc/PID/maps`. That *has* to tell you whether the memory is an anonymous mapping, a shared mapping, or a private, non-anonymous mapping. – David Schwartz Dec 16 '13 at 06:58
  • I just started another test that's going to run for the next 2 days. I've already killed the process that I used to post this question. If you tell me what information you need I'll make sure to get that to you before I kill the current process. Please just give me a list of the directories or files that you need. – Dodzi Dzakuma Dec 16 '13 at 07:34
  • I have just updated the question with some relevant information. If you need anything else please let me know. Also, I am currently running a new binary compiled with `--disable-use-omapi`. I don't know if it will change anything, but it doesn't hurt to try. – Dodzi Dzakuma Dec 16 '13 at 08:01
  • @DavidSchwartz I just finished another memory test on dhcpd. I compiled it with the `--disable-use-omapi` flag, which made no difference. The flag isn't documented, but I gave it a try anyway. Is there any other information you can give me, or any other suggestions for finding a solution to this problem? I am currently reading their configure script to see if I can hunt down a solution there. – Dodzi Dzakuma Dec 18 '13 at 01:21
  • Would serverfault be better suited for this question? – Dodzi Dzakuma Dec 18 '13 at 04:13

1 Answer

As a workaround, you might consider running dhcpd with memory limits under a process supervisor like runit.

Hopefully dhcpd will abort when it fails to allocate memory, at which point the process supervisor will restart it.

Or you can just restart it periodically from cron. That is still less invasive than rebooting the entire server.
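
The memory-limit idea can be sketched without runit, using a plain bash loop as a stand-in for a real supervisor. Everything named here (`run_limited`, `supervise`, the 5-second pause) is made up for illustration, and the sketch assumes dhcpd stays in the foreground and exits when an allocation fails, which you would need to verify for your build.

```shell
#!/bin/bash
# run_limited LIMIT_KIB COMMAND [ARGS...]
# Runs COMMAND with its virtual memory capped at LIMIT_KIB (ulimit -v).
# The subshell keeps the limit from applying to the calling shell.
run_limited() {
    (
        ulimit -v "$1" || exit 1
        shift
        exec "$@"
    )
}

# supervise LIMIT_KIB COMMAND [ARGS...]
# Restarts COMMAND under the same limit every time it exits.
supervise() {
    local limit_kib=$1
    shift
    while true; do
        run_limited "$limit_kib" "$@"
        echo "supervise: command exited with status $?, restarting" >&2
        sleep 5   # assumed pause to avoid a tight restart loop
    done
}
```

A call such as `supervise 131072 dhcpd -6 -d -cf /etc/dhcpd6.conf` would cap the process at 128 MiB, a value chosen here only as an example. A real supervisor like runit adds logging and clean shutdown handling that this loop lacks.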

András Korn