I've got a Mac Pro running Mac OS X 10.6.4 Snow Leopard Server, and it has recently started getting numerous 'kNetworkError's in Server Admin.app when viewing services. It has been acting as a gateway with NAT for quite some time.

There is one glaring issue: bootpd crashes all the time with the following errors in `/var/log/system.log`:

Aug 12 16:54:59 servername bootpd[3572]: server starting
Aug 12 16:54:59 servername bootpd[3572]: server name servername.domain.tld
Aug 12 16:54:59 servername bootpd[3572]: interface en0: ip 10.0.1.9 mask 255.255.255.0
Aug 12 16:54:59 servername bootpd[3572]: bsdpd: re-reading configuration
Aug 12 16:54:59 servername bootpd[3572]: bsdpd: shadow file size will be set to 48 megabytes
Aug 12 16:54:59 servername bootpd[3572]: bsdpd: age time 00:15:00
Aug 12 16:54:59 servername bootpd[3572]: [3572] detected buffer overflow
Aug 12 16:54:59 servername com.apple.launchd[1] (com.apple.bootpd[3572]): Job appears to have crashed: Abort trap
Aug 12 16:54:59 servername com.apple.ReportCrash.Root[3571]: 2010-08-12 16:54:59.828 ReportCrash[3571:2807] Saved crash report for bootpd[3572] version ??? (???) to /Library/Logs/DiagnosticReports/bootpd_2010-08-12-165459_localhost.crash

It is correctly configured to serve DHCP through en1 (not en0), the "LAN" port. This happens even with no hardware (even switches) connected to the "LAN" port. There are no DHCP clients listed. Oddly, the "Overview" shows 1 static map, but nothing is listed under "Static Maps" and there are no "Computers" in Open Directory. /var/db/dhcp_leases is empty.

/Library/Logs/DiagnosticReports/bootpd_2010-08-12-165459_localhost.crash is as follows:

Process:         bootpd [3572]
Path:            /usr/libexec/bootpd
Identifier:      bootpd
Version:         ??? (???)
Code Type:       X86-64 (Native)
Parent Process:  launchd [1]

Date/Time:       2010-08-12 16:54:59.713 -0400
OS Version:      Mac OS X Server 10.6.4 (10F569)
Report Version:  6

Exception Type:  EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Crashed Thread:  0  Dispatch queue: com.apple.main-thread

Application Specific Information:
__abort() called

Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
0   libSystem.B.dylib                   0x00007fff803c13d6 __kill + 10
1   libSystem.B.dylib                   0x00007fff80461913 __abort + 103
2   libSystem.B.dylib                   0x00007fff80456157 mach_msg_receive + 0
3   libSystem.B.dylib                   0x00007fff803b92cf __strncpy_chk + 14
4   bootpd                              0x0000000100014e5d PLCache_read + 782
5   bootpd                              0x0000000100004a3d BSDPClients_init + 68
6   bootpd                              0x00000001000053b5 bsdp_init + 2396
7   bootpd                              0x000000010000200b S_update_services + 1228
8   bootpd                              0x0000000100002344 S_server_loop + 571
9   bootpd                              0x0000000100003963 main + 1766
10  bootpd                              0x0000000100000984 start + 52

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x00007fff5fbfe220  rcx: 0x00007fff5fbfe218  rdx: 0x0000000000000000
  rdi: 0x0000000000000df4  rsi: 0x0000000000000006  rbp: 0x00007fff5fbfe240  rsp: 0x00007fff5fbfe218
   r8: 0x0000000000000001   r9: 0x0000000100114280  r10: 0x00007fff803bd412  r11: 0xffffff80002e1680
  r12: 0xffffffffffffffff  r13: 0x00007fff5fbfe330  r14: 0x00007fff5fbfe33b  r15: 0x00007fff7009bec0
  rip: 0x00007fff803c13d6  rfl: 0x0000000000000202  cr2: 0x000000010004c000

Any thoughts or suggestions as to resolving this?

morgant

3 Answers

Okay, solution found.

I googled 'PLCache_read' (the last bootpd function listed in /Library/Logs/DiagnosticReports/bootpd_2010-08-12-165459_localhost.crash before the buffer overflow) and the second hit was in Apple's source for bootpd (bsdpd.c, specifically). PLCache_read() is passed the BSDP_CLIENTS_FILE constant, which, looking at the top of the file, is hardcoded as /var/db/bsdpd_clients.

Checking /var/db/bsdpd_clients, I found a pseudo-plist containing all the NetBoot clients (remember, NetBoot is built upon bootp) and, sure enough, the last entry was cut off as follows, leaving the file incomplete:

{
        name=NetBoot060
        identifier=
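
A cut-off entry like this one can also be spotted mechanically. The following is only a rough Python sketch, assuming the file is a sequence of brace-delimited entries as the excerpt suggests and that braces never appear inside values; the function name and the sample entry NetBoot059 with its identifier are made up for illustration.

```python
def find_unclosed_entry(text):
    """Return the 1-based line number of the first '{' that is never closed, or None."""
    open_lines = []  # stack of line numbers where a '{' was opened
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ch == "{":
                open_lines.append(lineno)
            elif ch == "}" and open_lines:
                open_lines.pop()
    return open_lines[0] if open_lines else None


# Hypothetical sample data: one complete entry, then the truncated one.
sample_ok = "{\n        name=NetBoot059\n        identifier=abc\n}\n"
sample_cut = sample_ok + "{\n        name=NetBoot060\n        identifier=\n"

print(find_unclosed_entry(sample_ok))
print(find_unclosed_entry(sample_cut))
```

A result of None means every opening brace was matched; a line number points at the entry that was never closed.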

I stopped bootpd (sudo serveradmin stop dhcp), backed up /var/db/bsdpd_clients and emptied it, then started bootpd again (sudo serveradmin start dhcp), and the crashing stopped!
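
The "back up, then empty" step could be sketched in Python roughly as below. This is an illustration rather than the exact procedure used here: the helper name and the ".bak" suffix are assumptions, and an existing backup with the same name would be overwritten.

```python
import shutil
from pathlib import Path


def backup_and_empty(path, suffix=".bak"):
    """Copy `path` to `path + suffix`, then truncate the original to zero bytes."""
    src = Path(path)
    backup = src.with_name(src.name + suffix)
    shutil.copy2(src, backup)  # keep the damaged file for later inspection
    src.write_bytes(b"")       # leave an empty file in place of the original
    return backup
```

On the server this would be run as root against /var/db/bsdpd_clients, between the serveradmin stop and start commands mentioned above.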

After a reboot, all the other related services (incl. NetBoot) are now back up and Server Admin.app is no longer throwing the 'kNetworkError's.

morgant

Hmm... The crash log shows that bootpd is running a function called PLCache_read which is copying a string and that, somehow, is causing the buffer overflow. (Incidentally, it looks like the source to bootpd is available here.)
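
For anyone who wants to pull that information out of a crash report programmatically, here is a small Python sketch. It assumes each frame line has the shape "index, image, address, symbol + offset" seen in the report quoted in the question, and that image names contain no spaces; the function names are made up for illustration.

```python
import re

# One backtrace frame: index, image name, hex address, "symbol + offset".
FRAME_RE = re.compile(
    r"^(?P<index>\d+)\s+(?P<image>\S+)\s+(?P<address>0x[0-9a-fA-F]+)\s+"
    r"(?P<symbol>\S+) \+ (?P<offset>\d+)$"
)


def parse_frames(backtrace):
    """Return (index, image, symbol) tuples for every line that looks like a frame."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match["index"]), match["image"], match["symbol"]))
    return frames


def first_frame_in(frames, image):
    """Return the symbol of the topmost frame belonging to `image`, or None."""
    for _index, frame_image, symbol in frames:
        if frame_image == image:
            return symbol
    return None


# The first six frames of the crashed thread, copied from the report in the question.
backtrace = """\
0   libSystem.B.dylib                   0x00007fff803c13d6 __kill + 10
1   libSystem.B.dylib                   0x00007fff80461913 __abort + 103
2   libSystem.B.dylib                   0x00007fff80456157 mach_msg_receive + 0
3   libSystem.B.dylib                   0x00007fff803b92cf __strncpy_chk + 14
4   bootpd                              0x0000000100014e5d PLCache_read + 782
5   bootpd                              0x0000000100004a3d BSDPClients_init + 68
"""

frames = parse_frames(backtrace)
print(first_frame_in(frames, "bootpd"))  # prints PLCache_read
```

The topmost frame inside the bootpd image is the last piece of bootpd's own code that ran before the library routines aborted, which makes it a good search term.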

My guess is that bootpd is reading a bad config file or getting bad data over the network. I would try running:

sudo fs_usage -w bootpd

and see if that gives any clue as to the source of the problem.

It is clear that someone else had this problem, but, not being registered, I have no idea if they got a useful answer. Moving /etc/bootpd.plist might help.

Ah, you've found an answer while I was typing this. Well, I'll post this answer anyways; perhaps it will be useful to someone else.

Clinton Blackmore
  • Many thanks for your efforts. If I hadn't figured it out myself, that definitely would've led me to the solution. – morgant Aug 12 '10 at 22:07

I just solved the exact same problem with slight differences.

I'm running 10.6.5 client (not server). Same error messages (or as close to the same as I could tell).

PLCache_read was also the culprit, except that I had no /var/db/bsdpd_clients file, and creating one didn't solve the problem.

Googling PLCache_read also led me to Apple code, except in this case it was dhcpd.c, which led me to the hardcoded variable

#define DHCP_LEASES_FILE "/var/db/dhcpd_leases"

and lo and behold /var/db/dhcpd_leases looked to be full of garbage. I moved it to a temporary filename and now internet sharing works just fine.
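
"Full of garbage" can be checked roughly rather than by eye. The Python sketch below assumes a healthy leases file is plain ASCII text, which is an assumption and not something confirmed here; the function name is made up, and what ratio counts as "garbage" is left to the reader.

```python
def non_text_ratio(data):
    """Fraction of bytes that are neither printable ASCII nor tab, newline or CR."""
    if not data:
        return 0.0
    allowed = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    bad = sum(1 for byte in data if byte not in allowed)
    return bad / len(data)
```

It takes the raw bytes of the file, for example from pathlib's Path("/var/db/dhcpd_leases").read_bytes(); a ratio well above zero would suggest the file is worth moving aside.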

Morgant, thanks for your in-depth solution. I learned something about how to read crash logs!