My experience is that my high-process-count task only succeeded with:
kern.maxproc=2500 # This is as big as I could set it.
kern.maxprocperuid=2048
ulimit -u 2048
The first two can go into /etc/sysctl.conf, and the ulimit value into /etc/launchd.conf, so they are set reliably at boot.
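For reference, a sketch of roughly what those persistent settings look like (the values are the ones above; as I understand it, on 10.5/10.6 /etc/launchd.conf is read as a list of launchctl subcommands, so the per-user process limit goes in via launchctl's limit syntax rather than a literal ulimit call):

# /etc/sysctl.conf -- applied at boot
kern.maxproc=2500
kern.maxprocperuid=2048

# /etc/launchd.conf -- each line is a launchctl subcommand, applied at boot
limit maxproc 2048 2048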
Since TCP/IP was part of what I was doing, I also needed to bump up
kern.ipc.somaxconn=8192
from its default of 128.
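You can check and change that one on the fly before making it permanent; this is plain sysctl usage:

sysctl kern.ipc.somaxconn                  # show the current listen-backlog ceiling (default 128)
sudo sysctl -w kern.ipc.somaxconn=8192     # raise it for the current boot

Keep in mind it is only a ceiling: the server still has to ask for a big enough backlog in its listen() call to actually use it.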
Before I increased the process limits, I was getting "fork" failures (not enough resources); before I increased kern.ipc.somaxconn, I was getting "broken pipe" errors.
This was while running a fair number (500-4000) of detached processes on my monster Mac, under OS X 10.5.7, then 10.5.8, now 10.6.1. Under Linux on my boss's computer it just worked.
I thought the number of processes would be closer to 1000, but it turned out that every process I started brought along its own copy of the shell in addition to the item doing the actual work. Very festive.
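That doubling is what you get when each job is launched through a little wrapper script: the wrapper's shell sits in the process table for as long as the worker runs. A sketch, with made-up names:

#!/bin/sh
# runjob.sh -- hypothetical wrapper; while the worker runs, ps -ef shows
# both this sh and ./worker, so 1000 jobs means roughly 2000 processes
./worker "$@"

Using exec ./worker "$@" as that last line instead lets the worker replace the wrapper shell, which gets you back to one process per job.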
I wrote a display toy that went something like:
#!/bin/sh
while [ 1 ]
do
    n=$(netstat -an | wc -l)
    nw=$(netstat -an | grep WAIT | wc -l)
    p=$(ps -ef | wc -l)
    psh=$(ps -ef | fgrep sh | wc -l)
    echo "netstat: $n wait: $nw ps: $p sh: $psh"
    sleep 0.5
done
and watched the maximum number of processes showing up in ps -ef and the sockets hanging around in netstat waiting for TIME_WAIT to expire. With the limits raised, I saw 3500+ TIME_WAIT entries at peak.
Before I raised the limits I could 'sneak' up on the failure threshold, which started out below 1K but rose to a high of 1190. Every time it was pushed into failure, it could take a little more the next time, probably because something cached expanded toward its limit on each failure.
Although my test case had a "wait" as its final statement, there were still PLENTY of detached processes hanging around after it exited.
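Which is consistent with how wait behaves: it only waits for direct children of the calling shell, so anything those children detached on their own keeps going. A sketch with made-up script names:

#!/bin/sh
./spawn_batch.sh &      # hypothetical: backgrounds its own long-running workers, then exits
./spawn_batch.sh &
wait                    # returns once the two spawn_batch.sh processes exit,
                        # not when the workers they detached finish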
I got most of the info I used from postings on the internet, but not all of it was accurate. Your mileage may vary.