Fairly often, user cron jobs on our shared servers all fire at the same time and (as near as I've been able to tell) end up contending with each other for resources. Load explodes, Nagios gets angry, Apache stops responding, SSH logins time out, and so on. I'm not in a position to unilaterally decide that users can't run cron jobs, but I'd like to combat the situation where pgrep crond | wc -l returns more than 50.
It seems like it should be possible to stagger the jobs by limiting the number of crond processes running at any given time, or something similar (like sending SIGSTOP until some of them clear up, only less hacky), but I've yet to find any good leads.
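To make the idea concrete, this is roughly the kind of throttling I'm picturing, written as a wrapper around each job. It is only a sketch under assumptions: it relies on flock from util-linux being installed, the names run_in_slot, LOCK_DIR, MAX_SLOTS and RETRY_SECS are made up for illustration, and it only helps for jobs that are actually launched through the wrapper, which is exactly the part I don't know how to enforce.

```shell
#!/bin/sh
# Sketch only: let at most MAX_SLOTS wrapped jobs run at once.
# run_in_slot, LOCK_DIR, MAX_SLOTS and RETRY_SECS are hypothetical names,
# not anything provided by cron itself. Assumes flock(1) from util-linux.
LOCK_DIR="${LOCK_DIR:-/tmp/cron-slots}"
MAX_SLOTS="${MAX_SLOTS:-4}"
RETRY_SECS="${RETRY_SECS:-5}"

run_in_slot() {
    mkdir -p "$LOCK_DIR" || return 1
    while :; do
        i=1
        while [ "$i" -le "$MAX_SLOTS" ]; do
            # Open the slot file on fd 9 and try a non-blocking lock on it.
            exec 9>"$LOCK_DIR/slot.$i"
            if flock -n 9; then
                "$@"
                status=$?
                exec 9>&-          # closing the fd releases the slot
                return "$status"
            fi
            exec 9>&-
            i=$((i + 1))
        done
        # Every slot is busy: wait a bit, then scan the slots again.
        sleep "$RETRY_SECS"
    done
}

# When called as a script with arguments, run them inside a slot.
if [ "$#" -gt 0 ]; then
    run_in_slot "$@"
fi
```

A crontab entry would then invoke the wrapper (installed at whatever path you choose) with the real job as its arguments, instead of calling the job directly. Because the lock belongs to the open file descriptor, a slot is freed even if the job is killed, so there are no stale lock files to clean up. It still feels like a workaround rather than a real limit on what crond spawns.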
The hardware: 4 CPUs and up. The low end is Dell 1435s with ~8GB memory and RAID 10 (WD EADS drives). Mostly Plesk and cPanel, but also some evil Sphera systems.
How do you deal with this problem, sf?