I was just looking for some clarification about the niceness value of a process, or perhaps some advice if I'm going about this all wrong.

Say I have a couple dozen web server processes on a single machine, and I'm worried about any given one of them acting up and hogging all the system's resources. Immediately niceness comes to mind for me - I would give them all positive niceness values so they will use any resources if they're available but give them up if there is demand for them. Will this perform as I expect? Or if they all have the same niceness values, would it essentially be the same as them all being 0?

Basically I'm wondering how the kernel decides which processes should get priority if they all have the same priority value and all need resources. Since there are only 40 possible niceness levels (-20 to +19, iirc), I couldn't give each server a different value, if it worked that way.

Any clarification about this would be greatly appreciated.

Some other solutions I've come across to prevent hogging:

  • cpulimit utility (which segfaults for me)
  • ulimit - per user basis, sounds ideal for me as each server has its own user, though it only offers cputime as a measurement of cpu resource usage. I'm curious if this means elapsed time or a percentage of cpu time per second or something else.
  • process control groups (cgroups) - From some initial skimming of Google results this looks like a much more complicated solution, but it may have to come to that.
jvnk
  • Run `top` when it is locked up to see what it is doing; that lets you check CPU time, memory usage, etc. You might want to log that information continuously to see how it changes over time, both when it is working fine and when it fails. In addition to the nice values there is also realtime scheduling (be really, really cautious about this; a buggy realtime process can really ruin your day), containers (put the buggy stuff in a container so that everyone else can continue to run), etc. – Seth Robertson May 19 '11 at 21:12

1 Answer

First of all, I would suggest that your Linux distro probably has things configured better out of the box than you are going to end up with if you start hacking on things like system-wide nice values.

Secondly, I don't think nice is going to do what you want. It has some effect, but not as profound as you might think. The kind of scheduler and its internal ruleset have more effect. Also, the kind of problem you are likely to have is with memory usage rather than CPU clock time, so nice is not likely to be a cure.
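To make the scheduler point a bit more concrete, here is a rough sketch, assuming a CFS-style weighted fair scheduler (the default in mainline Linux since 2.6.23). Each nice level maps to a weight, and runnable processes competing for the same CPU get time roughly in proportion to their weights. The formula below, 1024 / 1.25^nice, only approximates the kernel's precomputed weight table, and the function names `approx_weight` and `cpu_shares` are made up for illustration.

```python
def approx_weight(nice: int) -> float:
    """Approximate scheduler weight for a nice value (nice 0 -> 1024).

    Assumption: each nice step changes the weight by a factor of ~1.25,
    which approximates the kernel's lookup table rather than copying it.
    """
    if not -20 <= nice <= 19:
        raise ValueError("nice must be in the range -20..19")
    return 1024 / (1.25 ** nice)


def cpu_shares(nice_values):
    """Fraction of one contended CPU each always-runnable process would get."""
    weights = [approx_weight(n) for n in nice_values]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Four CPU-bound servers, all at nice 10: an even split.
    print(cpu_shares([10, 10, 10, 10]))
    # The same four at nice 0: still an even split.
    print(cpu_shares([0, 0, 0, 0]))
    # Three servers at nice 10 competing with one nice-0 process.
    print(cpu_shares([10, 10, 10, 0]))
```

Under this model, giving every server the same positive nice value changes nothing between the servers themselves; it only matters when they compete with something at a different nice value, such as the nice-0 process in the last case, which ends up with roughly three quarters of the CPU.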

The ulimit system is probably more what you have in mind.
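As a minimal sketch of what that could look like, the snippet below starts a child process with lowered resource limits using Python's standard `resource` module, which wraps the same `setrlimit` mechanism that the shell's `ulimit` uses. Note that `RLIMIT_CPU` counts total CPU seconds consumed by the process (not elapsed time and not a percentage), and that these limits are enforced per process. The cap values and the helper names `lower_soft_limit`, `apply_limits`, and `run_limited` are hypothetical; pick numbers that suit the real servers.

```python
import resource
import subprocess
import sys

CPU_SECONDS = 60                         # hypothetical cap on CPU time consumed
ADDRESS_SPACE = 512 * 1024 * 1024        # hypothetical cap on virtual memory, in bytes


def lower_soft_limit(kind, wanted):
    """Lower the soft limit for `kind`, leaving the hard limit untouched."""
    _soft, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)       # the soft limit may not exceed the hard limit
    resource.setrlimit(kind, (wanted, hard))


def apply_limits():
    # Runs in the child after fork() and before exec(), so only the
    # launched program is restricted, not the launcher itself.
    lower_soft_limit(resource.RLIMIT_CPU, CPU_SECONDS)
    lower_soft_limit(resource.RLIMIT_AS, ADDRESS_SPACE)


def run_limited(argv):
    """Run `argv` with the limits above and capture its output."""
    return subprocess.run(argv, preexec_fn=apply_limits,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # Ask a child interpreter to report the CPU limit it was started with.
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_CPU)[0])"
    result = run_limited([sys.executable, "-c", probe])
    print(result.stdout.strip())
```

A process that exceeds the soft CPU limit receives SIGXCPU, and allocations beyond the address-space limit fail, so a runaway server gets stopped instead of taking the whole machine down with it. Since only the soft limit is lowered here, the process could raise it again up to the hard limit; setting both is the stricter option.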

However, my primary piece of advice is that you only solve real-world problems. Worrying about something before it happens will cause you to patch a system together with duct tape and baling wire on account of a scarecrow problem. When you have a real problem, the best solution will be much more apparent. Of course architecting correctly and planning for the worst is a good thing, but in this case your Linux distro has likely done most of that work for you.

Caleb
  • Oh, this is definitely a problem we are facing. The machine locks up on occasion and doesn't respond until a restart due to crappy software we are forced to run. We're not sure if it's a CPU or memory usage issue, but in the end we want to limit both things for each process. I will give ulimit a try. Thanks for your advice :) – jvnk May 19 '11 at 19:34
  • One good way to deal with crappy software you are forced to run is to run it inside a virtual machine. You can allocate an exact amount of resources to it, and it's jailed in a way that it can't mess up your system. Sorry, your initial question sounded kind of vaguely theoretical. Definitely figure out exactly how your program is misbehaving and address it on that level. – Caleb May 19 '11 at 19:38