3

We're currently setting up a server to do some heavy lifting (ETL) after another process within the business has finished. At the moment we're firing off jobs either via scheduled cron jobs or remote execution (via SSH). Early this week we hit an issue with too many jobs running side by side on the system, which brought them all to a snail's pace as they fought for CPU time.

I've been looking for a batch scheduler: a system where we can insert jobs into a run queue and have the system process them one by one. Can anyone recommend a program/system for this? Low-cost/FOSS would be appreciated given the shoe-string nature of this project.

Andrew Williams
  • There is a somewhat old but interesting article at http://www.linuxjournal.com/article/4087 – nik Jun 12 '09 at 10:49
  • Yes, a nice article, but limited to scheduling on a time basis. As I mentioned in the question, we have both time-scheduled jobs and jobs started at the end of a remote job, which could happen at any time. We aim to allow only one job to run at a time; any extras triggered remotely or via cron should go into a queue of jobs to be processed. – Andrew Williams Jun 12 '09 at 10:55

10 Answers

6

I'd set up some kind of queueing service; a quick Google turns up plenty of ready-made options.

Depending on your needs you could simply (a rough sketch follows the list):

  • create a wrapper where users submit jobs,
  • the wrapper writes the job to a socket/file/whatever,
  • create a consumer that runs the jobs one by one, waiting for each to finish,
  • the consumer is then called regularly by cron (every 5 minutes or so),
    • of course, create some locking mechanism so that only n jobs run at a time (where n ≥ 1),
  • if there are no more jobs, do nothing,
  • if there are more jobs, grab the next one and wait for it to finish.
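
Here's a minimal sketch of that wrapper/consumer pair in shell; all paths, file names, and script names are assumptions, not anything standard:

    #!/bin/sh
    # submit-job: the wrapper. Drops one file per job into a spool directory.
    # Usage: submit-job '/path/to/etl_step.sh'
    SPOOL=/var/spool/etl
    echo "$1" > "$SPOOL/$(date +%s%N).job"

    #!/bin/sh
    # run-queue: the consumer. Called regularly by cron (every 5 minutes or so).
    SPOOL=/var/spool/etl
    LOCK=/var/lock/etl-queue.lock
    (
      flock -n 9 || exit 0            # the lock keeps it to one consumer (n=1)
      for job in "$SPOOL"/*.job; do
        [ -e "$job" ] || exit 0       # no more jobs: do nothing
        cmd=$(cat "$job") && rm -f "$job"
        sh -c "$cmd"                  # run it and wait for it to finish
      done
    ) 9>"$LOCK"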

Actually there's more to it: you might have requirements that call for a priority queue, which brings up problems like starving jobs and similar, but it's not that hard to get something up and running quite fast.

If lpd, as suggested by womble, fits your needs, I'd take that. Having such a system maintained by a larger community is of course better than creating your own bugs for problems others have already solved :)

The queueing service also has the advantage of decoupling job submission from the actual number crunching. By making the jobs available over a network connection you can simply throw hardware at a (possible) scaling problem and scale almost endlessly.

Martin M.
5

Two solutions spring to mind:

  1. Use xargs -P to cap the number of processes running in parallel at any one time.
  2. Create a Makefile and spawn with make -j.

Both approaches are summarised in more detail in this SO thread.
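
As a rough illustration of the xargs route, assuming a hypothetical jobs.txt with one job command per line:

    # Run the jobs one at a time; raise -P for limited parallelism.
    xargs -I CMD -P 1 sh -c CMD < jobs.txt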

These may not be applicable, depending on how your scripting is structured.

Dan Carley
5

A heavyweight solution to your problem is to use something like Sun Grid Engine.

Sun Grid Engine (SGE) is distributed resource management software; it allows the resources within a cluster/machine (CPU time, software, licenses, etc.) to be utilised effectively.

Here is a small tutorial on how to use SGE.
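
For a flavour of the workflow, submitting a job to an already-configured SGE queue looks roughly like this (the script name is made up):

    qsub -N etl_job -cwd ./run_etl.sh   # enqueue the script as "etl_job"
    qstat                               # inspect pending and running jobs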

rkthkr
4

You could check out some of the batch systems used for scheduling jobs on clusters, which have the option to monitor resource usage and declare a system too loaded to dispatch more work to it. You could also configure them to run only one job at a time, but for that you may be better off with something less complex than a full-fledged batch scheduler (in the spirit of keeping things simple).

As for freely available batch/scheduling systems, the two that spring to mind are OpenPBS/Torque and SGE.

Edited to add: if you're ever going to add more processing capacity in the form of more boxes, a batch/scheduling system like Torque/OpenPBS/SGE may be a good choice, as they're built precisely to manage compute resources and distribute workloads across them.

Kjetil Joergensen
3

You can always use lpd -- yeah, old school, but it's really a generalised batch processing control system masquerading as a print server.
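
To sketch the idea (the queue name, paths, and filter script here are all hypothetical): in BSD lpd you can point a printcap entry's input filter at any script, and the "print" jobs become batch jobs.

    #!/bin/sh
    # /usr/local/bin/run-job: an lpd input filter that runs jobs instead of
    # printing them. A matching /etc/printcap entry might look like:
    #   batch:sd=/var/spool/lpd/batch:if=/usr/local/bin/run-job:lp=/dev/null:
    # lpd pipes each submitted file to this script on stdin and runs one
    # filter at a time, which serialises the queue for free.
    job=$(mktemp)
    cat > "$job"       # the "document" is really a job script
    sh "$job"
    rm -f "$job"

    # Submit work with: lpr -Pbatch myjob.sh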

womble
  • Interesting idea, does any documentation exist on using it as a general batch processor? – Andrew Williams Jun 12 '09 at 10:50
  • I haven't tried it with lpd, but I have tried it with lpsched (the old SysV scheduler). There it's simple, as the "printer backends" are all shell scripts (by default). At a very, very previous job, we had a Rayshade "print queue" that rendered jobs and dumped the resulting images in user home directories. – Vatine Jun 12 '09 at 13:31
3

From man batch:

batch executes commands when system load levels permit; in other words, when the load average drops below 1.5, or the value specified in the invocation of atd.

I think this might be what you're looking for. It's part of Debian's at package.
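
Usage is a one-liner (the script path is made up):

    # Queue a job; atd starts it once the load average permits.
    echo /path/to/etl_job.sh | batch
    atq    # list jobs still waiting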

pgs
  • Yes, I looked at batch, but as you mentioned it executes based on load average. That would be an issue given the initial stages of our scripts, which do a large DB extract; it's a low-CPU but high-network-bandwidth task and doesn't raise the loadavg above 0.3, so under that criterion another job would be run at the same time. – Andrew Williams Jun 12 '09 at 11:12
1

wava: a memory-aware scheduler that lets you enqueue batch jobs (submitted with a maximum physical memory usage promise) to be executed when enough physical memory (RSS) is available on the system.

This scheduler was originally created to enqueue a high number of long-running jobs on machines with a large amount of RAM, running as many of them as possible concurrently while avoiding memory paging and swapping, so as not to penalise the performance of other services running on the system.

idelvall
1

We used Control-M for this exact reason with ETLs and such (though a few years back now). Admittedly it's not free or open source, but it had very good flexibility in terms of batch processing (à la if-this-then-that execution flow).

jouell
0

A shell script called from cron could easily do this, processing a job list line by line (see the sketch below).
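
A minimal sketch, with all paths assumed:

    #!/bin/sh
    # Pop and run queued jobs one at a time; safe to call from cron often.
    QUEUE=/var/spool/etl/queue      # one command per line
    LOCK=/var/lock/etl-queue.lock
    (
      flock -n 9 || exit 0                         # a previous run is still busy
      while job=$(head -n 1 "$QUEUE") && [ -n "$job" ]; do
        sed -i 1d "$QUEUE"                         # remove the line we took
        sh -c "$job"                               # run it to completion
      done
    ) 9>"$LOCK"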

pauska
  • We do run jobs via cron, but the issue is that we don't know how long jobs will run for; sometimes it'll be 12 million rows and 4 hours, and the next time 100k rows and 15 minutes. – Andrew Williams Jun 12 '09 at 10:51
  • Ohh, sorry, I didn't quite understand your scenario then. How about getting the initial process (the one you want to wait for before doing anything else) to write a status file? The application writes "WAIT" into the status file when it starts up and "OK" when it's successfully done. The cron job then starts the batch script, which exits 0 if the file != OK. – pauska Jun 12 '09 at 11:08
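
That status-file handshake, sketched out (the file name is an assumption):

    #!/bin/sh
    # Cron-driven batch script: do nothing unless the upstream job reported OK.
    STATUS=/var/run/upstream.status   # upstream writes WAIT on start, OK on success
    [ "$(cat "$STATUS" 2>/dev/null)" = "OK" ] || exit 0
    exec /path/to/etl_job.sh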
0

I would use Torque, which is an updated version of the FOSS OpenPBS.