
I'm looking for a good open source solution for managing many batch jobs across a cluster of machines. I've looked at the solutions indicated in this post, but none of them really seem to be what I'm looking for, or perhaps the projects mentioned just have really poor documentation.

We have a good set of batch operations that need to happen on various schedules. These batch operations sometimes have dependencies: for example, logs are processed by batch job A, and then batch jobs B and C can run on the resulting data. Resource utilization (balancing jobs amongst our batch machines) is probably not as much of an issue, although it would be a nice bonus.

Today we handle this with a combination of fcron and shell scripts. But of course it's rather difficult to keep track of which jobs are scheduled to run on which machines. It's also not always obvious when some job has hung (or is running much longer than expected), or has even just failed outright.

This can't be a unique problem for us. In fact, we had a home-grown solution at a previous company, but it was never open sourced. Does anyone have a good solution?

rhettg

4 Answers


There are numerous solutions you may want to take a look at:

Torque - This is a variation of the original PBS (Portable Batch System) code base. They call it a resource manager because technically it doesn't take care of scheduling jobs, although it does include several schedulers. It will, however, take care of managing and allocating your compute nodes' CPU, memory, file, and other consumable resources. If you have anything more than very basic scheduling needs, you'll probably want to supplement it with the Maui Cluster Scheduler. I know the most about this one because it's what we use. It can be a bit rough around the edges because it's mostly community developed, and most of the developers are sysadmins rather than software engineers. A commercial product called PBS Professional spawned from the same PBS code base; it seems more mature and is available for a relatively modest fee.
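
To make that concrete: here's a rough sketch of how the log pipeline from the question might look under Torque (the script names are made up), using qsub's standard afterok dependencies:

# Submit job A; qsub prints the new job's ID on stdout.
JOBA=$(qsub process_logs.sh)

# B and C are held until A completes successfully (exit status 0).
qsub -W depend=afterok:$JOBA report_b.sh
qsub -W depend=afterok:$JOBA report_c.sh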

Sun Grid Engine - Similar to the PBS-based systems, but written by Sun. The resource manager and scheduler are more tightly integrated in this system, and it offers a few different modes of operation and resource allocation. Despite being a Sun product, it apparently runs well on Linux and other operating systems, not just Solaris.
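
The same pipeline sketched in SGE terms (script names again invented); -hold_jid holds a job until the named job has finished:

# Give job A a name, then make B and C wait on it.
qsub -N jobA process_logs.sh
qsub -N jobB -hold_jid jobA report_b.sh
qsub -N jobC -hold_jid jobA report_c.sh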

Platform LSF - Another popular commercial offering in the same space.

Condor - Another batch scheduling system, more suited to high-throughput workloads: tons of short jobs.
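
Condor expresses inter-job dependencies through its DAGMan tool; as a rough sketch (the .sub file names are hypothetical, each being an ordinary Condor submit description file), the question's A-then-B-and-C workflow would look something like:

# pipeline.dag -- PARENT/CHILD lines encode the ordering
JOB A jobA.sub
JOB B jobB.sub
JOB C jobC.sub
PARENT A CHILD B C

Submitting it with condor_submit_dag pipeline.dag runs A first and releases B and C only once A has finished successfully.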

SLURM - Another open source offering. It's not quite as mature as the PBS-based products, but it has a nicer, plugin-based architecture and is easy to install if you go with the CAOS NSA Linux distribution and the Perceus cluster manager. See this Linux Magazine article for an example of how easy it is to get up and running.
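
For completeness, the same dependency chain sketched with SLURM's sbatch; this assumes a release recent enough to have the --parsable flag (which prints just the job ID), and the script names are again made up:

# Submit A and capture its job ID.
JOBA=$(sbatch --parsable process_logs.sh)

# B and C start only after A completes successfully.
sbatch --dependency=afterok:$JOBA report_b.sh
sbatch --dependency=afterok:$JOBA report_c.sh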

Which one of these you pick is largely a matter of preference and of matching it up with your requirements. I would say that Torque and SGE have a slight bent toward multi-user clusters in a scientific computing environment. Based on what I've seen of Altair's PBS Professional, it looks far more suitable for a commercial environment and has a better suite of tools for developing product-specific workflows. The same goes for LSF.

SLURM and Condor are probably the easiest to get up and running, and if your requirements are relatively modest, they may be the best fit. However, if you need more complicated scheduling policies and have many users submitting jobs to your systems, they may be lacking unless supplemented by an external scheduler.

Kamil Kisiel
  • Condor is a great system (disclaimer: Condor pays my salary). Open source (Apache-derived license), free support. But it's a bit weak on very short running jobs; the overhead starts chewing up a significant chunk of your processing time. For tasks shorter than a few minutes, we recommend grouping them together. The "high throughput" is meant to distinguish from "high performance." Condor wants to eat through massive workloads over a long period; it's not optimized to handle lots of little jobs Right Now. You can do it (many people do), it's just not the default case. – Alan De Smet Sep 03 '10 at 20:33
  • Thanks for the comment Alan, I wasn't aware of that about Condor. Does it support something like GridEngine's binary mode? In that case GE basically just tells the exec host to run a particular command which the host finds locally, no need to upload any kind of script or anything else to the exec host's spool. Makes for comparatively high throughput versus standard batch submission. Used in conjunction with array tasks to remove scheduling overhead, it's well suited even for massive amounts of short running jobs. – Kamil Kisiel Sep 09 '10 at 07:12

This might be overkill for your case, but have you checked out the open source Torque batch scheduler for clusters? It's commonly used in compute grids and large clusters: About Torque.

  • Yeah, I've been trying to sort out their documentation. It seems like it's more focused on resource allocation than on scheduling. But maybe it's just that the non-open source add-on schedulers are really what I want and this is just an underlying layer...? Either way, this project seems... ancient. – rhettg Oct 29 '09 at 02:46

Have you looked into Gearman?

A Gearman powered application consists of three parts: a client, a worker, and a job server. The client is responsible for creating a job to be run and sending it to a job server. The job server will find a suitable worker that can run the job and forwards the job on. The worker performs the work requested by the client and sends a response to the client through the job server. Gearman provides client and worker APIs that your applications call to talk with the Gearman job server (also known as gearmand) so you don't need to deal with networking or mapping of jobs.

http://gearman.org/
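
As a quick taste, the gearman command-line tool can play both roles; this sketch assumes a gearmand job server is already running on localhost, and uses wc -l as a stand-in for real work:

# Worker: register a function named "wc" that runs wc -l on each payload.
gearman -w -f wc -- wc -l

# Client: send a file through the job server and print the result.
gearman -f wc < /var/log/messages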

Cheers

HTTP500
  • Gearman looks pretty cool, and would be a good solution to some of our more queue-like problems. But it doesn't really handle scheduling and dependencies of more mundane 'run this script' sort of jobs. – rhettg Oct 29 '09 at 23:59

I might not understand the complexity of what you have to achieve, but for the cases I work on, it's handled like this. You determine where to run each cron job based on where the data originates. A script run from the machine holding the data to be processed gathers what needs processing and, at the end, uses scp to send the results to the next system and ssh to execute the script on that second system. You can execute a remote script via ssh with:

ssh hostname "/usr/local/path/to/script"

The second system does its thing with the data and, at the end of that script, does any further scp of data and ssh to the next system, and so on. This way there is a chain of events, and only one cron job is involved. With a single cron job on the source machine, there is no second-guessing about whether a second cron job can safely run at a certain time depending on whether the first has finished. You just need some .ssh/authorized_keys entries set up and away you go. The results of any failures are reported by the initial cron job on the first machine, even if they happen downstream.
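
A minimal sketch of the first stage in such a chain (host names, paths, and script names are all hypothetical); since cron mails any output, error messages from a failure anywhere downstream surface on this machine:

#!/bin/sh
# Stage A: run from cron on the machine that holds the raw data.
set -e    # abort the chain on the first failure; cron mails the error output

/usr/local/bin/process_logs /var/data/raw /var/data/out

# Hand the result to the next machine and kick off stage B there.
scp /var/data/out/result.dat nexthost:/var/data/incoming/
ssh nexthost "/usr/local/path/to/stage_b.sh"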

labradort
  • Yeah... that's essentially the solution we have now, but it starts to become unmanageable as the number of tasks grows. And you have to write wrapper scripts to do the scheduling for you, and you're dependent on one machine to handle the scheduling, which can fail. – rhettg Oct 30 '09 at 00:02