Using Kubernetes to Run and Balance Batch Jobs

Question

I have an application that generates CSV files. It consists of:

A MySQL database.
A message queue.
Some coordinator services.
A set of worker services.

I do not have access to the source for the application, and cannot change it.

The application was not designed with Kubernetes in mind, but nonetheless it seems like a Kubernetes StatefulSet would happily accommodate it. (This is an assumption I'm making, and I'm open to other options.)

Anyway, what is vexing me is how to set up the following use case:

1. I want to be able to create a job that will start a StatefulSet, submit some work to it, collect the CSV output, and shut down the StatefulSet afterwards.
2. I want some way to limit the number of concurrent StatefulSets that can run on the cluster to a configurable number (for example, maxsets=2 or something).
3. I'd like jobs submitted in excess of the limit in (2) to queue.

The jobs are completely independent of one another, can be kicked off at any time, and the number of jobs is not known in advance.

I believe that (1), on its own, could probably be handled with a Kubernetes Job.

My question, therefore: what are some ways to deal with (2) and (3)?

NB: The resource usage profile of the application is a bit chaotic, so using basic resource constraints on CPU/MEM or whatever probably won't be sufficient.

Currently, the only idea I have is:

1. Run a pod with an MQ in it.
2. When someone needs to run a batch, they add a message to the MQ (which would contain references to the relevant input data for that batch).
3. Run another pod with a small always-on script running. That script would continuously poll the MQ, and kick off a Kubernetes job anytime something is pulled out of the queue.
4. The script in 3 would only pull and run N jobs at any given time, where N is the max number of concurrent StatefulSets that should be able to run.

Not sure if there's a better way though...?
-- user1848244, Aug 04 '19 at 13:48
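The polling script described in steps 3 and 4 could be sketched roughly as below. This is only an illustration of the "run at most N, queue the rest" logic, not a tested integration: the class name `BatchDispatcher` and the `launch` and `is_finished` callbacks are hypothetical, and an in-memory deque stands in for the real MQ. In practice `launch` would create the Kubernetes Job and `is_finished` would check its status.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class BatchDispatcher:
    """Caps concurrently running batches; extra requests wait in FIFO order.

    Hypothetical sketch: `launch` starts a batch and returns a handle,
    `is_finished` reports whether the batch behind a handle is done.
    """

    def __init__(self, max_sets: int,
                 launch: Callable[[str], str],
                 is_finished: Callable[[str], bool]) -> None:
        if max_sets < 1:
            raise ValueError("max_sets must be at least 1")
        self.max_sets = max_sets
        self._launch = launch
        self._is_finished = is_finished
        self._pending: Deque[str] = deque()   # stand-in for the MQ
        self._running: Dict[str, str] = {}    # handle -> input reference

    def submit(self, input_ref: str) -> None:
        """Queue a batch request (what a producer would push to the MQ)."""
        self._pending.append(input_ref)

    def tick(self) -> List[str]:
        """One polling pass: drop finished batches, then start queued ones
        until the cap is reached. Returns the handles started in this pass."""
        for handle in [h for h in self._running if self._is_finished(h)]:
            del self._running[handle]
        started: List[str] = []
        while self._pending and len(self._running) < self.max_sets:
            input_ref = self._pending.popleft()
            handle = self._launch(input_ref)
            self._running[handle] = input_ref
            started.append(handle)
        return started

    @property
    def running(self) -> List[str]:
        return sorted(self._running)

    @property
    def queued(self) -> List[str]:
        return list(self._pending)


if __name__ == "__main__":
    finished = set()  # fake "cluster state" for the demo
    dispatcher = BatchDispatcher(
        max_sets=2,
        launch=lambda ref: f"job-{ref}",
        is_finished=lambda handle: handle in finished,
    )
    for ref in ("a", "b", "c"):
        dispatcher.submit(ref)
    print(dispatcher.tick())   # ['job-a', 'job-b']  (c stays queued)
    finished.add("job-a")
    print(dispatcher.tick())   # ['job-c']
```

The always-on script would call `tick()` in a loop with a short sleep between passes. Note that this sketch keeps the running set in memory, so a restart of the script would lose track of batches already in flight.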

score 0 · Answer 1 · answered Aug 05 '19 at 11:57

It's not clear why the job in step 1 would create a StatefulSet to do some operations and then delete it afterwards. That seems to go against the purpose of statefulness:

If an application doesn’t require any stable identifiers or ordered deployment, deletion, or scaling, you should deploy your application with a controller that provides a set of stateless replicas. Controllers such as Deployment or ReplicaSet may be better suited to your stateless needs.

Unless there's a requirement not stated in the question, there doesn't seem to be any need for a StatefulSet in your scenario; a Job looks like a better fit.

Since you're open to other approaches, here's one:

Overall, it seems that you're trying to create resources from other resources within the cluster, and those resources might not offer some of the constraints that you need (such as limiting the number of StatefulSets running in the cluster).

Instead of spawning new objects from within the cluster, I would suggest talking directly to the API to create whatever is needed from outside the cluster.

This approach has the following advantages:

It is not tied to the constraints of Kubernetes objects
There is no overhead from long-running objects within the cluster; they're only created when needed
The code is versionable (so it can follow the Infrastructure-as-code paradigm)
It is easier to integrate with a CI/CD workflow
The client libraries, if you use them, support multiple languages

This of course requires coding the thing yourself. A potential downside is the time an object takes to be created and deployed after an action triggers it, which can depend heavily on the size of your images and your image pull policy.

Wrapping up, you can push your application images to whatever repository you're using, and every time a new batch run is needed, talk to the API to create the resources it needs.

This of course doesn't need the application code to be changed in any way.
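As a rough sketch of what "talking to the API" could look like, the Python below builds a Job manifest as a plain dict and only submits it when fewer than `max_sets` batches are still unfinished. The label `app=csv-batch`, the `INPUT_REF` environment variable, the image name, and all function names are assumptions made for illustration. The `batch_api` argument is expected to behave like `kubernetes.client.BatchV1Api()` from the official Python client (after `kubernetes.config.load_kube_config()`); only its `list_namespaced_job` and `create_namespaced_job` methods are used.

```python
from typing import Any, Dict

# Hypothetical label used to find the batch Jobs created by this script.
BATCH_LABELS = {"app": "csv-batch"}
BATCH_SELECTOR = "app=csv-batch"


def build_job_manifest(name: str, image: str, input_ref: str) -> Dict[str, Any]:
    """Return a batch/v1 Job manifest for one batch run (illustrative only)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "labels": dict(BATCH_LABELS)},
        "spec": {
            "backoffLimit": 0,  # assumption: do not retry a failed batch
            "template": {
                "metadata": {"labels": dict(BATCH_LABELS)},
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "batch",
                        "image": image,
                        # Hypothetical way of passing the input reference.
                        "env": [{"name": "INPUT_REF", "value": input_ref}],
                    }],
                },
            },
        },
    }


def job_is_finished(job: Any) -> bool:
    """A Job is finished once it has a Complete or Failed condition set to True."""
    conditions = job.status.conditions or []
    return any(c.type in ("Complete", "Failed") and c.status == "True"
               for c in conditions)


def count_unfinished_jobs(batch_api: Any, namespace: str) -> int:
    """Count labelled batch Jobs that have not completed or failed yet."""
    jobs = batch_api.list_namespaced_job(namespace, label_selector=BATCH_SELECTOR)
    return sum(1 for job in jobs.items if not job_is_finished(job))


def submit_if_capacity(batch_api: Any, namespace: str,
                       manifest: Dict[str, Any], max_sets: int) -> bool:
    """Create the Job only if the concurrency cap allows it.

    Returns True if the Job was created, False if the caller should
    keep the request queued and retry later.
    """
    if count_unfinished_jobs(batch_api, namespace) >= max_sets:
        return False
    batch_api.create_namespaced_job(namespace=namespace, body=manifest)
    return True
```

The check and the create are two separate API calls, so this only enforces the cap reliably when a single submitter runs at a time. An application made of several components (database, queue, coordinators, workers) would likely need more than one object per batch, created and cleaned up in the same fashion.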
