I have a server that I want to dedicate to converting doc files to PDF with LibreOffice. The server has 6 cores and LibreOffice is single-threaded, so generating a single PDF uses at most one core, i.e. about 16.7% of my total CPU power. A conversion can be started from the console and is a blocking task, i.e. it does not return control to the console until it is done.

I could start 6 headless office instances (on 6 different ports) and, via some custom code, make sure that each work request goes to a different instance. I'd have to detect when all 6 instances are busy so that new work is held in a pending queue. I'd also have to handle timeouts and errors: restart the affected instance and retry the job that caused the problem, maybe 1-2 more times before giving up.

The above scenario won't use 100% of the CPU for a single document, but it will allow converting up to 6 documents at a time, instead of processing them one after the other at only about 16.7% of the available power.
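The custom code described above could be sketched roughly as follows. This is only a minimal illustration under assumptions, not a tested production setup: it swaps the 6 long-running instances on ports for one-shot `soffice --headless --convert-to pdf` processes, one per slot, each with its own profile directory. The names `libreoffice_cmd`, `run_with_retries`, `convert_all`, the `/tmp/lo_profile_N` paths, and the default timeout are all hypothetical.

```python
import queue
import subprocess
from concurrent.futures import ThreadPoolExecutor


def libreoffice_cmd(doc_path, slot, out_dir="out"):
    # Assumed invocation: one-shot headless conversion with a per-slot profile
    # directory (hypothetical path) so concurrent processes do not share one.
    profile = f"-env:UserInstallation=file:///tmp/lo_profile_{slot}"
    return ["soffice", profile, "--headless",
            "--convert-to", "pdf", "--outdir", out_dir, doc_path]


def run_with_retries(build_cmd, job, slots, timeout_s, max_attempts):
    """Run one job on a free slot; retry on timeout or failure."""
    for attempt in range(1, max_attempts + 1):
        slot = slots.get()  # borrow a free slot number
        try:
            subprocess.run(build_cmd(job, slot), check=True, timeout=timeout_s,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            return (job, "ok", attempt)
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
            pass  # timed out or failed; fall through and try again
        finally:
            slots.put(slot)  # always hand the slot back
    return (job, "failed", max_attempts)


def convert_all(jobs, build_cmd, n_slots=6, timeout_s=120, max_attempts=3):
    """Process jobs with at most n_slots running at once; extra jobs wait."""
    slots = queue.Queue()
    for s in range(1, n_slots + 1):
        slots.put(s)
    with ThreadPoolExecutor(max_workers=n_slots) as pool:
        # The executor's internal queue plays the role of the pending buffer.
        futures = [pool.submit(run_with_retries, build_cmd, job, slots,
                               timeout_s, max_attempts) for job in jobs]
        return [f.result() for f in futures]
```

Calling `convert_all(list_of_doc_paths, libreoffice_cmd)` would then return one `(path, status, attempts)` tuple per input file, in input order. Threads are enough here because the actual work happens in the child processes.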

My question is: does a product or tool exist to manage such a scenario? Probably something generic that could orchestrate such tasks (without knowing any specifics about LibreOffice, of course).

cherouvim

1 Answer

You might want to check out GNU parallel:

GNU parallel is a shell tool for executing jobs in parallel using one or more computers.

There are plenty of examples in the docs, including "GNU Parallel as dir processor", which you should probably have a look at.

Of course, you will still need to do quite a bit of scripting for this, and in the end you may even conclude that it is easier to do the whole scheduling in your own scripts as well.

Oliver
  • So make a script that does the converting for a single file. Then use GNU Parallel to parallelize and retry with timeout: --timeout 1000% --retries 4 (timeout if the job takes 10 times the median time). – Ole Tange Aug 12 '15 at 09:18
  • Use {%} to select which instance you want to submit the job to. – Ole Tange Aug 12 '15 at 10:01
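To make the two comments concrete, here is a small sketch of the arithmetic behind them. It is not how GNU Parallel is implemented; it only illustrates a timeout given as a percentage of the median runtime of successful jobs (`--timeout 1000%` is 10 times the median) and the mapping of a 1-based job slot number, as `{%}` provides, to an instance port. The function names and the base port 2002 are hypothetical.

```python
import statistics


def dynamic_timeout(successful_runtimes_s, percent=1000):
    """Timeout in seconds as a percentage of the median successful runtime."""
    if not successful_runtimes_s:
        return None  # no data yet, so no timeout in this sketch
    return statistics.median(successful_runtimes_s) * percent / 100


def port_for_slot(slot, base_port=2002):
    """Map a 1-based slot number to the port of one headless instance."""
    return base_port + slot - 1
```

With successful runtimes of 2, 3, and 4 seconds the median is 3 seconds, so a 1000% timeout is 30 seconds; slots 1 to 6 map to ports 2002 to 2007 under the assumed base port.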