
My end users want to run batch processes concurrently on a Linux box. Their impression is that running several of them at the same time will reduce turnaround time. These processes do a fair amount of heavy lifting.

My take is that running them concurrently will simply make them compete for resources and cause contention. Some testing seems to support my hypothesis.

I found a similar thread on SF, linked below.

  1. Can anybody point to authoritative resources that support either side of this theory?

  2. What kind of testing can be done to validate this hypothesis?

Batch processing on Linux

oradbanj

1 Answer


To validate the hypothesis, simply run the jobs sequentially, then concurrently, and compare the results. If you have multiple CPUs you may well be in a better position, depending on how the jobs get scheduled.
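A minimal bake-off sketch of that comparison, assuming a POSIX shell; `sleep 1` is just a stand-in for your real batch jobs:

```shell
#!/bin/sh
# Bake-off sketch: measure wall-clock time for sequential vs concurrent
# runs of two stand-in jobs ("sleep 1" here; substitute your real jobs).

start=$(date +%s)
sleep 1; sleep 1                         # sequential: one after the other
seq_elapsed=$(( $(date +%s) - start ))

start=$(date +%s)
sleep 1 & sleep 1 & wait                 # concurrent: both at once, then wait
par_elapsed=$(( $(date +%s) - start ))

echo "sequential: ${seq_elapsed}s, concurrent: ${par_elapsed}s"
```

With pure `sleep` jobs the concurrent run wins, because sleeps don't contend for anything; with real CPU- or disk-bound jobs the numbers will tell you whether your box actually has spare capacity.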

But ask the users why they think that way. Right now you only have a theory on each side. Do a bake-off test and see which approach takes fewer resources and less time.

NB: I had the same problem, except that instead of multiple ETLs, the jobs were competing with the backups. Stopping the backups made the ETLs run fast, so they blamed the backups. (But stopping the ETLs made the backups run fast, so be careful how you present your argument!) The DBAs could not accept that they had disk contention and did not want to change anything beyond "make the backups not affect the ETLs". In the end I wrote a script that polled for the end of the ETLs and started the backups only once they were done, and only if they finished by 7 am; otherwise it skipped them. We had a limited time window.
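The gating idea can be sketched roughly like this; all names here (`etl_job`, `run_backup.sh`) are hypothetical placeholders, not the script I actually wrote:

```shell
#!/bin/sh
# Sketch of a gating script: wait until no ETL process remains, then
# run the backup only if the time window (before 07:00) is still open.
# "etl_job" and "run_backup.sh" are hypothetical names.

wait_for_etls() {
    # Poll once a minute until no process named "etl_job" is left.
    while pgrep -x etl_job >/dev/null; do
        sleep 60
    done
}

window_open() {
    # Succeed if the given hour (00-23) is before 07:00.
    [ "$1" -lt 7 ]
}

wait_for_etls
if window_open "$(date +%H)"; then
    echo "window open: would start ./run_backup.sh here"
else
    echo "past 07:00: skipping the backup today"
fi
```

Polling on process name is crude but needs no cooperation from the ETL jobs; if you can change them, having them touch a "done" flag file is more robust.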

You'll want to investigate things like process affinity, plus whatever tuning you can do on the I/O side with respect to disk layout, RAID, etc.; the standard performance considerations.
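As a concrete starting point, on Linux the util-linux tools `taskset` and `ionice` let you pin a job to CPUs and lower its I/O priority so concurrent jobs step on each other less; `sleep 1` again stands in for a real batch job:

```shell
#!/bin/sh
# Sketch: CPU affinity and I/O priority for two stand-in batch jobs.

# Pin this job to CPU 0 only.
taskset -c 0 sleep 1 &

# Run this job in the "idle" I/O scheduling class (-c 3), so it gets
# disk time only when nothing else is asking for it.
ionice -c 3 sleep 1 &

wait
echo "both jobs finished"
```

Whether pinning helps depends on the workload: it can reduce cache thrashing between CPU-bound jobs, but it won't fix disk contention, which is where `ionice` and disk layout matter.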

jouell