Processes on Mac OS X getting 'stuck' and strange CPU usage


I work in a scientific institution, and one of the tasks I'm currently working on involves running simulations and writing the data they produce to hard disk on-the-fly. By 'on-the-fly' I mean that the program itself spits out data to disk every second or so. These simulations are written purely in single-threaded C++ and run on a Mac Pro. The relevant specifications of this Mac are:

OSX Version: 10.6.8
Model Name: Mac Pro
Model Identifier: MacPro4,1
Processor Name: Quad-Core Intel Xeon
Processor Speed: 2.66 GHz
Number Of Processors: 2
Total Number Of Cores: 8

The Intel Xeons are hyper-threaded, for up to 8 virtual cores

I execute my simulations in a simple .sh file, using the following syntax:

nohup simulation1.o configfile 2> /dev/null > /dev/null &
nohup simulation2.o configfile 2> /dev/null > /dev/null &

and so on...

Using nohup means that I don't need to worry about random disconnects when I work remotely.

When I look at ps after launching, say, 10 simulations with my bash file, I find that in the STATE column the processes switch regularly between 'running' and 'stuck'. Furthermore, I would expect the CPU usage to be 100% for each process, but instead each process averages only about 28%. (I expect 100% per process because when I run just one of these simulations on its own, its CPU column maxes out at about 100%, and the cores are hyper-threaded. These are very CPU-heavy simulations.)

Does anyone know what's going on? Specifically:

  • What does 'stuck' mean in relation to ps?
  • Why is the CPU not maxed out for each process?

I'd greatly appreciate some help.

Mani


I think that since data is being written to the hard disk on-the-fly, the process most likely spends most of its time waiting for the data to be written, and less time simulating. I think you should have a separate thread for writing to disk – Ozair Kafray – 2012-06-25T13:13:36.720

@OzairKafray: The OS should make writes fully asynchronous if there's enough memory to buffer them. So unless the writing code is broken, threading it is unlikely to help. It's much more likely he's getting blocked on disk reads. – David Schwartz – 2012-06-25T13:58:42.057

@DavidSchwartz I see, you are most likely right! – Ozair Kafray – 2012-06-25T14:32:42.793

Answers


I've fixed my problem. It turned out to be quite subtle, but thanks to Ozair, who hit the nail on the head. My specific simulation doesn't read very much data (only the initialisation parameters) but spends a lot of time spitting out calculated data. The crude way I originally implemented this, using the standard C++ file.open("tobewritten.dat"), is very slow even on its own, but when multiple instances are run, the individual instances spend ages waiting for 'write time' on the hard drive.

There are a few specific lessons I've learnt (a small code sketch illustrating them follows the list):

  1. cout << std::endl flushes the buffer every time it is used; if the buffer isn't full, that forfeits the much faster operation of writing to RAM. Use "\n" instead and close the file at the end; C++ flushes the buffer to disk whenever it fills up.

  2. When writing multiple massive data files (I'm talking GBs) at the same time, it is best to specify the buffer manually. At the moment I've set the buffer to 50 MB. Using large buffers means more of the work stays between RAM and CPU, and data only gets dumped to disk (the slow part) in 50 MB chunks.

  3. Don't even use cout to write to the file; it's slower than sprintf and its variants.
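
To make these lessons concrete, here is a rough C++ sketch of that style of buffered output. The file name, the 50 MB buffer size and the dummy data loop are just placeholders for illustration, not my actual simulation code:

#include <cstddef>
#include <cstdio>
#include <fstream>
#include <vector>

int main() {
    const std::size_t kBufSize = 50 * 1024 * 1024;   // 50 MB stream buffer (lesson 2)
    std::vector<char> buffer(kBufSize);

    std::ofstream out;
    // Give the stream a large buffer *before* opening the file, so output
    // accumulates in RAM and only hits the disk in big chunks.
    out.rdbuf()->pubsetbuf(buffer.data(), buffer.size());
    out.open("tobewritten.dat");

    char line[128];
    for (int step = 0; step < 1000000; ++step) {
        double value = step * 0.001;                 // placeholder "result"
        // Format with snprintf rather than ostream operators (lesson 3),
        // and end lines with '\n' so nothing forces a flush (lesson 1).
        int len = std::snprintf(line, sizeof(line), "%d %.6f\n", step, value);
        out.write(line, len);
    }

    out.close();   // whatever is left in the buffer is flushed to disk here
    return 0;
}

One caveat: pubsetbuf generally has to be called before the file is opened for the custom buffer to take effect; on many implementations it is silently ignored if called later.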

Using the methods I've outlined above, I've gone from 28% CPU usage for each process to 100% CPU usage. The 'stuck' STATE doesn't appear anymore.

Mani
