I'm looking for guidance on how to measure precisely how much RAM my job is using on my cluster. My job is not multi-threaded and runs on a single CPU.
When I run my job and run "top" I can see that it uses this much RAM...
VIRT: 45.6g
RES: 38g
SHR: 9600
which to me (correct me if I'm wrong) means that I'm using 38 gigs of real RAM, plus 7.6 gigs of stuff that may have been moved to swap. Numbers around 40 gigs are also what the authors of the tool I am testing say my job should be using.
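To cross-check the top figures from inside the job, one option (assuming the compute node runs Linux, which is my assumption and not something SGE provides) is to read the Vm* lines from /proc/<pid>/status. There, VmPeak is the peak virtual size, VmHWM is the peak resident size, and VmSwap is what is actually in swap. The helper names parse_status and memory_report below are made up for illustration; this is a rough sketch, not a tested monitoring tool.

```python
import re


def parse_status(text):
    """Return {field: KiB} for the Vm* lines of /proc/<pid>/status content."""
    fields = {}
    for line in text.splitlines():
        # Lines look like "VmRSS:   39845888 kB"; other lines are skipped.
        match = re.match(r"(Vm\w+):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def memory_report(pid):
    """Read and parse /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_status(handle.read())


def kib_to_gib(kib):
    """Convert the kB (KiB) values used by /proc into GiB."""
    return kib / 1024 ** 2
```

Calling memory_report(pid) with the job's PID near the end of the run and looking at VmHWM and VmPeak should give numbers comparable to RES and VIRT in top, and VmSwap (if the kernel reports it) would show whether anything really went to swap.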
The confusion comes in when I get these numbers from SGE (using qstat or qacct).
qacct -j 7270916
In here I see
mem 2768.453
maxvmem 4.078G
However, neither of these is close to the 45.6 gigs of RAM I know I'm using (even though maxvmem sounds like it really should represent the 45.6 gigs).
While the job was running I tried using this command
qstat -j 7270916
in which I saw the line:
usage 1: cpu=00:01:37, mem=168.12988 GBs, io=38.64676, vmem=1.665G, maxvmem=4.078G
I guess that mem is a sum of all the RAM that was used/released/used/released over the run (it just finished), but maxvmem is still really low (much much less than my expected 45.6 gigs).
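As a sanity check on that guess, here is a small sketch that parses the usage line and divides mem by the CPU time. It assumes, from my reading of the accounting man page, that mem is an integral in GB times CPU-seconds; I have not confirmed that this is how 6.2u5 computes it. The function names are hypothetical.

```python
def parse_usage(line):
    """Split a `qstat -j` usage line into a dict of raw string values."""
    _, _, rest = line.partition(":")  # drop the leading "usage 1" label
    usage = {}
    for item in rest.split(","):
        key, _, value = item.strip().partition("=")
        usage[key] = value
    return usage


def cpu_seconds(cpu):
    """Convert [[dd:]hh:]mm:ss into seconds."""
    multipliers = (1, 60, 3600, 86400)
    parts = [int(p) for p in reversed(cpu.split(":"))]
    return sum(p * m for p, m in zip(parts, multipliers))


def average_gb(usage):
    """Average memory in GB, ASSUMING mem is GB x CPU-seconds."""
    integral = float(usage["mem"].split()[0])  # "168.12988 GBs" -> 168.12988
    return integral / cpu_seconds(usage["cpu"])


line = ("usage 1: cpu=00:01:37, mem=168.12988 GBs, io=38.64676, "
        "vmem=1.665G, maxvmem=4.078G")
usage = parse_usage(line)
print(f"average ~ {average_gb(usage):.2f} GB over {cpu_seconds(usage['cpu'])} s")
```

Under that assumption the line above works out to about 1.73 GB on average over 97 CPU-seconds, which is in the same range as the reported vmem of 1.665G and nowhere near what top shows.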
So both qacct and qstat give me numbers that disagree with the expected numbers (which I see with top).
Does anyone out there have suggestions on how to get RAM usage numbers that make sense using SGE commands after the run finishes?
EDIT: I'm using SGE 6.2u5