
I'm looking for guidance on how to determine precisely how much RAM my job uses on my cluster. The job is single-threaded and runs on a single CPU.

While the job is running, top reports the following memory usage:

VIRT: 45.6g
RES: 38g
SHR: 9600

To me (correct me if I'm wrong) this means that I'm using 38 GB of real RAM, plus 7.6 GB of data that may have been moved to swap. Figures around 40 GB are what the authors of the tool I am testing say my job should use.
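
Whether the gap between VIRT and RES really sits in swap can be checked directly rather than inferred, since virtual size also counts address space that was reserved but never touched. A minimal sketch, assuming a Linux compute node where /proc/<pid>/status exposes the VmSize, VmRSS, VmHWM and VmSwap fields (VmSwap needs a reasonably recent kernel). The function name and the sample text are made up for illustration; the sample is sized to match the top figures above.

```python
import re

def parse_proc_status(text):
    """Return the Vm* fields of a /proc/<pid>/status dump as {name: kB}."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"(Vm\w+):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields

# Made-up sample shaped like /proc/<pid>/status (not real output).
SAMPLE_STATUS = (
    "Name:\tmyjob\n"
    "VmPeak:\t47815066 kB\n"
    "VmSize:\t47815066 kB\n"   # what top shows as VIRT
    "VmHWM:\t39845888 kB\n"    # peak resident set size
    "VmRSS:\t39845888 kB\n"    # what top shows as RES
    "VmSwap:\t       0 kB\n"   # memory actually swapped out
)

vm = parse_proc_status(SAMPLE_STATUS)
print("resident kB:", vm["VmRSS"])
print("swapped kB:", vm["VmSwap"])
print("virtual minus resident kB:", vm["VmSize"] - vm["VmRSS"])
```

On a live node the same function could be fed the contents of /proc/<pid>/status for the job's PID, e.g. `open(f"/proc/{pid}/status").read()`.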

The confusion starts when I get the numbers from SGE (using qstat or qacct). In the output of qacct -j 7270916 I see:

mem 2768.453
maxvmem 4.078G

However, neither of these is close to the 45.6 GB of RAM I know I'm using (even though maxvmem sounds like it really should represent the 45.6 GB).

While the job was running I also tried qstat -j 7270916, which showed the line:

usage 1: cpu=00:01:37, mem=168.12988 GBs, io=38.64676, vmem=1.665G, maxvmem=4.078G

I guess that mem is a sum of all the RAM that was used and released over the course of the run (it just finished), but maxvmem is still really low (much less than my expected 45.6 GB).

So qacct and qstat both report numbers that disagree with the expected numbers (which I see with top).

Does anyone out there have suggestions on how to get RAM usage numbers that make sense using SGE commands after the run finishes?

EDIT: I'm using SGE 6.2u5

lonestar21
  • Just a note about the mem value: According to http://linux.die.net/man/1/qacct the format of the qacct command is defined by sge_accounting (http://linux.die.net/man/5/sge_accounting), which says that mem is "The integral memory usage in Gbytes cpu seconds". Have you defined a maximum virtual memory limit (s_vmem/h_vmem)? – zpon Jun 23 '14 at 09:08
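
If the sge_accounting definition quoted here is taken at face value, mem is an integral over CPU time, so dividing it by the CPU seconds should give a time-averaged figure rather than a peak. A rough sketch of that arithmetic on the usage line from the question, assuming cpu is always printed as hh:mm:ss (the helper name is hypothetical):

```python
import re

def average_mem_gb(usage_line):
    """Divide the 'mem' integral (GB * cpu-seconds) by the cpu time in seconds."""
    cpu = re.search(r"cpu=(\d+):(\d+):(\d+)", usage_line)
    # \b keeps this from matching inside 'vmem=' or 'maxvmem='.
    mem = re.search(r"\bmem=([\d.]+) GBs", usage_line)
    if cpu is None or mem is None:
        raise ValueError("usage line does not contain cpu= and mem= fields")
    hours, minutes, seconds = (int(part) for part in cpu.groups())
    cpu_seconds = hours * 3600 + minutes * 60 + seconds
    return float(mem.group(1)) / cpu_seconds

USAGE = ("usage 1: cpu=00:01:37, mem=168.12988 GBs, io=38.64676, "
         "vmem=1.665G, maxvmem=4.078G")

print(round(average_mem_gb(USAGE), 2))
```

For the line in the question this works out to roughly 1.73 GB, which is in the same range as the reported vmem=1.665G and nowhere near the 38g that top shows.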

1 Answer


This is an old question, but if you still have not figured it out you can type

qstat -j <Job_ID>

The "maxvmem" field gives the maximum amount of (virtual) memory your job used while it was running, so that is the number you are looking for.

Note that qstat only works while your job is running. To see the memory usage after the job finishes, use

qacct -j <Job_ID>
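
For comparing jobs after the fact, the relevant fields can be pulled out of the qacct text output. A minimal sketch, assuming the usual two-column "name value" layout of qacct -j and that an upper-case K/M/G suffix means a power of 1024. The helper names are hypothetical, and the sample contains only the values quoted in the question.

```python
def parse_qacct(text):
    """Turn 'name   value' lines of qacct -j output into a dict of strings."""
    record = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:          # skips blank lines and the ===== separator
            record[parts[0]] = parts[1].strip()
    return record

# Assumption: upper-case suffixes are binary multiples.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def size_to_bytes(value):
    """Convert a size such as '4.078G' to a number of bytes."""
    suffix = value[-1]
    if suffix in _UNITS:
        return float(value[:-1]) * _UNITS[suffix]
    return float(value)

# Sample built only from the values quoted in the question.
SAMPLE_QACCT = """\
==============================================================
jobnumber    7270916
mem          2768.453
maxvmem      4.078G
"""

record = parse_qacct(SAMPLE_QACCT)
print("mem integral (GB * cpu-seconds):", float(record["mem"]))
print("maxvmem in bytes:", size_to_bytes(record["maxvmem"]))
```

In practice the text would come from running qacct -j <Job_ID> and capturing its standard output.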

Hope this helps. Below is a link with more information:

http://wiki.genomics.upenn.edu/index.php/HPC:Large_memory_jobs

szimmerman
  • I don't see how this answer does anything but reiterate the commands that were already clearly stated by OP as being used. And the link is stale... – merv Feb 20 '17 at 09:05