
I have a RHEL 6.2 server that I'm using to run KVM virtual machines.

The server itself has 16 GB of RAM. I want to find the biggest VM I can run on it without the qemu-kvm process swapping. The VM's RAM is ~15 GB. (Yes, I realize this is pushing the limit, but read to the end before answering with something like "15 GB is too much".)

[root@xxx libvirt]# virsh dumpxml VM2 | grep -i memory
  <memory>15000000</memory>
  <currentMemory>15000000</currentMemory>

[root@xxx libvirt]# ps -ef | grep kvm
root       872     1 16 10:55 ?        00:03:00 /usr/libexec/qemu-kvm [...] -m 14649 -name VM2 [...]


[root@xxx libvirt]# free -k
             total       used       free     shared    buffers     cached
Mem:      16332640   16194440     138200          0       1544      15700
-/+ buffers/cache:   16177196     155444
Swap:     35651568    7583432   28068136

But the RSS of the KVM process is only 880 MB (column 6 below). I expect it to be more like 12-14 GB.

[root@xxx libvirt]# ps -eF | grep kvm
root       872     1 14 4534221 882916 7 10:55 ?       00:03:11 /usr/libexec/qemu-kvm 

And if I add up the RSS of all processes, it's only about 1 GB.

[root@xxx libvirt]# ps -eF | awk '{print $6}' | grep '[0-9]' | tr '\n' '+' | sed 's/+$/\n/' | bc
1004064
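
An equivalent, simpler pipeline for the same sum (a sketch; prints the total RSS of all processes, in KiB):

ps -eo rss= | awk '{ sum += $1 } END { print sum }'   # sum the RSS column directly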

Here are the processes that are the biggest memory users (RSS, column 6).

root      5188 22329  0 27572  1192   4 11:19 pts/0    00:00:00 ps -eF
root     31461     1  0 10746  1236   7 Jul25 ?        00:06:22 [...]
root      6339  6275  0 272676 3288   4 Jul27 ?        00:13:38 [...]
root      2059     1  1 443909 13352  7 Jul17 ?        05:29:11 libvirtd --daemon
root       872     1 13 4534221 928300 2 10:55 ?       00:03:24 /usr/libexec/qemu-kvm [...]

I expect that about 300-500 MB of memory should still be available for the VM.

UPDATE

After rebooting the machine, I cannot reproduce this anymore. The system now behaves as I expect. Here are the expected numbers.

I should mention that there is a memory-intensive program running inside the VM. It allocates 80% of the total memory and continually writes random numbers to it.

RSS of the VM:

[root@hb05b15 ~]# ps -eF | grep kvm
root      7330     1 97 4520362 12483728 2 11:59 ?     00:39:55 /usr/libexec/qemu-kvm [...]

Mem and swap numbers:

[root@hb05b15 ~]# free -k
             total       used       free     shared    buffers     cached
Mem:      16332640   13277468    3055172          0      21064     215196
-/+ buffers/cache:   13041208    3291432
Swap:     35651568          0   35651568

Sum up RSS for all processes:

[root@hb05b15 ~]# ps -eF | awk '{print $6}' | grep '[0-9]' | tr '\n' '+' | sed 's/+$/\n/' | bc
12607180
Michael Closson

2 Answers


You don't have any RAM left for the operating system, filesystem cache or anything else.

Don't push your VM's RAM so high without leaving adequate headroom.

Now, of course, you could disable swap and see what happens...
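
A rough sketch of that experiment (risky on a nearly full host; the OOM killer may target qemu-kvm if RAM actually runs out):

swapoff -a      # disable all swap areas
free -k         # confirm the swap total now reads 0
vmstat 5        # watch free memory and si/so while the VM is under load
swapon -a       # re-enable swap when the test is done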

ewwhite
  • Are you sure about that? After adding up the RSS of all processes, there is ~600 MB left for the OS, cache and anything else. I don't think it needs that much. – Michael Closson Jul 31 '12 at 15:58
  • Oops. I meant to say 6 GB of unaccounted RAM! Also, the VM is using 80% of its RAM. Due to memory overcommit, it's really only a 13 GB VM. – Michael Closson Jul 31 '12 at 17:00

There's a big difference between swap being used and the machine actively swapping. The kernel will preemptively move more and more onto disk as it sees calls for large amounts of memory. If the stuff moved out to disk isn't that frequently used then it's not necessarily a huge problem. If data is constantly being swapped in and out then there's usually a big problem. The real measure here is to look at something like iostat to observe how much data is actually being passed back and forth in a given interval.
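
A sketch of how to watch actual paging activity on the host (the device name is an assumption; substitute whatever disk holds your swap):

vmstat 5           # si/so columns show ongoing swap-in/swap-out activity
iostat -x sda 5    # %util and await for the disk backing swap (assuming sda here)
sar -W 1 10        # pswpin/s and pswpout/s, if the sysstat package is installed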

That said, a 15 GB VM on a 16 GB box is probably not going to play out well. The OS itself requires a certain amount of RAM and there's always a percentage of overhead for the VM. If you are, in fact, actively swapping then you may see substantial improvements just by backing off to, say, 12 or 13 GB.
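
For example, the guest could be shrunk to roughly 13 GB by editing the domain (a sketch, reusing the VM2 name from the question; libvirt memory values are in KiB):

virsh shutdown VM2
virsh edit VM2     # set <memory> and <currentMemory> to e.g. 13000000 (KiB)
virsh start VM2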

rnxrx
  • I can confirm that the process was indeed swapping. top w/ nFLT enabled shows +600K faults for the kvm process, and iostat showed that the swap disk was 100% utilized. Your suggestion about trying 12 or 13 GB is a good one. Unfortunately I rebooted the machine and cannot reproduce it anymore. Also, I have another box that has been up for a while, also doing KVM stress testing, and I cannot reproduce it on that box either. – Michael Closson Jul 31 '12 at 16:37
  • Asking the host OS for 15 GB of RAM will not automatically make 15 GB of RAM unavailable to all other processes. (KVM guest processes are no different to the host kernel than any other process in this respect.) There isn't per se a problem with having VMs whose defined memory collectively sums to more than the physical amount available on the host. (I have half a dozen KVM servers with 24 GB RAM each, with VMs totalling twice to three times that on each. The key is that I don't expect them to use it simultaneously.) Google for "kvm memory overcommit". – sapeurfaire Jul 31 '12 at 16:48
  • @sapeurfaire, thanks for the comment. Did you see my update? The system can indeed handle this 15 GB VM. The problem is that when the error happened, I couldn't account for where the RAM was being used, and a huge amount of swap was in use. The numbers: 16 GB of RAM; the RSS of all processes (including the VM) adds up to 10 GB; so the OS is holding on to 6 GB! – Michael Closson Jul 31 '12 at 16:56
  • @MichaelClosson, I saw the update after I'd commented :) I'm not sure what to make of the original problem, but wonder if it's recurring? It's also worth looking at this: http://publib.boulder.ibm.com/infocenter/lnxinfo/v3r0m0/index.jsp?topic=%2Fliaat%2Fliaattunsetswapiness.htm – sapeurfaire Jul 31 '12 at 17:02
  • How long had the box been up when the issue began? If it had been up for a while then in theory a small memory leak could have accounted for the heavy memory utilization on what otherwise looks like a pretty quiet system. – rnxrx Jul 31 '12 at 18:06
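
For reference, the swappiness tunable discussed in the link above can be inspected and lowered like this (the value 10 is only illustrative):

cat /proc/sys/vm/swappiness                       # RHEL 6 defaults to 60
sysctl -w vm.swappiness=10                        # prefer keeping anonymous pages in RAM
echo 'vm.swappiness = 10' >> /etc/sysctl.conf     # make the setting persist across reboots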