
We have a mongod instance running on a VM, and it doesn't seem to be using all of the available memory. It's page-faulting far more than usual, and the system's performance has degraded significantly lately.

More specifically, if I htop mongod, I see:

  • VIRT: 3471G
  • RES: 11.8G

The VM has ~60 GB of memory; currently ~4.6 GB is "used", and the remainder is in buffers or cache.

My understanding is that mongod mmaps the database files. (This is why VIRT is huge.) However, we're not clear on why the RES number isn't closer to 60 GB: as mongod needs data off disk, this data should be brought into the process's RSS, no? Mongo reports that it is page-faulting, so one would assume that the RSS would grow over time; ours is holding steady.

There is nothing else significant running on this machine. (It's the database server.) What's consuming the rest of buffers and cache, and specifically, why is the RES size of mongod not expanding to fill available RAM?
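
For reference, here is roughly how I'm lining up the kernel's numbers with mongod's own accounting (the port is the default; adjust as needed):

    # Kernel's view of the mongod process (values in kB)
    grep -E 'VmSize|VmRSS' /proc/$(pidof mongod)/status

    # System-wide picture: "used" vs. buffers/cache
    free -m

    # mongod's own accounting (MMAPv1); resident, virtual and mapped are in MB
    mongo --port 27017 --eval 'printjson(db.serverStatus().mem)'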

Thanatos

1 Answer


This can be a long and involved process, but let me say this first as a starting point: I (and many others I have worked with) have managed to get far closer to maximum resident memory usage than what you are seeing. Exactly what that maximum is will vary from system to system, and a lot of variables come into play, but I would generally shoot for 60-80% of RAM; anything higher is a bonus.

The next thing to do is some reading. There has been plenty written about this topic, often from the other perspective (better memory efficiency, fitting more into RAM when it is full, and so on).

With all that out of the way, you hopefully have a decent idea of how to tune your system to get the most out of the available memory (usually, but not always, knocking readahead down and making sure NUMA is properly disabled), and are able to see where else memory pressure may be coming from. The next piece to understand is a little trickier, and involves how the MongoDB journal works, and how that in turn interacts with how the kernel tracks the memory usage of individual processes.
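
As a rough sketch (the device and config path below are placeholders for your data volume and setup), checking readahead and NUMA looks something like this:

    # Show the current readahead for the data volume, in 512-byte sectors
    sudo blockdev --getra /dev/sdb

    # Lower it; a small value such as 32 (16 KB) is a common starting point for MMAPv1
    sudo blockdev --setra 32 /dev/sdb

    # Check whether the kernel sees more than one NUMA node
    numactl --hardware

    # If it does, start mongod with interleaved memory allocation
    numactl --interleave=all /usr/bin/mongod -f /etc/mongod.conf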

This is covered in detail as part of a lengthy MongoDB Jira issue - SERVER-9415. What we discovered when investigating that issue is that the behavior of the journal when doing a mix of reads and writes could (not always, but it was reproducible) drastically reduce the reported resident memory for the MongoDB process. The mechanics of this have been described in detail by Kristina Chodorow here, and there are more details in the Jira issue as well.
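
As a rough illustration in the mongo shell (assuming MMAPv1 with journaling enabled), the gap that MMS reports as non-mapped memory can be approximated like this:

    // serverStatus().mem values are reported in MB on MMAPv1
    var m = db.serverStatus().mem;
    print("resident:   " + m.resident);
    print("mapped:     " + m.mapped);
    // virtual minus mappedWithJournal roughly corresponds to "non-mapped" memory
    print("non-mapped: " + (m.virtual - m.mappedWithJournal));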

So, what does all that mean?

It means that the reporting and interpretation of resident memory statistics is complex, particularly on a system that is also doing writes, and especially if that system has memory pressure outside of the mongod process. In general, I recommend the following methodology:

  • Read in a large, known amount of data that should fit into memory (using touch, or manual pre-heating with a large query/explain; see the sketch after this list)
  • Run some queries, aggregations, etc. on that data set and verify that page faulting is minimal
  • If page faults are low, then the data is fitting into memory and you simply have a reporting problem. You can repeat the tests with larger data sets until you find your actual limit.
  • If page faults are high, then the data has been evicted or was never fully loaded, and you have something to investigate (readahead, memory pressure, NUMA configuration, etc.)
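
A minimal sketch of that loop in the mongo shell (the database and collection names are placeholders, and the touch command only applies to the MMAP-based storage engine):

    // Pre-heat: pull the collection's data and indexes into RAM
    use mydb
    db.runCommand({ touch: "mycollection", data: true, index: true })

    // Note the fault counter, run your representative queries, then compare
    db.serverStatus().extra_info.page_faults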

I generally recommend running MMS Monitoring (free) while testing, since it lets you track memory stats (including non-mapped memory), page faults, and more over time, plus mongostat for sub-minute resolution, to get a decent picture of what is going on.
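
For example, something like the following (host and port are placeholders) prints the faults and res columns once per second:

    mongostat --host localhost --port 27017 1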

Adam C
  • Thanks for the answer! I'll try to take a look at blockdev RA; that might help. As my post says, we're seeing horribly low RSS in mongo: around 1/6th of the total RAM. We do use MMS, and page faults are a straight line going up at a rate that's alarming us. (On the memory graph, virtual and mapped are growing linearly while resident is constant.) I *think* NUMA isn't an issue (it's a VM? do VMs have NUMA?); dmesg says "NUMA turned off", and we're not swapping. – Thanatos Feb 24 '14 at 18:36
  • OK, well it could be readahead, but it would have to be way off (very high/inefficient). NUMA is likely not a problem if you are using VMs. Page faults while the set is heating up are OK (after all, you have to have page faults to get the data into active memory), but once you have touched all the relevant data once they should ease off. Hence my recommendation to pre-heat and then evaluate. If they do ease off, then it would seem you have a reporting issue with resident memory; if not, then something else is going on. Do all VMs show the same limitation? – Adam C Feb 24 '14 at 18:42
  • This DB has been up for at least a week. The page faults/s are increasing, but the RSS of Mongo isn't, which is what concerns me. If we were "pre-heating", the RSS of Mongo would be increasing along with the page faults. What do you mean by "do all VMs show the same limitation?" (I unfortunately have only one in production. The secondary DB servers are calm in terms of page faults/s.) – Thanatos Feb 24 '14 at 21:27
  • If you take a look at the pieces related to under-reporting of virtual memory, you will note that it is possible for resident memory to be low, and not increase, yet for the data to be in memory. Pre-heat the data, then run a query and see if there are page faults when you *know* the data is in memory. You will probably need to quiesce the database to be sure about the measurement. If the secondaries are similarly spec'ed, then do a failover and see what the behavior looks like (this is what I meant about them all looking the same or not). – Adam C Feb 25 '14 at 00:44