I'm trying to use Memtester as a memory stress and correctness test for my organization's Linux boxes. Memtester takes the amount of memory to test as an argument, locks that much memory using mlock(), and then runs a number of patterns to verify that the memory is good.

Since I'm trying to verify correctness, I want to be testing as much of the machine's memory as possible. I've been trying to do this by passing it MemFree from /proc/meminfo. Or rather, I have a script that spawns multiple processes, each asking for MemFree (see below), because the OS doesn't allow a single process to lock more than 50% of memory.

The problem is, if I lock more than ~90% of memory, my computer locks up, presumably due to thrashing. About 30 minutes later I'm finally able to use it again.

Is there a way, programmatically or otherwise, to find out how much memory I can lock before it starts swapping?

I want this to run on any Linux box, so anything that requires me to change the system configuration is a no-go. Also, you can assume that the test will be the only thing running on the system (besides normal OS stuff, of course), since the machines are supposed to be left alone while we're stress testing them.


Part of the script that spawns the memtester processes:

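    # MEMTOTAL and the initial MEMAVAILABLE are set earlier in the script (not shown)
    while [ "$MEMAVAILABLE" -gt 0 ]   # -gt, so we never start a 0K memtester
    do
        ./memtester "${MEMAVAILABLE}K" &
        sleep 10 # wait for the previous process to lock its memory
        MEMFREE=$(cat /proc/meminfo | grep "MemFree" | sed 's/MemFree:\(\s\)*\(.*\) kB/\2/g')
        # leave 5% of total memory unlocked
        MEMAVAILABLE=$((MEMFREE - (MEMTOTAL * 5 / 100)))
    done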
    wait

  • Is it possible to run these tests from single-user mode? – Matthew Ife Feb 04 '12 at 11:58
  • At the moment it *needs* to be run with sudo on most machines, if that's what you mean. Unprivileged users usually aren't allowed to lock huge chunks of memory. – sapphiremirage Feb 04 '12 at 12:31
  • Well, what I really mean is whether the box is running in a predictable state. I.e., if you're running prefork Apache, the state of the server is unreliable, since a number of inbound connections can try to allocate large quantities of memory. – Matthew Ife Feb 04 '12 at 13:02
  • oh, ok. Testing is done while the server is offline, and the test should be just about the only thing running. – sapphiremirage Feb 04 '12 at 22:34
  • According to the mlock manpage, the 50% restriction on locked RAM only applies to 2.4 kernels. Are you using a 2.4 kernel? – Matthew Ife Feb 05 '12 at 00:20
  • No...I'm on 3.0.x and the other two machines I'm testing on are 2.6.18.x...strange. – sapphiremirage Feb 06 '12 at 01:03

1 Answer

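Your script probably allocates too much because of a race condition. In `statement1 & statement2`, statement1 is started in the background, so statement2 can execute before statement1 has done its work. Here that means MemFree may be read before memtester has finished locking its memory, so the loop continues and spawns another process with a size that is too large. And so on.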
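One way to take the guesswork out of a fixed sleep is to poll until the background process has actually locked its memory. The following is only a sketch: it assumes the kernel exposes a `VmLck` line in `/proc/<pid>/status`, and the helper names `locked_kb` and `wait_for_lock` are made up for illustration, not part of memtester.

```shell
# locked_kb FILE: print the VmLck value (kB) from a /proc/<pid>/status-style
# file, or 0 if the field is not present.
locked_kb() {
    awk '/^VmLck:/ { print $2; found = 1 } END { if (!found) print 0 }' "$1"
}

# wait_for_lock PID WANT_KB [TIMEOUT_S]: poll once per second until the
# process has locked at least WANT_KB. Returns 1 if the process is gone
# or the timeout (default 60 s) expires first.
wait_for_lock() {
    pid=$1
    want=$2
    timeout=${3:-60}
    while [ "$timeout" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        cur=$(locked_kb "/proc/$pid/status" 2>/dev/null)
        [ "${cur:-0}" -ge "$want" ] && return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1
}
```

In the loop, this would replace the sleep with something like `wait_for_lock $! "$THRESHOLD"`, where the threshold is chosen somewhat below the requested size, since the amount actually locked may be rounded or reduced.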

As for a single process not being able to lock more than 50% of memory: I guess I would try adjusting the locked-memory limit with ulimit.
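For reference, the locked-memory limit can be inspected, and with sufficient privileges raised, for the current shell and its children only, without editing any system configuration. A minimal bash sketch; `memlock_limit` and `raise_memlock` are hypothetical helper names, and whether raising the limit succeeds depends on the hard limit and on privileges.

```shell
# memlock_limit: print the current soft limit on locked memory,
# either a number of kB or the word "unlimited".
memlock_limit() {
    ulimit -S -l
}

# raise_memlock: try to lift the limit for this shell and its children.
# Nothing is persisted; this normally only succeeds when running as root.
raise_memlock() {
    ulimit -l unlimited 2>/dev/null
}

echo "locked-memory limit before: $(memlock_limit)"
if raise_memlock; then
    echo "locked-memory limit after:  $(memlock_limit)"
else
    echo "could not raise the limit (insufficient privileges?)" >&2
fi
```

Note that, according to mlock(2), since Linux 2.6.9 this limit is not applied to privileged processes at all, so it may not be what stops a memtester that runs under sudo.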

Now to the main point: how much memory. Linux does not use a DOS-like model of "free" memory. The MemFree metric should really be called MemFreeImmediatelyAvailable. You can allocate far, far more than that, and nothing will happen besides maybe a little bit of paging. Conversely, using memory does not mean that MemFree will decrease: the kernel converts Inact_clean to MemFree as soon as it can, to maintain a certain minimum amount of MemFree (another reason why your script will use too much). A major example of the Inact_clean category is usually the filesystem read cache, which the kernel can "drop" immediately when a program needs more memory. I am not saying that you can safely eat up all of it, but a large part, yes.

  • Active: Memory that has been used more recently and usually not reclaimed unless absolutely necessary.
  • Inact_dirty: Dirty means "might need writing to disk or swap." Takes more work to free. Examples might be files that have not been written out yet. They aren't written to disk too soon in order to keep the I/O down. For instance, if you're writing logs, it might be better to wait until you have a complete log ready before sending it to disk.
  • Inact_clean: Assumed to be easily freeable. The kernel will try to keep some clean stuff around always to have a bit of breathing room.
  • Inact_target: Just a goal metric the kernel uses for making sure there are enough inactive pages around. When exceeded, the kernel will not do work to move pages from active to inactive. A page can also get inactive in a few other ways, e.g. if you do a long sequential I/O, the kernel assumes you're not going to use that memory and makes it inactive preventively. So you can get more inactive pages than the target because the kernel marks some cache as "more likely to be never used" and lets it cheat in the "last used" order.

http://www.redhat.com/advice/tips/meminfo.html
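To make the distinction concrete, here is a rough sketch of an estimate that counts easily reclaimable cache in addition to MemFree. It assumes the `MemFree`, `Buffers`, and `Cached` fields of `/proc/meminfo`, and prefers `MemAvailable` on kernels that report it (3.14 and later). The function name `estimate_available_kb` and its optional file argument are illustrative, and the result is an optimistic guess, not a guarantee that locking that much is safe.

```shell
# estimate_available_kb [FILE]: print an estimate (kB) of memory that could be
# used without swapping. FILE defaults to /proc/meminfo.
estimate_available_kb() {
    awk '
        /^MemAvailable:/ { avail = $2; have_avail = 1 }
        /^MemFree:/      { free = $2 }
        /^Buffers:/      { buffers = $2 }
        /^Cached:/       { cached = $2 }   # anchored, so SwapCached is ignored
        END {
            if (have_avail) printf "%d\n", avail
            else            printf "%d\n", free + buffers + cached
        }
    ' "${1:-/proc/meminfo}"
}
```

The fallback sum is on the generous side, because `Cached` also counts tmpfs and shared memory, which cannot simply be dropped. Keeping a safety margin, such as the 5% of MemTotal that the script in the question already subtracts, still seems prudent.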

Lastly, I find this a more elegant equivalent:

    sed -n '/MemFree/s/.*MemFree:\s*\([0-9]*\) kB.*/\1/gp' /proc/meminfo

kubanczyk
  • I actually have a 'sleep 10' in between the first and second line of the loop, but I left it out so there would be less code for people to read here. So it's less likely that there's a race condition. And I'm going to be running on arbitrary machines, so I can't assume that I'll be able to play with ulimit. Thanks for the rest of the info though! – sapphiremirage Feb 04 '12 at 12:30
  • Oh. Read on, I mentioned a second possible reason. Also, if you are running the script as root, you can temporarily set ulimit at the beginning of the script. – kubanczyk Feb 04 '12 at 12:33
  • I don't quite understand the second reason... Are you suggesting that MemFree will always be above a certain value, even if there isn't that much available memory on the system? Should I be targeting something other than MemFree? – sapphiremirage Feb 04 '12 at 12:47
  • I would point out the memory manager has changed quite a bit since that article was written. Post 2.6.28 the memory manager was changed to a split-LRU model which works somewhat differently. – Matthew Ife Feb 04 '12 at 12:49
  • I'm suggesting that while your script can eventually arrive at MemFree=0, that would mean your system has just thrashed to death. This is what MemFree=0 means. I would use a different metric. And it seems I need to learn about split-LRU now, dang. – kubanczyk Feb 04 '12 at 12:57