Get Python to raise MemoryError instead of eating all my disk space

If I run a Python program with a memory leak, I would normally expect the program to eventually die with MemoryError. But instead, what happens is that all the virtual memory is used until my disk runs out of space. I am running Mac OS X 10.8 on a retina MacBook Pro. My computer generally has between 10GB to 20GB free. Mac OS X is smart enough to not die completely when the disk runs out of space (rather, it gives me a dialog letting me force quit my GUI programs).

Is there a way to make Python just die when it runs out of real memory, or some reasonable amount of virtual memory? This is what happens on Linux, as far as I can tell. I guess Mac OS X is more generous than Linux with virtual memory (the fact that I have an SSD might be part of this; I don't know just how smart OS X is with this stuff). Maybe there's a way to tell the Mac OS X kernel to never use so much virtual memory that leaves less than, say, 5 GB free on the hard drive?

asmeurer

Posted 2012-11-20T23:56:04.933

Reputation: 450

"Maybe there's a way to tell the Mac OS X kernel to never use so much virtual memory that leaves less than, say, 5 GB free on the hard drive?" Your suggested fix wouldn't help. If you make it run out of memory sooner, the problem will just occur sooner. – David Schwartz – 2012-11-21T00:19:20.607

Oh, yeah, you're right. Duh :) I guess I want just Python to do that, not the whole system. – asmeurer – 2012-11-21T00:30:03.297

Answers

Python Level

According to this post, resource.setrlimit() may be what you need.

Example

#!/usr/bin/python

import os
import resource

# Report the starting soft limit for the stack size.
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print 'Soft limit starts as  :', soft

# Use env MY_PY_SET_LIMIT to control the limit value.
# If MY_PY_SET_LIMIT is not set, RLIMIT_STACK will not change.
MY_PY_SET_LIMIT = os.getenv('MY_PY_SET_LIMIT')

if MY_PY_SET_LIMIT is not None:
    limit = int(MY_PY_SET_LIMIT)
    resource.setrlimit(resource.RLIMIT_STACK, (limit, limit))

soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print 'Soft limit changed to :', soft

# Grow a string so the process allocates memory under the new limit.
TMP = ""

for i in range(10240):
    TMP += "0123456789"
    print len(TMP)
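
For the original question (dying with MemoryError rather than a segfault), resource.RLIMIT_AS, which caps the process's total address space, may be a better fit than RLIMIT_STACK: when an allocation fails under that cap, Python raises MemoryError. A minimal sketch, assuming a kernel that enforces RLIMIT_AS (Linux does; the links in the next section suggest OS X may not):

#!/usr/bin/python

import resource

# Cap the total address space at 512 MB (soft and hard).
# NOTE: enforced on Linux; OS X may ignore RLIMIT_AS entirely.
LIMIT = 512 * 1024 * 1024
resource.setrlimit(resource.RLIMIT_AS, (LIMIT, LIMIT))

data = []
try:
    while True:
        data.append("0123456789" * 1024 * 1024)  # ~10 MB per chunk
except MemoryError:
    print 'Allocation failed under RLIMIT_AS, got MemoryError'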

System Level

For Linux, this has actually been answered multiple times before on various Stack Exchange boards and other sites. The best answer I found is here, which contains an example.

The answer is to use ulimit -v <kbytes>. For example, limiting the virtual memory to 10 MB:

ulimit -v 10240

However, on OS X there are indications (here & here) that ulimit may be ignored. Those links are very old; I am not sure whether the situation has changed in more recent OS X releases.

There is this post for OS X about using a launchd configuration. It suggests a Stack entry under SoftResourceLimits in a plist config:

<key>SoftResourceLimits</key>
<dict>
    <key>Stack</key>
    <integer>10000000000</integer>
</dict>

Or with /etc/launchd.conf

launchd.conf

umask 002
limit stack 67104768 67104768
limit maxproc 3400 4500
limit maxfiles 256 unlimited
setenv PATH /opt/local/bin:/opt/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin

PS: According to the Mountain Lion man launchd.conf(5), a per-user launchd.conf is not supported:

 $HOME/.launchd.conf  Your launchd configuration file (currently unsupported).

John Siu

Posted 2012-11-20T23:56:04.933

Reputation: 4 957

ulimit indeed doesn't seem to work. – asmeurer – 2012-12-15T02:01:06.427

That leaves us no choice but to go with launchd. – John Siu – 2012-12-15T02:08:46.660

It's not clear where that should go. It sounds maybe like it should be part of some info.plist in an .app package, but Python doesn't work like that. – asmeurer – 2012-12-15T02:12:48.957

Updated the answer to clarify. – John Siu – 2012-12-15T02:21:44.983

Updated with a potential Python solution. – John Siu – 2012-12-15T03:12:09.740

Using setrlimit and RLIMIT_STACK, as suggested on the other question, I can get Python to segfault when it uses too much memory. That's a little less ideal than dying with MemoryError, but it could be workable. Of course, I still need to figure out how to make this happen with every run. – asmeurer – 2012-12-16T04:04:25.900

If the RLIMIT_STACK code is within the Python script, shouldn't it apply each time the script is run? – John Siu – 2012-12-16T04:37:24.607

It's not as simple as having a script. I work on a library (SymPy). The issue comes up whenever I have input that is a little too big, or when testing someone else's code that has such an issue. To be effective, I would need this to be run every time Python starts. I can get there at least for interactive work by adding this to my IPython config file. I also still need to figure out what a good limit actually is. – asmeurer – 2012-12-16T05:30:19.773

Put the limit code into a module and include it in your library, so every time your library gets imported, the limit module gets imported too. The module can do nothing by default, be enabled by an environment variable, and have the limit value set by an environment variable as well. – John Siu – 2012-12-16T05:39:30.637
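
A minimal sketch of the module described in that comment (the file name memlimit.py and the variable MY_PY_MEM_LIMIT are hypothetical, not part of any existing library):

# memlimit.py -- import this from the library's __init__.py.
# Does nothing unless MY_PY_MEM_LIMIT (bytes) is set in the environment.
import os
import resource

_limit = os.getenv('MY_PY_MEM_LIMIT')

if _limit is not None:
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    # Lower only the soft limit, leaving the hard limit untouched
    # so the change can be undone within the same process.
    resource.setrlimit(resource.RLIMIT_STACK, (int(_limit), hard))

With MY_PY_MEM_LIMIT unset, importing the module is a no-op, so normal users of the library are unaffected.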

Do you happen to know the correspondence between RLIMIT_STACK and the maximum memory size? My default RLIMIT_STACK is 8388608 (i.e., 2**23). The lowest power of 2 that I can set it to is 8192 (i.e., 2**13), which causes the above script to segfault. Setting it to 2**14 does not result in a segfault. – asmeurer – 2012-12-17T04:06:02.207

RLIMIT_STACK is in bytes. 8192 is only 8 kB; 2^14 = 16 kB; 2^23 = 8388608 bytes = 8 MB. – John Siu – 2012-12-17T18:11:37.617

That doesn't tell me the correspondence to actual memory usage. – asmeurer – 2012-12-17T20:29:03.873

I don't know how the stack size translates to overall memory usage, but you can use resource.getrusage(resource.RUSAGE_SELF) in Python to check resource usage. – John Siu – 2012-12-17T20:42:57.867
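
For reference, a minimal sketch of that check (note that ru_maxrss is reported in kilobytes on Linux but in bytes on OS X):

import resource

# Peak resident set size of the current process so far.
usage = resource.getrusage(resource.RUSAGE_SELF)
print 'Peak RSS:', usage.ru_maxrss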