Storing data in the page/swap file (pagefile.sys on Windows) means storing it in virtual memory. If that's really what you want, then you're already doing it whenever you allocate an array in the usual way.
Virtual RAM, like physical RAM, doesn't survive a reboot. There's no way to store data permanently in the page file. It could technically be done because it is a file on a persistent medium, but it just isn't meant for that. Its purpose is to simulate physical RAM.
It sounds like what you really want is to store your numpy array not in the page file, but in an ordinary disk file – the opposite of your title.
I've never done this, but according to the documentation you linked,
An alternative to using this subclass is to create the mmap object yourself, then create an ndarray with ndarray.__new__ directly, passing the object created in its ‘buffer=’ parameter.
which means that you ought to be able to create the array data like this:
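import mmap
import numpy as np

file = open('backing_file', 'x+b')  # read/write access; fails if the file already exists
file.truncate(123456 * 4)  # give the file its full size before mapping it
mapped_data = mmap.mmap(file.fileno(), 123456 * 4, access=mmap.ACCESS_WRITE)
array = np.ndarray.__new__(np.ndarray, shape=(123456,), buffer=mapped_data, dtype='float32')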
# fill in the array
and then, on a subsequent run, map the array into memory like this:
import mmap
import numpy as np

file = open('backing_file', 'rb')
mapped_data = mmap.mmap(file.fileno(), 123456 * 4, access=mmap.ACCESS_READ)
array = np.ndarray.__new__(np.ndarray, shape=(123456,), buffer=mapped_data, dtype='float32')
# use the array
The startup time of subsequent runs will be very fast; the array data will be paged in from disk when it's read.
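The subclass that the quoted documentation refers to is numpy.memmap, and it wraps the same two steps in one call. Here is a minimal sketch of the create-then-reopen round trip using it; the temporary path and the array contents are made up for illustration, and I'm assuming a 1-D float32 array of the same size as above.

```python
import os
import tempfile

import numpy as np

# Hypothetical location and size, chosen only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "backing_file.dat")
n = 123456

# First run: mode="w+" creates (or overwrites) the file and sizes it to fit.
arr = np.memmap(path, dtype="float32", mode="w+", shape=(n,))
arr[:] = np.arange(n, dtype="float32")
arr.flush()  # write pending changes out to the file
del arr      # drop the reference so the mapping can be released

# Later run: mode="r" maps the existing file read-only. Data is paged in
# from disk only when the corresponding elements are accessed.
loaded = np.memmap(path, dtype="float32", mode="r", shape=(n,))
first, last = float(loaded[0]), float(loaded[-1])
```

Note that neither approach stores the shape or dtype in the file, so the second run has to supply the same values the first run used.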
Instead of mmap.ACCESS_READ, you could pass mmap.ACCESS_WRITE (in which case any changes to the in-memory array will propagate to disk; the file must then be opened for reading and writing, with 'r+b' rather than 'rb'), or mmap.ACCESS_COPY (in which case changes to the in-memory array will be allowed, but they will not be written to disk and will be lost when the process exits).
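To make the difference between the access modes concrete, here is a small sketch that modifies the same file once through an ACCESS_COPY mapping and once through an ACCESS_WRITE mapping, then rereads the file each time. The four-element file and the map_array helper are hypothetical, made up for this illustration.

```python
import mmap
import os
import tempfile

import numpy as np

# Hypothetical tiny backing file holding four float32 zeros.
path = os.path.join(tempfile.mkdtemp(), "backing_file")
n = 4
np.zeros(n, dtype="float32").tofile(path)


def map_array(f, access):
    """Hypothetical helper: map the file and wrap it in an ndarray."""
    buf = mmap.mmap(f.fileno(), n * 4, access=access)
    return buf, np.ndarray(shape=(n,), dtype="float32", buffer=buf)


# ACCESS_COPY: the write succeeds in memory but never reaches the file.
with open(path, "rb") as f:
    _, private = map_array(f, mmap.ACCESS_COPY)
    private[0] = 1.0
after_copy = np.fromfile(path, dtype="float32")

# ACCESS_WRITE: needs a read/write handle ("r+b"); the write reaches the file.
with open(path, "r+b") as f:
    buf, shared = map_array(f, mmap.ACCESS_WRITE)
    shared[0] = 2.0
    buf.flush()  # make sure the change has been written out
after_write = np.fromfile(path, dtype="float32")
```

After this runs, after_copy should still hold zeros while after_write should start with 2.0.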
Here's the documentation for the mmap module.
Linux allows the memory mapping of a file. The file is mapped into virtual memory as needed, and written back as needed. The mapped file is the backing store for this virtual memory configuration. There's no use for a page file/swap area in such a situation. I have no idea if Windows has an equivalent capability. – sawdust – 2019-09-11T02:46:37.473
@sawdust It does; I know of two methods via Python: memmap, and Numpy memmap. @"no use for a pagefile" as in memmap!=pagefile use? because, I may confirm: I recall cached files loading slightly, yet definitively, faster than now with memmap - on the plain old hybrid HDD – OverLordGoldDragon – 2019-09-11T23:12:02.873
@OverLordGoldDragon, I think you may be misunderstanding what memory mapping (and the page file) do. Memory mapping essentially just tells the os "Hey, when I access memory address 0xabcdef, go read data from file x.dat instead." – kicken – 2019-09-13T18:34:16.007
@kicken I may be, yes - whatever it does, it cuts load time significantly (42-fold in my case) – OverLordGoldDragon – 2019-09-13T18:56:18.573