How to store data in pagefile?


Memory-mapping, e.g. via Python's numpy.memmap, works, but only temporarily; once pagefile capacity is exceeded, the arrays are silently unmapped from the pagefile. Re-mapping each time is undesirable; I need persistence. Further, I don't know how to inspect the pagefile, i.e. see what's currently stored on it.

Intended use: treating the SSD pagefile as 'pseudo-RAM' with roughly 10% of RAM's read speed, to accelerate deep learning by loading an entire dataset into memory (but reading only 500MB at a time).
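For concreteness, the numpy.memmap pattern referred to above looks roughly like this (the filename and shape are placeholders, not the actual dataset):

```python
import numpy as np

# Placeholder file: a raw float32 dump standing in for the dataset
arr = np.arange(1000, dtype='float32')
arr.tofile('dataset.dat')

# Memory-map it read-only; nothing is read from disk until sliced
data = np.memmap('dataset.dat', dtype='float32', mode='r', shape=(1000,))
chunk = np.asarray(data[:100])  # materialize only the first 100 elements in RAM
```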

How can this be accomplished? Help is appreciated.


SPECS:
  • System: Win-10 OS, ASUS ROG Strix GL702VSK
  • SSD: 512GB, 3.5GB/s read speed -- NVMe PCIe 970 PRO
  • Pagefile: 80GB, on C-drive (SSD drive, system drive)
  • RAM: 24GB DDR4 2400 MHz
  • CPU: i7-7700HQ 2.8 GHz

OverLordGoldDragon

Posted 2019-09-11T02:01:24.033

Reputation: 188


Linux allows the memory mapping of a file. The file is mapped into virtual memory as needed, and written back as needed. The mapped file is the backing store for this virtual memory configuration. There's no use for a page file/swap area in such a situation. I have no idea if Windows has an equivalent capability.

– sawdust – 2019-09-11T02:46:37.473

@sawdust It does; I know of two methods via Python: mmap and numpy.memmap. Regarding "no use for a pagefile": do you mean memmap != pagefile use? Because I can confirm that cached files loaded slightly, yet definitively, faster than they do now with memmap, back on the plain old hybrid HDD.

– OverLordGoldDragon – 2019-09-11T23:12:02.873

@OverLordGoldDragon, I think you may be misunderstanding what memory mapping (and the page file) do. Memory mapping essentially just tells the OS "Hey, when I access memory address 0xabcdef, go read data from file x.dat instead." – kicken – 2019-09-13T18:34:16.007

@kicken I may be, yes - whatever it does, it cuts load time significantly (42-fold in my case) – OverLordGoldDragon – 2019-09-13T18:56:18.573

Answers


Storing data in the page/swap file (pagefile.sys on Windows) means storing it in virtual memory. If that's really what you want, then you're already doing it whenever you allocate an array in the usual way.

Virtual RAM, like physical RAM, doesn't survive a reboot. There's no supported way to store data permanently in the page file. It could technically be done, since it is a file on a persistent medium, but it just isn't meant for that; its purpose is to back physical RAM, not to hold your data.

It sounds like what you really want is to store your numpy array not in the page file, but in an ordinary disk file – the opposite of your title.

I've never done this, but according to the documentation you linked,

An alternative to using this subclass is to create the mmap object yourself, then create an ndarray with ndarray.__new__ directly, passing the object created in its ‘buffer=’ parameter.

which means that you ought to be able to create the array data like this:

import mmap
import numpy as np

file = open('backing_file', 'xb')
file.truncate(123456 * 4)  # size the file to hold 123456 float32 values
mapped_data = mmap.mmap(file.fileno(), 123456 * 4, access=mmap.ACCESS_WRITE)
array = np.ndarray.__new__(np.ndarray, shape=(123456,), dtype='float32',
                           buffer=mapped_data)
# fill in the array

and then, on a subsequent run, map the array into memory like this:

import mmap
import numpy as np

file = open('backing_file', 'rb')
mapped_data = mmap.mmap(file.fileno(), 123456 * 4, access=mmap.ACCESS_READ)
array = np.ndarray.__new__(np.ndarray, shape=(123456,), dtype='float32',
                           buffer=mapped_data)
# use the array

The startup time of subsequent runs will be very fast; the array data will be paged in from disk when it's read.

Instead of mmap.ACCESS_READ, you could pass mmap.ACCESS_WRITE (in which case any changes to the in-memory array will propagate to disk), or mmap.ACCESS_COPY (in which case changes to the in-memory array will be allowed, but they will not be written to disk and will be lost when the process exits).
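A quick way to see the ACCESS_COPY behavior for yourself (the filename here is a throwaway, not part of the scheme above):

```python
import mmap

# Create a small throwaway file of 16 zero bytes
with open('copy_demo.bin', 'wb') as f:
    f.write(b'\x00' * 16)

# Map it with ACCESS_COPY: in-memory writes are allowed...
f = open('copy_demo.bin', 'rb')
m = mmap.mmap(f.fileno(), 16, access=mmap.ACCESS_COPY)
m[:4] = b'\x01\x02\x03\x04'
in_memory = bytes(m[:4])
m.close()
f.close()

# ...but they never reach the file on disk
with open('copy_demo.bin', 'rb') as f:
    on_disk = f.read(4)
```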

Here's the documentation for the mmap module.

benrg

Posted 2019-09-11T02:01:24.033

Reputation: 548

Thanks for the response; to clarify, is it the drive holding the pagefile that mediates the data reads? On my hybrid drive, I noticed paged arrays loading far faster than the drive's specified read speed. On my current NVMe SSD, I can reach the full 3.5GB/s read speed via np.memmap, but I wonder whether it too can be boosted further, as on the hybrid drive. To note, my memmaps are persistent and survive reboots, so as the comments under the question also noted, the pagefile appears not to be involved. – OverLordGoldDragon – 2019-09-25T18:52:28.023