How does Windows track in-use Memory Locations?

1

1

My understanding of the (basic) definiton of a Memory Leak is a location in memory that has been reserved but the pointer to that memory within the application that reserved the memory has been deleted/lost.

My question is this: in order for Windows (or indeed any operating system) to be aware that memory is still reserved after the destruction of the pointer used to manage that memory inside an application, it must independently track what memory is in use. Where does it do this, and how?

Ieuan Stanley

Posted 2016-08-15T08:25:46.030

Reputation: 121

Answers

2

You are correct - every operating system tracks memory use and allocation separately. This is one of it's vital functions. For many reasons it simply wouldn't make sense to pass complete control over memory to a userland process - security can be pointed out as one example: Your process could store sensitive information in memory, thus Operating system should manage which processes will be allowed to access it.

Generally all operating systems have a sub-system named Memory Management (or Memory Manager in Windows), which keeps track of virtual memory. Another part of the OS, Hardware Abstraction, then maps this virtual memory to physical devices. This, of course, prohibits direct interaction between any user process and physical memory.

Directly interacting with this part of operating system is partially available through public API - see Memory Management Functions for Windows. In Linux there are several ways to access memory, this page would be best way to get started with it.

Marek Rost

Posted 2016-08-15T08:25:46.030

Reputation: 1 826

1

Well, yes, it's a function of a part of the OS called the memory manager, but really that just kicks the question down the road a bit. OP asked not only "where" but "how".... so, here's how.

The Windows memory manager (abbrev. Mm) keeps track of in-use spans of virtual addresses via a set of data structures called Virtual Address Descriptors. For user mode address space this is done on a per-process basis. When a process is first created it has no VADs. When a chunk of virtual address space goes from free to not-free - not-free could be reserved, committed, mapped, or one or two less common options - a VAD is created. The VAD contains the start address (always page-aligned) and size (also always page-aligned) of the region, the allocation type (reserved, committed, etc.), and a few other things.

The region described by a VAD can be redefined and/or subdivided. For example it is very common for a large reserved range to have a piece of itself changed to "committed". This results in the original VAD being modified to reflect the new length of the reserved region and a new VAD created to describe the committed region. If the committed region comes out of the "middle" of the range defined by the original VAD, two new VADs have to be created.

The VADs for each process are organized into a tree structure of a type called a balanced binary tree, or more specifically, a splay tree, ordered by the starting virtual address of each region. When attempting to determine whether a given address is in use, and if so, how it is in use, the memory manager "walks" the tree to find the VAD that includes the address - or not, if there isn't one.

Because of the nature of this type of tree, this operation is very efficient, both for finding existing VADs and for adding and removing VADs. for finding existing VADs it is efficient in the same way that a "binary search" is efficient on a simple ordered list. If there are, for example, from 31 to 64 VADs in the tree, a maximum of six VADs have to be looked at to look up any given address. For 65 to 128 VADs, the search takes just seven steps, etc. But unlike a simple ordered list, adding or removing entries doesn't require reshuffling all of the existing ones.

When a new region must be allocated the tree must be "walked" to find a free region of sufficient size. This is a moderately costly operation but that's ok, as it doesn't happen that often.

Regarding "destruction of the pointer" - a pointer is just a location in memory. When I allocate virtual memory (in the Windows API the low-level call for this is VirtualAlloc) I get a pointer to the allocated region. Overwriting the pointer with zero, or even deallocating the virtual memory it's stored in, does not tell the OS that I'm done with the region. After all, I might have copied that pointer value someplace else. What does tell the OS "I'm done with that region" is a call to VirtualFree. That removes the VAD that described the region. Of course, all per-process virtual address space is an attribute of the process, and anything still allocated disappears when the process is deleted (because all of the VADs go away too).

The association between virtual page numbers and physical page numbers is maintained in structures called page tables. These also assist in managing virtual address space. There is a page table, consisting of 512 page table entries and occupying one page, for each 2 MB of virtual address space; each PTE describes one page. The PTs are organized into a simple hierarchical structure, three levels deep on 32-bit x86, four levels on x64. Page tables corresponding to 2 MB spans of v.a.s. that are all free (i.e. no VADs describe anything in the 2 MB) do not actually exist; thus the corresponding entries in the upper-level PTs are zero. The page tables are also maintained and updated by the OS's memory manager (not the "hardware abstraction layer" as claimed in another answer).

(The preceding paragraph assumes PAE on x86. Without PAE it is 1024 PTEs per PT, and there are only two levels of PTs in the tree.)

Yet another data set, the ''PFN array'', keeps track of the state of every physical page. It is a simple array of structures, indexed by physical page number ("page frame number"). Each element of the array contains information about the state of the corresponding physical page: Which of the system-wide page lists it's on (free, standby, zeroed, modified) or if it's in one or more process working sets, how many processes it's in, etc.

Jamie Hanrahan

Posted 2016-08-15T08:25:46.030

Reputation: 19 777