79

Is it good secure programming practice to overwrite sensitive data stored in a variable before it is deleted (or goes out of scope)? My thought is that it would prevent a hacker from being able to read any latent data in RAM due to data-remanence. Would there be any added security in overwriting it several times? Here is a small example of what I am talking about, in C++ (with comments included).

```cpp
void doSecret()
{
  // The secret you want to protect (probably best not to have it hardcoded like this)
  int mySecret = 12345;

  // Do whatever you do with the number
  // ...

  // Clear out the memory of mySecret by writing over it
  mySecret = 111111;
  mySecret = 0;
  // Maybe repeat a few times in a loop
}
```

One thought is, if this does actually add security, it would be nice if the compiler automatically added the instructions to do this (perhaps by default, or perhaps by telling the compiler to do it when deleting variables).


This question was featured as an Information Security Question of the Week.
Read the Dec 12 2014 blog entry for more details or submit your own Question of the Week.

Jonathan
  • See also: [the operating system-level view](http://security.stackexchange.com/questions/74288/is-data-remanence-a-concern-in-ram) – Gilles 'SO- stop being evil' Dec 04 '14 at 18:05
  • Note that this can be [tricky](http://stackoverflow.com/questions/3785366/how-to-ensure-that-compiler-optimizations-dont-introduce-a-security-risk) depending on the [language](http://stackoverflow.com/questions/3785582/how-to-write-a-password-safe-class). (See the two linked questions and especially their answers.) – ereOn Dec 04 '14 at 18:36
  • Also note that strings in Java are immutable, so overwriting (assigning a new value to the reference variable) will have no effect on the original string. – rkosegi Dec 04 '14 at 19:44
  • Not only is this a good idea, but other steps you might want to take are to `mlock` the memory (to ensure it's not written to swap), `mprotect` the page so that it's read-only once the secret data has been initialized (also possibly to mark the page as non-accessible *at all* except in the small windows in which you intend to access it), to have a "canary" value written to memory immediately after the secret to detect during dealloc if it's been overwritten by an overflow, and to allocate extra non-accessible guard pages before and after the secret to SEGV on overflow and underflow. – Stephen Touset Dec 04 '14 at 21:42
  • See libsodium's [`sodium_malloc`](https://github.com/jedisct1/libsodium/blob/master/src/libsodium/sodium/utils.c#L392) as an implementation of that. – Stephen Touset Dec 04 '14 at 21:43
  • @rkosegi that's why one should use byte[], not string, for keys. – Alexander Dec 05 '14 at 08:53
  • The C/C++ compiler is actually free to optimize your overwrites away unless you have declared your variable as `volatile`, because they don't change the result of executing the code (the value is not used anywhere). – Ruslan Dec 06 '14 at 12:25
  • Zeroization is required by NIST's [FIPS 140-2](http://csrc.nist.gov/groups/STM/cmvp/standards.html), even at Level 1 validations. Attackers do use the memory, and they don't need to be local. For example, we know [the NSA will log Windows Error Reporting into its XKeyscore system to help gain unauthorized access](http://en.wikipedia.org/wiki/Tailored_Access_Operations). –  Dec 08 '14 at 06:22
  • Whilst most answers mention the benefit of overwriting the data (if possible), I can't see the benefit of overwriting "a few times in a loop" as suggested in the question, but not touched on by the current batch of answers AFAICS. Unless this data is being written to a magnetic medium and the attacker has low level access to the system then I would have thought that overwriting once should be sufficient? Or are multiple writes still beneficial? – MrWhite Dec 09 '14 at 19:25

10 Answers

74

Yes, it is a good idea to overwrite the value and then delete/release it. Do not assume that all you have to do is "overwrite the data" or let it fall out of scope for the GC to handle, because each language interacts with the hardware differently.

When securing a variable you might need to think about:

  • encryption (in case of memory dumps or page caching)
  • pinning in memory
  • ability to mark as read-only (to prevent any further modifications)
  • safe construction by NOT allowing a constant string to be passed in
  • optimizing compilers (see note in linked article re: ZeroMemory macro)

The actual implementation of "erasing" depends on the language and platform. Research the language you're using and see if it's possible to code securely.
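For C++, a minimal sketch of "overwrite, then release" for a heap-allocated byte buffer, assuming the secret lives in a `std::vector<unsigned char>`. The helper names `secure_wipe` and `wipe_and_release` are hypothetical. The stores go through a `volatile` pointer so that the optimizer is not free to drop them as dead writes; major compilers honor this in practice, but treat it as a widely relied-upon convention rather than an ironclad guarantee.

```cpp
#include <cstddef>
#include <vector>

// Overwrite n bytes starting at p with zeros. The stores go through a
// volatile pointer so the optimizer should not drop them as dead writes,
// even though nothing reads the buffer afterwards.
inline void secure_wipe(void* p, std::size_t n) {
    volatile unsigned char* bytes = static_cast<volatile unsigned char*>(p);
    for (std::size_t i = 0; i < n; ++i) {
        bytes[i] = 0;
    }
}

// Hypothetical helper: wipe every byte the vector has allocated, then
// actually give the memory back.
inline void wipe_and_release(std::vector<unsigned char>& buf) {
    buf.resize(buf.capacity());              // make spare capacity addressable too
    secure_wipe(buf.data(), buf.size());     // overwrite first...
    std::vector<unsigned char>().swap(buf);  // ...then release (swap with an empty vector)
}
```

This only covers the buffer the vector owns at the moment of the call. If the vector grew earlier, previous buffers were freed without being wiped, so reserving the final size up front is worth considering.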

Why is this a good idea? Crash dumps, and anything else that contains the heap, could contain your sensitive data. Consider using the following when securing your in-memory data

Please refer to StackOverflow for per-language implementation guides.

Be aware that even when following vendor guidance (Microsoft's, in this case), it is still possible to dump the contents of a SecureString, and the vendor may have specific usage guidelines for high-security scenarios.

Pang
makerofthings7
  • The question was about overwriting first and then deleting. Can you add to your answer? – Rory Alsop Dec 04 '14 at 16:33
  • Though it is also possible to view the contents of a program's memory space during its operation. Deleting memory just removes the pointer to the memory (much like deleting a file removes the filesystem's pointer to the file). Overwriting it helps keep it "secure." – Desthro Dec 04 '14 at 16:35
  • Assuming that overwriting it actually overwrites it, and doesn't just move the pointer to some new section of memory with the new values (which means the old data sticks around until the memory is reused, just like with deleting it). – Lawtonfogle Dec 04 '14 at 16:52
  • @lawtonfogle Pointers are typically memory addresses, so any data you write via the pointer goes into that address space. Though that is an interesting idea. Your method described below isn't quite the same thing as using a pointer. – Desthro Dec 04 '14 at 17:01
  • @Desthro: Unless you're working in real mode (i.e. you're writing an OS kernel), pointers are typically only virtual memory addresses; only the kernel has access to real memory addresses. It is certainly possible for the operating system to move the two memory pages to a different real address when the memory is overwritten; this can be done without changing the virtual address. This happens during swapping, for example. – Lie Ryan Dec 04 '14 at 17:37
  • @LieRyan shows how old my programming knowledge is ;) – Desthro Dec 04 '14 at 19:35
  • I think you'd definitely want the volatile keyword to make sure that what happens is actually an overwrite. – raptortech97 Dec 05 '14 at 01:48
  • @raptortech97 does `volatile` actually ensure that though? Suppose a process is suspended and its memory paged out before the sensitive variable can be overwritten, then it's later resumed. There's no guarantee the virtual memory will be paged back into the same section of physical memory, is there? In such a case `volatile` doesn't guarantee an overwrite. I think. I'm honestly not sure though. (`volatile` prevents certain compiler optimizations, I get that, but it seemed like you were saying it does more than that) – David Z Dec 05 '14 at 08:57
  • I feel like what we've established in these comments is that computers are hard to "outwit" – deworde Dec 05 '14 at 09:39
35

Storing a value that isn't used again? Seems like something that would be optimized out, regardless of any benefit it might provide.

Also, you may not actually overwrite the data in memory depending upon how the language itself works. For example, in a language using a garbage collector, it wouldn't be removed immediately (and this is assuming you didn't leave any other references hanging around).

For example, in C#, I think the following doesn't work.

```csharp
string secret = "my secret data";

// ...lots of work...

secret = "blahblahblah";
```

"my secret data" hangs around until it is garbage collected, because strings are immutable (and since this particular value is a literal, it is also interned and stored in the assembly itself, so it lives as long as the process). The last line actually creates a new string and makes `secret` point to it; it does not speed up the removal of the actual secret data.

Is there a benefit? Suppose we write it in assembly or some other low-level language so that we can be sure we are overwriting the data; we put our computer to sleep or leave it on with the application running; an evil maid scrapes our RAM and gets the data after the secret was overwritten but before it would otherwise have been deleted (likely a very small space); and nothing else in RAM or on the hard drive gives away the secret. Under all of those conditions, I see a possible increase in security.

But the cost versus the benefit seems to make this security optimization very low on our list of optimizations (and below the point of 'worthwhile' on most applications in general).

I could possibly see limited use of this in special chips meant to hold secrets for a short time to ensure they hold it for the shortest time possible, but even then I'm uncertain about any benefit for the costs.
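The C# example above has a C++ counterpart. `std::string` is mutable, so the contents can be overwritten in place, but simply assigning a new value is still not enough: depending on the library, assignment may copy into a different buffer or overwrite only the first few characters, leaving the tail of the old secret behind. A minimal sketch, using a hypothetical helper named `wipe_string`:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: overwrite the characters a std::string owns, including
// spare capacity, and leave it empty. Assigning a new value instead
// (secret = "blahblahblah") may copy into a different buffer or overwrite
// only the first few characters, leaving the rest of the old secret behind.
inline void wipe_string(std::string& s) {
    s.resize(s.capacity());  // expose the spare capacity so it is wiped as well
    volatile char* p = &s[0];
    for (std::size_t i = 0; i < s.size(); ++i) {
        p[i] = '\0';
    }
    s.clear();  // length becomes 0; the now-zeroed buffer is kept
}
```

This cleans only the string's own buffer. Copies made along the way (temporaries, strings passed by value, and the literal itself, which is stored in the program image) are untouched.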

psmears
Lawtonfogle
  • "likely a very small space" -- it's probably not that hard to come up with cases where the very small space is at least the remaining lifetime of the process. That is to say, there could be unused memory that for various reasons (and I don't just mean a memory leak) is liable to remain unused for quite a while. The evil maid will, after all, scan all our memory for anything potentially useful. Overwriting it at the point it became unused would therefore close a window. But like you say we still can't count on any security benefit at all from `mySecret = 0; mySecret = -1;`. – Steve Jessop Dec 04 '14 at 18:36
  • If you are concerned about the assignment being optimized out, just declare the variable as `volatile` (in C/C++). That will do the trick. If the value is a string, overwrite the *contents* of the string instead of using a simple assignment. – Alex D Dec 05 '14 at 07:21
  • Numerous language-dependent tricks that do not carry across. Any libraries that you use that store secret data will need to be checked line by line to ensure they do this. And if they don't, do you reimplement the math/crypto libraries? The security increase per resource spent ratio is so low and the chance of worsening your security (via library rewrites) is high enough that I don't see a justified use case for the average developer. – Lawtonfogle Dec 05 '14 at 13:54
19

You need a threat model

You should not even begin to think about overwriting security variables until you have a threat model describing what sorts of hacks you are trying to prevent. Security always comes at a cost. In this case, the cost is the development cost of teaching developers to maintain all of this extra code to secure the data. This cost means it may be more likely your developers will make mistakes, and those mistakes are more likely to be the source of a leak than a memory issue.

  • Can the attacker access your memory? If so, is there a reason you think they couldn't/wouldn't just sniff the value before you overwrite it? Over what timeframe does the attacker have access to your memory?
  • Can the attacker access core dumps? Do you mind if they can access sensitive data in exchange for being noisy enough to cause a core dump in the first place?
  • Is this open source, or closed source? If it's open source, you have to worry about multiple compilers, because compilers will optimize away stuff like overwritten data all the time. Their job is not to provide security. (For a real life example, Schneier's PasswordSafe has specialized classes to protect the unencrypted password data. In order to do so, he uses Windows API functions to lock the memory, and force it to be overwritten properly rather than using the compiler to do it for him)
  • Is this a garbage collected language? Do you know how to force YOUR particular version of your particular garbage collector to actually get rid of your data?
  • How many tries can the attacker make at getting sensitive data before you notice and cut him off with other means (such as firewalls)?
  • Is this being run in a virtual machine? How sure are you of the security of your hypervisor?
  • Does an attacker have physical access? For example, Windows is more than happy to use a flash drive to cache virtual memory. All an attacker would need to do is convince Windows to push it onto the flash drive. When that happens, it is REALLY hard to get it off. So hard, in fact, that there is no recognized method of reliably clearing data from a flash drive.

These questions need to be addressed before considering trying to overwrite sensitive data. Trying to overwrite data without addressing the threat model gives a false sense of security.

Cort Ammon
  • Another cost is more code and more instructions for the computer to execute (slower), but yes, good point to consider the threat. I would think a hacker accessing the secret data from RAM while the program is running, or even afterward would be one of the greatest threats. I think it would be nice if you could tell the compiler (or whatever builds / runs the language you are using) to zero out the data when it is done. – Jonathan Dec 06 '14 at 17:29
  • *"Can the attacker access your memory"* - well, we know the attacker will use the memory if available. And it does not need to be on the local machine. For example, we know [the NSA will log Windows Error Reporting into its XKeyscore system to help gain unauthorized access](http://en.wikipedia.org/wiki/Tailored_Access_Operations). That happens whether your threat model includes it or not :) –  Dec 08 '14 at 06:17
  • Whether it happens is very different from whether you actually want to try to do anything about it. To quote Kevin Mitnick, "The only truly secure computer is one that is off the internet, unplugged, stored in a concrete bunker under ground with armed guards over it, and even then I'd check on it every once in a while." If you really want to make your program NSA secure, you're going to need a whole lot more than StackExchange to help ;-) – Cort Ammon Dec 08 '14 at 15:43
14

Yes, it is good practice security-wise to overwrite data that is particularly sensitive when the data is no longer necessary, i.e. as part of an object destructor (either an explicit destructor provided by the language or an action that the program takes before deallocating the object). It is even good practice to overwrite data that isn't in itself sensitive, for example to zero out pointer fields in a data structure that goes out of use, and also zero out pointers when the object they point to is freed even if you know you aren't going to use that field anymore.
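A minimal C++ sketch of that advice, using a hypothetical `KeyNode` type whose destructor overwrites its key and clears its pointer field before the storage is released. Writes made in a destructor are a classic target for dead-store elimination, so they go through `volatile` lvalues here; that is common practice rather than a strict guarantee from the language.

```cpp
#include <array>
#include <cstddef>

// Hypothetical structure holding key material and a link to another node.
struct KeyNode {
    std::array<unsigned char, 32> key{};  // sensitive bytes
    KeyNode* next = nullptr;              // not secret, but cleared as well

    ~KeyNode() {
        // Overwrite the key through a volatile pointer so the stores are
        // not simply discarded as writes to an object that is about to die.
        volatile unsigned char* k = key.data();
        for (std::size_t i = 0; i < key.size(); ++i) {
            k[i] = 0;
        }
        // Clear the pointer too: a later use-after-free then sees a null
        // pointer rather than a plausible-looking address.
        *static_cast<KeyNode* volatile*>(&next) = nullptr;
    }
};
```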

One reason to do this is in case the data leaks through external factors such as an exposed core dump, a stolen hibernation image, a compromised server allowing a memory dump of running processes, etc. Physical attacks where an attacker extracts the RAM sticks and makes use of data remanence are rarely a concern except on laptop computers and perhaps mobile devices such as phones (where the bar is higher because the RAM is soldered), and even then mostly in targeted scenarios only. Remanence of overwritten values is not a concern: it would take very expensive hardware to probe inside a RAM chip to detect any lingering microscopic voltage difference that might be influenced by an overwritten value. If you're worried about physical attacks on the RAM, a bigger concern would be to ensure that the data is overwritten in RAM and not just in the CPU cache. But, again, that's usually a very minor concern.

The most important reason to overwrite stale data is as a defense against program bugs that cause uninitialized memory to be used, such as the infamous Heartbleed. This goes beyond sensitive data because the risk is not limited to a leak of the data: if there is a software bug that causes a pointer field to be dereferenced without having been initialized, the bug is both less prone to exploitation and easier to trace if the field contains all-bits-zero than if it potentially points to a valid but meaningless memory location.

Beware that good compilers will optimize the zeroing out if they detect that the value is no longer used. You may need to use some compiler-specific trick to make the compiler believe that the value remains in use and thus generate the zeroing out code.
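One such compiler-specific trick, sketched here under the assumption that GCC or Clang is the target: call `memset`, then add an empty inline-assembly statement that takes the buffer pointer as an input and clobbers `"memory"`. The compiler has to assume the assembly might read the buffer, so it cannot discard the `memset` as a dead store. The Linux kernel's `memzero_explicit` uses a similar barrier. The name `zero_with_barrier` is hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper, GCC/Clang-specific: zero a buffer with memset, then
// insert an empty asm statement that takes the pointer as an input and
// clobbers "memory". The compiler must assume the asm may read the buffer,
// so it cannot remove the memset as a dead store.
inline void zero_with_barrier(void* p, std::size_t n) {
    std::memset(p, 0, n);
#if defined(__GNUC__) || defined(__clang__)
    __asm__ __volatile__("" : : "r"(p) : "memory");
#endif
    // On other compilers no barrier is emitted; use a different technique there.
}
```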

In many languages with automatic management, objects can be moved in memory without notice. This means that it's hard to control leaks of stale data, unless the memory manager itself erases unused memory (they often don't, for performance). On the plus side, external leaks are usually all you have to worry about, since high-level languages tend to preclude the use of uninitialized memory (beware of string creation functions in languages with mutable strings).

By the way, I wrote “zero out” above. You can use a bit pattern other than all zeros; all-zeros has the advantage that it's an invalid pointer in most environments. A bit pattern that you know is an invalid pointer but that is more distinctive can be helpful in debugging.

A lot of security standards mandate the erasure of sensitive data such as keys. For example, the FIPS 140-2 standard for cryptographic modules requires it even at the lowest assurance level, which apart from that only requires functional compliance and not resistance against attacks.

Gilles 'SO- stop being evil'
  • Normally secrets aren't directly acted upon. A cryptographic library is used (as implementing an algorithm, even a good one, is difficult and an implementation bug can render your security useless). So does one limit oneself to libraries that overwrite data before deleting it (possibly eliminating the best library for the job)? Or does one use a library that allows data to hang around until deleted, thus losing the benefit of applying this to one's own code? – Lawtonfogle Dec 05 '14 at 14:00
  • @Lawtonfogle Cryptographic libraries are often written in such a way as to erase at least keys after use. It's considered good hygiene among crypto implementers and I expect (though I have no firm data on the topic) that it's correlated with reasonable quality measures. Even if the library doesn't do it, you can often control the library's memory management from the application. – Gilles 'SO- stop being evil' Dec 05 '14 at 14:03
  • I have to admit, it does seem like the security experts writing crypto libraries would also incorporate other security practices even if they aren't directly related to the crypto itself. But you'd want to make sure, hopefully it's open source (if for no other reason than to verify its cryptographic integrity). – corsiKa Dec 07 '14 at 22:41
  • "If you're worried about physical attacks on the RAM, a bigger concern would be to ensure that the data is overwritten in RAM and not just in the CPU cache." -- You can force a memory barrier (at the lowest level implemented as a CPU instruction) to enforce flushing the CPU cache to RAM. If the programming language is higher level, one way to do it is to employ some memory-sharing primitive, for example a mutex (lock, erase, unlock), which uses memory barriers as part of its implementation. – FooF Dec 19 '14 at 08:26
8

For a (hopefully interesting) addition to the rest of the answers, many people underestimate the difficulty in properly overwriting memory with C. I am going to be quoting heavily from Colin Percival's blog post titled "How to zero a buffer".

The main problem facing naive attempts at overwriting memory in C is compiler optimizations. Most modern compilers are "smart" enough to recognize that the common idioms used for overwriting memory do not in fact change the observable behaviour of the program, and so can be optimized away. Unfortunately this completely breaks what we want to achieve. Some of the common tricks are described in the blog post linked above. Worse still, a trick that works for one version of a compiler may not necessarily work with another compiler or even a different version of the same compiler. Unless you are distributing binaries only, this is a problematic situation.

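The only way to reliably overwrite memory in C that I know of is the memset_s function. Sadly, this is only available in C11, and even there it belongs to the optional Annex K (bounds-checking interfaces), which many C libraries, glibc among them, do not implement. Programs written for older versions of C are out of luck.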

The memset_s function copies the value of c (converted to an unsigned char) into each of the first n characters of the object pointed to by s. Unlike memset, any call to memset_s shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. That is, any call to memset_s shall assume that the memory indicated by s and n may be accessible in the future and therefore must contain the values indicated by c.
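Because `memset_s` is not available everywhere, one hedged approach is a wrapper that uses it only when the library advertises Annex K (by defining `__STDC_LIB_EXT1__`) and otherwise falls back to a platform function or a volatile-pointer loop. A sketch with the hypothetical name `portable_secure_zero`; `SecureZeroMemory` is the Windows API function and `explicit_bzero` is provided by glibc 2.25 and later (the BSDs have it too, but this sketch only checks for glibc):

```cpp
#define __STDC_WANT_LIB_EXT1__ 1  // request Annex K declarations, if supported
#include <cstddef>
#include <string.h>
#if defined(_WIN32)
#include <windows.h>
#endif

// Hypothetical wrapper: use the strongest zeroing primitive this platform
// appears to offer, falling back to a volatile-pointer loop.
inline void portable_secure_zero(void* p, std::size_t n) {
#if defined(__STDC_LIB_EXT1__)
    memset_s(p, n, 0, n);  // C11 Annex K, only when the library advertises it
#elif defined(_WIN32)
    SecureZeroMemory(p, n);  // Windows API
#elif defined(__GLIBC__) && \
    (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 25))
    explicit_bzero(p, n);  // glibc 2.25 and later
#else
    volatile unsigned char* b = static_cast<volatile unsigned char*>(p);
    for (std::size_t i = 0; i < n; ++i) {
        b[i] = 0;
    }
#endif
}
```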

Unfortunately, Colin Percival is of the opinion that overwriting memory isn't enough. In a follow-up blog post titled "Zeroing buffers is insufficient", he states:

With a bit of care and a cooperative compiler, we can zero a buffer — but that's not what we need. What we need to do is zero every location where sensitive data might be stored. Remember, the whole reason we had sensitive information in memory in the first place was so that we could use it; and that usage almost certainly resulted in sensitive data being copied onto the stack and into registers.

He goes on to give the example of AES implementations that use the AES-NI instruction set on x86 platforms leaking data in the registers.

He claims that:

It is impossible to safely implement any cryptosystem providing forward secrecy in C.

A troubling claim indeed.

  • Thank you for your insights! It would definitely be nice if the C and C++ compilers included an option to zero out memory when it goes out of scope (or perhaps a way to tell the compiler which variables to securely delete). – Jonathan Dec 16 '14 at 15:06
4

It is important to overwrite sensitive data as soon as it is no longer needed because, otherwise:

  • The data stays on the stack until overwritten, and might be viewable with a frame overflow from a different procedure.
  • The data is subject to memory scraping.

Indeed, if you look at the source code for security-sensitive applications (e.g. OpenSSH), you will find that it carefully zeroes out sensitive data after use.

It is also true that compilers might try to optimize out the overwriting, and, even if not, it is important to know how the data is stored physically (e.g. if the secret is stored on SSD, overwriting it might not erase the old content due to wear leveling).

Ari Trachtenberg
3

The original example shows a stack variable because it's a native int type.

Overwriting it is a good idea; otherwise it lingers on the stack until it is overwritten by something else.

I suspect that if you are in C++ with heap objects, or in C with native types allocated via pointers and malloc, it would be a good idea to:

  1. Use volatile
  2. Use pragmas to surround the code using that variable and disable optimisations.
  3. If possible, only assemble the secret in intermediate values rather than any named variables, so it only exists during calculations.
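The first suggestion, applied to the question's `doSecret()` as a minimal sketch (changed here to return a value so the function has an observable result): because every access to a volatile object is observable behaviour, the compiler has to emit the final store instead of deleting it.

```cpp
// Sketch: the question's doSecret() with the secret declared volatile.
// Accesses to a volatile object are observable behaviour, so the compiler
// must emit the final "mySecret = 0" store.
// This does not wipe copies the CPU made in registers or elsewhere on the
// stack, and the constant 12345 still sits in the compiled code.
int doSecret() {
    volatile int mySecret = 12345;  // hard-coded only for illustration

    int result = mySecret * 2;  // stand-in for "whatever you do with the number"

    mySecret = 0;  // cannot be optimized away because mySecret is volatile
    return result;
}
```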

Under the JVM or C# I think all bets are off.

Andy Dent
  • *"If possible, only assemble the secret in intermediate values rather than any named variables"* This doesn't make much sense as most release-compiled binaries don't include naming information. If I disassemble a C++ compiled application, I will be able to tell what instructions it executes, but I almost certainly won't be able to tell any variable names. – user Dec 08 '14 at 12:26
  • I may be wrong, but I was thinking about where intermediate values are stored as opposed to any single value assigned to a variable. If you put a constant in your code and assign it to a variable, at some point that constant is linked to a value in a static block. Intermediate values either exist in registers or internally in the chip cache. If you were calculating some kind of key as an intermediate value and then comparing that result against a value from a service, the whole thing is much more transitory. – Andy Dent Dec 08 '14 at 13:55
2

It is possible that the chunk of memory was paged out before you went and deleted it; depending on how usage of the page file plays out, the data may live there indefinitely.

So to successfully remove the secret from the computer you must first ensure the data never reaches persistent memory by pinning the pages. Then ensure the compiler doesn't optimize out the writes to the to-be-deleted memory.
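On POSIX systems, pinning is usually done with `mlock`. A minimal sketch, assuming Linux or another POSIX system with anonymous `mmap`; the `LockedBuffer` class is hypothetical. It allocates whole pages with `mmap` (so locking and unlocking do not affect unrelated data sharing a page), tries to lock them, and on destruction wipes, unlocks and unmaps, in that order. `mlock` can fail, for instance when `RLIMIT_MEMLOCK` is low, so the result is recorded rather than assumed. Locking keeps pages out of swap, but it does not by itself keep them out of core dumps or hibernation images.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Hypothetical page-backed buffer that is pinned in RAM (if mlock succeeds)
// and wiped before being returned to the OS.
class LockedBuffer {
public:
    explicit LockedBuffer(std::size_t n) : size_(n) {
        void* p = mmap(nullptr, n, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p != MAP_FAILED) {
            data_ = static_cast<unsigned char*>(p);
            locked_ = (mlock(data_, size_) == 0);  // may fail; not fatal here
        }
    }

    ~LockedBuffer() {
        if (data_ == nullptr) return;
        volatile unsigned char* v = data_;
        for (std::size_t i = 0; i < size_; ++i) v[i] = 0;  // 1. wipe
        if (locked_) munlock(data_, size_);                // 2. unpin
        munmap(data_, size_);                              // 3. release
    }

    LockedBuffer(const LockedBuffer&) = delete;
    LockedBuffer& operator=(const LockedBuffer&) = delete;

    unsigned char* data() { return data_; }
    std::size_t size() const { return size_; }
    bool locked() const { return locked_; }

private:
    unsigned char* data_ = nullptr;
    std::size_t size_;
    bool locked_ = false;
};
```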

ratchet freak
1

Actually the answer is yes, you probably should overwrite it before deleting it. If you are a web-application developer, then you should probably overwrite all data before using delete or free functions.

Example of how this bug can be exploited:

The user can insert some malicious data into an input field. You process the data and then call free on the allocated memory without overwriting it, so the data remains in memory. The user then makes your web application crash (for example, by uploading a very big picture or something like this), and the Unix system saves the memory dump to a core or core.<pid> file. If the user can then brute-force the <pid>, which will not take long, the core dump file could be interpreted as a web shell, because it contains the user's malicious data.
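A mitigation for this particular scenario, independent of overwriting, is to stop the process from producing core dumps at all. A POSIX sketch (the function name is hypothetical) that sets `RLIMIT_CORE` to zero and, on Linux, also clears the process's dumpable flag with `prctl(PR_SET_DUMPABLE, 0)`, since a `core_pattern` that pipes dumps to a helper program is not bound by the size limit in the same way:

```cpp
#include <sys/resource.h>
#if defined(__linux__)
#include <sys/prctl.h>
#endif

// Hypothetical helper: prevent this process from writing core dumps.
// Returns true if every step succeeded.
bool disable_core_dumps() {
    struct rlimit limit;
    limit.rlim_cur = 0;  // soft limit: no core file
    limit.rlim_max = 0;  // hard limit: cannot be raised again without privilege
    bool ok = (setrlimit(RLIMIT_CORE, &limit) == 0);
#if defined(__linux__)
    // Linux-specific: mark the process non-dumpable, which also blocks core
    // dumps and ptrace attachment by unprivileged processes.
    ok = (prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) == 0) && ok;
#endif
    return ok;
}
```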

PaulOverflow
-2

Isn't int mySecret = 12345 a constant in local scope?

You shouldn't be too preoccupied with whether what you're writing is anywhere close to being worth reading. In fact, I'd suggest, as a bit of contrarian advice, that you deliberately leave it readable, and further, that you populate it randomly with strings that are correlated in time and that are not used in the standard way.

That way, if you check dark web sites to find out whether you've been compromised, you can tell precisely how it was done, since a database dump is distinct from a memory-scraper dump.

mincewind