21

Please Consider: English is my second language.


On the Security Now! podcast episode 518 (HORNET: A Fix for TOR?), at the 27:51 mark Steve Gibson quotes an example of vulnerable code in C/C++:

"[...] one of them [problems with vulnerable code] is creating a new array of a certain size [...]. And the fix is 'of a certain size + 1'. So, [...] it [the vulnerable code] was just one byte too short. Probably a NULL terminator, so that when you fill the array with size objects, you would have one extra byte of NULL that would guarantee NULL termination, and that would prevent that string from being overrun. But that's not what the coder did: they'd forgotten the '+ 1' [...]"

I understand what he means: when you create an array, you need to allow one extra byte for the NULL termination byte. What I would like to get from this post is a pointer for further research into the impact of having an array whose last byte is not the NULL terminator; I don't understand the full implications of such negligence, or how it could lead to an exploit. When he says that having the NULL termination

"would prevent that string from being overrun",

my question is "how is it overrun in cases where the NULL termination character is neglected?".

I understand that this is a huge topic, and therefore I do not want to impose on the community by asking for too comprehensive an answer. But if anyone could be kind enough to provide some pointers for further reading, I would be very appreciative and happy to go and do the research myself.

RoraΖ
  • Welcome. You can just google *null pointer exceptions in c++* –  Jul 29 '15 at 12:41
  • 13
    @begueradj This is about C, and about string termination. – domen Jul 29 '15 at 13:36
  • 4
    @begueradj Also, neither C nor C++ have null pointer exceptions for basic arrays, they have undefined behavior, which is what this question is trying to address in the first place. – IllusiveBrian Jul 29 '15 at 15:30
  • 2
    Cyan, part of the problem is that, as usual, Gibson did not quite understand the topic he was discussing, thus conflating terms often and making a general mess of things. You'd be best off completely ignoring what he said, since its only value is in confusion. You should start fresh with a proper explanation. – AviD Jul 29 '15 at 16:33
  • You may find this article on the [vudo](http://phrack.org/issues/57/8.html) exploit to be interesting. It addresses a vulnerability quite similar to what you are asking about. – kasperd Jul 29 '15 at 21:57

4 Answers

22

String Termination Vulnerability


Upon thinking about this more, using strncpy() is probably the most common way (that I can think of) to create null termination errors, since people generally think of the length of a buffer as not including \0. So you'll see something like the following:

strncpy(a, "0123456789abcdef", sizeof(a));

Assuming that a is declared as char a[16], the resulting a will not be null terminated: the 16 characters fill the whole array and leave no room for \0. So why is this an issue? Well, in memory you now have something like:

30 31 32 33 34 35 36 37 38 39 61 62 63 64 65 66 
e0 f3 3f 5a 9f 1c ff 94 49 8a 9e f5 3a 5b 64 8e
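
The missing terminator can be checked without reading out of bounds. The following is a minimal sketch, not part of the original answer: has_terminator and strncpy_leaves_terminator are hypothetical helper names, and the 17th sentinel byte exists only so the test can observe that strncpy() stops after 16 bytes.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the first n bytes of buf contain a '\0', otherwise 0.
 * memchr() never looks past n bytes, so this is safe on unterminated data. */
int has_terminator(const char *buf, size_t n)
{
    assert(buf != NULL);
    return memchr(buf, '\0', n) != NULL;
}

/* Copies a 16-character literal into a 16-byte region with strncpy() and
 * reports whether that region ends up terminated. The backing array has
 * one extra sentinel byte so every access stays inside a real object. */
int strncpy_leaves_terminator(void)
{
    char backing[17];

    memset(backing, 'X', sizeof backing);      /* sentinel fill */
    strncpy(backing, "0123456789abcdef", 16);  /* same call shape as above */
    assert(backing[16] == 'X');                /* byte 17 was never written */
    return has_terminator(backing, 16);
}
```

Some compilers warn about this strncpy() call precisely because the output is truncated before the terminator, which is the behavior being illustrated.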

Without a null terminator, standard string functions can't tell where the string ends. For example, strlen(a) will continue to count until it reaches a 0x00 byte. When is that? Who knows. But whenever it finds one, it will return a length larger than your buffer; let's say 78. Let's look at an example:

#include <string.h>

void do_something_with_a(char *a);

int main(int argc, char **argv) {
    char a[16];

    // Fills all 16 bytes of 'a', leaving no room for the terminating \0
    strncpy(a, "0123456789abcdef", sizeof(a));

    // ... lots of code passes, functions are called ...
    // ... we finally come back to array a ...

    do_something_with_a(a);
}

void do_something_with_a(char *a) {
    int a_len = 0;
    char new_array[16];

    // Don't know what the length of the 'a' string is, but it's a string so let's use strlen()!
    // (Undefined behavior: 'a' has no terminator, so strlen() reads past its end.)
    a_len = strlen(a);
    
    // Gonna munge the 'a' string, so let's copy it first into new_array
    strncpy(new_array, a, a_len);
}

You've now just written 78 bytes to a variable that only has 16 bytes allocated to it.
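
One way to avoid this overrun is to bound every copy by the size of both buffers and to terminate the destination explicitly. The sketch below is an illustration under that assumption, not the answer's own fix; copy_bounded is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>

/* Copies the string stored in the srcsize-byte region at src into dst.
 * src may be unterminated: at most srcsize bytes are ever read.
 * At most dstsize - 1 bytes are written, followed by a '\0'.
 * Returns the number of characters copied (excluding the terminator). */
size_t copy_bounded(char *dst, size_t dstsize, const char *src, size_t srcsize)
{
    assert(dst != NULL && src != NULL && dstsize > 0);

    /* Find the terminator inside the source region, if there is one. */
    const char *nul = memchr(src, '\0', srcsize);
    size_t len = nul ? (size_t)(nul - src) : srcsize;

    /* Truncate to what the destination can hold, leaving room for '\0'. */
    if (len > dstsize - 1)
        len = dstsize - 1;

    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

With the unterminated 16-byte a from the example, copy_bounded(new_array, sizeof new_array, a, 16) would copy 15 characters and terminate new_array, silently truncating rather than overflowing. Whether truncation is acceptable depends on the program.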

Buffer Overflows


A buffer overflow occurs when more data is written to a buffer than is allocated for that buffer. This is no different for a string, except that many of the string.h functions rely on the null byte to signal the end of a string, as we saw above.

In the example we wrote 78 bytes to a buffer that is only allocated 16. Not only that, but it's a local variable, which means that the buffer has been allocated on the stack. Those last 62 bytes that were written (78 - 16) just overwrote 62 bytes of the stack.

If you write enough data past the end of that buffer, you'll overwrite the other local variable a_len (also not good if you use it later), any frame pointer that was saved on the stack, and then the return address of the function. Now you have really gone and screwed things up, because the return address is now something completely wrong. When the end of do_something_with_a() is reached, bad things happen.

Now we can extend the example above.

void do_something_with_a(char *a, char *new_a) {
    int a_len = 0;
    char new_array[16];

    // Don't know what the length of the 'a' string is, but it's a string so
    // lets use strlen()!
    a_len = strlen(a);
    
    // 
    // By the way, copying anything based on a length that's not what you
    // initialized the array with is horrible horrible coding.  But it's
    // just an example.
    //
    // Gonna munge the 'a' string, so lets copy it first into new_array
    strncpy(new_array, a, a_len);
    
    // 'a_len' was on the stack, and we just blew it away by writing 62 extra
    // bytes past the end of the 'new_array' buffer.  So now the first 4 bytes
    // after the 16 have been written into a_len.  This can still be interpreted
    // as a signed int.  So if you use the example memory, a_len is now 0xe0f33f5a
    //
    // ... did some more munging ...
    //
    // Now I want to return the new munged string in the *new_a variable
    strncpy(new_a, new_array, a_len);

    // Everything burns

}

I think my comments pretty much explain everything. But at the end you've now written a huge amount of data into an array most likely thinking that you're only writing 16 bytes. Depending on how this vulnerability manifests itself this could lead to exploitation via remote code execution.

This is a very contrived example of poor coding, but you can see how things can escalate quickly if you're not careful when working with memory, and copying data. Most of the time the vulnerability will not be this obvious. With large programs you have so much going on that the vulnerability might not be easy to spot, and could be triggered by code multiple function calls away.

For more on how buffer overflows work.

And before anyone mentions it, I ignored endianness when referencing the memory, for the sake of simplicity.


Further Reading

Full Description of the Vulnerability
Common Weakness Enumeration (CWE) entry
Secure Coding Strings Presentation (PDF automatically downloads)
University of Pittsburgh - Secure Coding C/C++: String Vulnerabilities (PDF)

RoraΖ
  • 2
    A buffer overflow can also occur when more data is *read from* a buffer than is allocated to the buffer. This is how Heartbleed worked, for example. – Mason Wheeler Jul 29 '15 at 14:05
  • Ehm, you had **22** characters in your original string literal, and now (after removing the extraneous 7) still have 21. Did you accidentally create the exact problem that you were trying to describe? "01234567890123456789" is a `char[21]` exactly because of the null terminator, and you try to store that `char[21]` in `char str[20]`. – MSalters Jul 29 '15 at 14:37
  • @MasonWheeler That's a buffer underflow. When you create a buffer of, say, 200 bytes and only initialize 16 and read it all out. You will end up reading a few bytes of garbage. – Ismael Miguel Jul 29 '15 at 15:21
  • 2
    @IsmaelMiguel Yeah, but as I understand it, that's not what happened. Heartbleed was creating a small buffer and reading a large amount of data out of it, past the end of the array. ([Obligatory XKCD](https://xkcd.com/1354/)) – Mason Wheeler Jul 29 '15 at 15:26
  • @MasonWheeler That's correct. But it's not really an overflow as much as an over-read. No buffer is actually overrun with data. – RoraΖ Jul 29 '15 at 15:30
  • @raz: Meh. It's still accessing beyond the bounds of the array. The access is `read` rather than `write`, but the fundamental problem is the same. – Mason Wheeler Jul 29 '15 at 15:34
  • This is true, feel free to talk in the [DMZ](https://chat.stackexchange.com/rooms/151/the-dmz) – RoraΖ Jul 29 '15 at 15:35
  • Raz, great answer! For a course I took at my university involving system security, we had to demonstrate a buffer overflow and return to a spot in memory that had shell execution assembly loaded in order to open a shell when the vulnerable function exited. It was pretty awesome. Strings are the primary reason why I hate C; they're way too complicated. – Chris Cirefice Jul 29 '15 at 15:45
  • @raz - thank you so much for the answer and the links. You say: "'a_len' was on the stack, that we just blew away by writing 66 extra bytes to the 'new_array' buffer". If I may ask, is it possible to tell by looking at the code whether the overrun will flow to a variable that, in the code, is above or below the original overfilled variable? For example: consider that in my code I have declared a `var1`, and this `var1` is above `var2`, which in turn is above `var3`. Suppose then that too many characters are put into `var2`. Where would the overrun spill to? `var1` or `var3`? –  Jul 29 '15 at 21:01
  • Just to point out that `char new_array[16]; a_len = strlen(a); strncpy(new_array, a, a_len);` is dumb code regardless of whether `a` is zero-terminated. – user253751 Jul 29 '15 at 22:21
  • @immibis Well it is a contrived example. – RoraΖ Jul 30 '15 at 11:09
  • @Cyan Great question! Local variables are generally allocated by using a single instruction to move the stack pointer by an entire *chunk* of memory. The [order of the local variables](https://stackoverflow.com/questions/27025216/how-does-gcc-push-local-variables-on-to-the-stack) is actually compiler implementation dependent, but is most likely in order of use. Since my example is contrived I chose that the `a_len` was above `new_array`, but it is possible that the two could be swapped. – RoraΖ Jul 30 '15 at 11:39
4

I'm at risk of being redundant by adding yet another answer, but I think the existing answers might not fully address what you're asking. In a traditional buffer overflow vulnerability (specifically of the stack-based variety), one tries to overwrite the saved return address on the stack in order to cause execution to jump into exploit code when the current function tries to return.

Obviously that's not going to work if the only thing you (the attacker) can make the program write past the end of the buffer is a zero byte. Potentially you can cause the program to crash in this way by making it try to jump to an invalid address, but that's just a DoS and not remote code execution.

However, consider that you get the program to write a string of length 16 to a 16-byte buffer that we'll call "A", so that the null byte overruns. Then you cause the program to overwrite that null byte with something that is not \0, so now string A is not null-terminated. If you then get the program to send you the contents of A, it will read on past the end of A, potentially giving you access to all kinds of secret information. Heartbleed relied on the same kind of out-of-bounds read (there triggered by an unchecked length field rather than a missing terminator) to steal private keys, which is pretty serious.

At this point string A is actually longer than the programmer expected. It's not too hard to imagine the programmer relying on A being a 16-byte string and copying it elsewhere, potentially overflowing other buffers by much more than one byte. This could then be used to execute arbitrary code.
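
The sequence described above can be simulated without undefined behavior by carving two 16-byte regions, A and B, out of a single 32-byte array. This is only a sketch under that assumption: real adjacent variables are laid out by the compiler, and apparent_length_of_a is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

#define REGION 16

/* Treats mem[0..15] as buffer A and mem[16..31] as buffer B.
 * Writes a 16-character string into A (so its terminator lands on B[0]),
 * then fills B, and returns what strlen() now reports for A. */
size_t apparent_length_of_a(const char *b_contents)
{
    char mem[2 * REGION];
    char *a = mem;
    char *b = mem + REGION;

    assert(b_contents != NULL && strlen(b_contents) < REGION);
    memset(mem, 0, sizeof mem);

    strcpy(a, "0123456789abcdef");  /* 16 chars: the '\0' spills into b[0] */
    strcpy(b, b_contents);          /* a later write replaces that '\0' */

    return strlen(a);               /* keeps counting into B's contents */
}
```

If B holds "secret", A appears to be 22 characters long, and anything that prints or copies A as a string also exposes B.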

Lexelby
1

As you indicated in your response, buffer overflows are a likely vulnerability if a programmer does not terminate a character string with a NULL byte. The reason is that most string functions assume this, and will continue until they encounter a zero. If you are lucky, the error is bad enough that you get a segmentation fault early in development, so that you can debug and correct the problem. However, with many bugs a failure this obvious will only occur under special conditions. Often an attacker can take advantage of the way the program behaves and, depending on the particulars of the vulnerability, exploit it in order to read contents of memory that should be hidden, or copy data from user input into areas of memory the user was not meant to control, etc. If the buffer is located on the stack, the latter exploit can be used to inject code, and to overwrite the return address stored in the stack frame located at a higher address (on x86) on the stack. Some operating systems have protections, such as non-executable segments of memory, but this is definitely something you as the programmer don't want to rely on :)

Programmers who are new to languages like C may find the practices for avoiding these problems difficult or error prone, but eventually they become second nature, although it is still possible to make a mistake. Just practice as much as possible. I've been programming in C for about 7 years, and I still need to correct a mistake from time to time.

A good way to practice is to allocate a character array and memset the whole array with some non-printable ASCII value other than 0; filling it with 1s is just fine. Use your standard library function of choice to copy some string into the array; obviously, if it crashes, the program is incorrect. Otherwise, use a basic for loop to iterate over every element in the array and print out its numeric value, and check that the 0 is where it should be. If you print the array as a string with printf, it will stop at that point. I find this is a good way to experiment and find the differences between functions, e.g. strcat, strncat, strcpy, strlcpy, strlcat, sprintf, etc. I would recommend using strlcpy and strlcat over strncpy and strncat for most things; they are much easier to use and less error prone.
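
The practice routine described above might look like the following sketch, here using strncpy() because it is standard C (strlcpy is not available on every platform). The helper names first_zero_after_strncpy and dump_bytes are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fills the n-byte buffer with 0x01, copies src into it with strncpy(),
 * and returns the index of the first zero byte, or n if there is none. */
size_t first_zero_after_strncpy(char *buf, size_t n, const char *src)
{
    assert(buf != NULL && src != NULL);

    memset(buf, 1, n);      /* non-zero, non-printable filler */
    strncpy(buf, src, n);   /* function under test */

    const char *nul = memchr(buf, 0, n);
    return nul ? (size_t)(nul - buf) : n;
}

/* Prints the numeric value of every byte in the buffer, in hex. */
void dump_bytes(const char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("%02x ", (unsigned char)buf[i]);
    printf("\n");
}
```

For a 4-byte buffer, copying "Hi." leaves the zero at index 3, while copying "Hi!!" leaves no zero at all. Copying "H" shows another strncpy() property: it pads the rest of the buffer with zeros, so the 0x01 filler disappears entirely.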

Another tip: if it seems difficult to validate your algorithm in your head, imagine that you are doing the same thing on an extremely small input. For strings, imagine that you are performing the operation on a string with only room for 1 character plus the NULL byte. This makes it easy to see many properties of character strings that otherwise require more brain work. For instance, you need to allocate an array with 2 elements, even though you only need to store a single char. The second element, str[1], obviously needs to be 0. strlen will report that the length of the string is 1. Now you can comfortably generalize: strlen(str) is always the index of the NULL byte (assuming the string is NULL terminated, of course :). Likewise, strlen(str) - 1 is always the index of the last character in the string, provided strlen(str) > 0. The amount of storage needed for the string and the NULL byte is always strlen(str) + 1.

One last thing. It is important to note that NULL terminated strings are just a convention. There have been and are many possible alternatives. The terminator is only required if you use functions that assume the NULL byte indicates the point in memory where they should stop doing stuff. This is the case with the string functions in libc. You could write your own string functions that store the length prepended to the beginning of the string. At the cost of some extra complexity introduced by the type punning required for strings with lengths exceeding 255 characters, and of maintaining this number whenever the string is updated, this approach has the advantage of finding the length of the string in O(1) time instead of O(n). You could also store the string pointer and length value in a structure, although this can't be neatly generalized to strings not located on the heap. Most programmers will probably just tell you that you should stick with standard string representations for most things, and they are probably right. But if it's your own code, who should tell you what to do? It is your universal computation machine (a finite approximation, at least), so explore the landscape of computation, make it your sandbox, and have fun!
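
As a rough sketch of the pointer-plus-length alternative mentioned above, the following type keeps the length next to the data so that no terminator is needed. The names counted_str, counted_from, counted_len and counted_free are invented for illustration and do not come from any library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: the length is stored, not searched for. */
struct counted_str {
    size_t len;   /* number of bytes in data */
    char *data;   /* heap storage, NOT null terminated */
};

/* Builds a counted string from len bytes. On allocation failure the
 * result has data == NULL and len == 0. */
struct counted_str counted_from(const char *bytes, size_t len)
{
    struct counted_str s = { 0, NULL };

    assert(bytes != NULL);
    s.data = malloc(len ? len : 1);   /* avoid a zero-size allocation */
    if (s.data != NULL) {
        memcpy(s.data, bytes, len);
        s.len = len;
    }
    return s;
}

/* O(1): reads the stored length instead of scanning for a terminator. */
size_t counted_len(const struct counted_str *s)
{
    assert(s != NULL);
    return s->len;
}

/* Releases the storage and resets the fields. */
void counted_free(struct counted_str *s)
{
    assert(s != NULL);
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

The trade-off is the one described above: every operation that changes the contents must also keep len correct, and the data cannot be passed directly to libc functions that expect a terminator.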

  • thank you so much. I like it how you suggest an approach to practice and visualize it. +1 for sure. –  Jul 29 '15 at 20:45
  • 1
    You're welcome :) And I want to augment my response with a detail that I left out. The approach I explained to test string functions with memset and printing needs to be adapted for strcat and similar functions, set the first element to 0, and leave the rest 1's. It works fine for testing strcpy as is, and a little experimentation reveals why strlcpy may be better. – user3259161 Jul 29 '15 at 21:09
1

This is floating around in the above answers, but I believe it should be made explicit. C/C++'s character array handling has a number of potential off-by-one hazards... Examples:

""  // a zero length string requiring one byte of storage
    // in memory:  00

"Hi."  // a length 3 string requiring four bytes of storage
       // in memory:  48 69 2e 00
"Hi."[3]  // is the 00, the characters in a string and a string array are indexed starting at 0, to wit
"Hi."[0]  // is the 'H'.

char foo[3],  // a length three character array requiring three bytes of storage
     bar[4];  // a length four character array requiring four bytes of storage

strncpy(foo, "Hi.", 3)  // copies three characters from a length three string to a length three character array.  
                        // The result is not a string because the null is not copied.

strcpy(foo, "Hi.")  // copies four characters from a length three string to a length three character array
                    // This causes overrun of the array.
                    // It writes 00 on whatever (if anything) is allocated next in storage.

strcpy(bar, "Hi.")  // copies four characters from a length three string to a length four character array.
                    // This works/is safe (enough).

So

  • a length three string contains four characters.
  • a length three string does not fit in a length three character array
  • copying three characters from a length three string does not copy the string
  • if mystring is a length n string, mystring[n] is the terminating 00. Consequently, one may reason briefly (or not at all) that copying up to the n-th character will copy the 00.

Or, to summarize, this is maximally designed to cause off-by-one errors.
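
The length-versus-storage distinction in the list above can be written down as two small checks. This is a minimal sketch; storage_needed and fits are hypothetical helper names.

```c
#include <assert.h>
#include <string.h>

/* Bytes of storage a string needs: its length plus the terminating 00. */
size_t storage_needed(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}

/* Returns 1 if the string, including its terminator, fits in an array
 * of the given capacity, otherwise 0. */
int fits(const char *s, size_t capacity)
{
    return storage_needed(s) <= capacity;
}
```

With these, fits("Hi.", 3) is 0 (the foo case) and fits("Hi.", 4) is 1 (the bar case), and sizeof "Hi." is 4 because the literal's type includes the terminator.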

Eric Towers