Tricky code to make memory-safe

Question

I'm designing a homework challenge for students who are learning about memory safety and writing secure C code. As part of this, I am looking for a small programming task where it's non-trivial to write C code that is free of buffer overruns, array out-of-bounds errors, and/or other memory safety errors. What would be a good example of such a task?

In other words: I specify the desired functionality; they implement it in C; and if they're not careful when implementing, there's a significant chance their code will have a memory-safety vulnerability. Ideally, I'd prefer something that can be implemented concisely (a few hundred lines of code, at most) to keep the task of manageable size, and it'd be extra-cool if the task were somehow industrially or practically relevant or realistic or representative of real-world programming.

To give an example from a different domain, implementing binary search on a sorted list is a classic example of an easily specified programming task where if you're not careful when implementing it, there's a significant chance you will have some sort of logic bug (e.g., an off-by-one error, an infinite loop on some inputs, that sort of thing). Is there any good corresponding task, for security, and specifically memory-safety vulnerabilities?

If you're cruel, use an ASN.1 parser. – CodesInChaos, Dec 10 '13 at 14:44

score 12 · Answer 1 · answered Dec 10 '13 at 13:18

A typical example of non-trivial buffer handling is the parsing of binary files (or network packets) that can contain arbitrary-length strings. (Is there any ASN.1 parser that didn't have buffer overflow bugs at some time?)

For example, consider the format of textual data chunks in PNG files:

The keyword and text string are separated by a zero byte (null character). Neither the keyword nor the text string may contain a null character. The text string is not null-terminated (the length of the chunk defines the ending).

So the homework could be a tool that outputs all textual data in a PNG file.
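To give an idea of where the traps are, here is a minimal sketch of splitting one tEXt chunk body into keyword and text. The names `struct text_chunk` and `split_text_chunk` are made up for illustration, and the sketch assumes the caller has already read the chunk body into memory and checked its length against the file. The 1 to 79 byte keyword limit comes from the PNG specification.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: pointers into the caller's chunk buffer plus
 * explicit lengths, because the text field is never null-terminated. */
struct text_chunk {
    const unsigned char *keyword;
    size_t keyword_len;
    const unsigned char *text;
    size_t text_len;
};

/* Splits a tEXt chunk body of `len` bytes into keyword and text.
 * Returns 0 on success, -1 if the chunk is malformed. */
int split_text_chunk(const unsigned char *data, size_t len,
                     struct text_chunk *out)
{
    if (data == NULL || out == NULL || len == 0)
        return -1;

    /* memchr is bounded by len. strlen would run off the end of a chunk
     * that has no separator byte at all. */
    const unsigned char *sep = memchr(data, 0, len);
    if (sep == NULL)
        return -1;

    size_t keyword_len = (size_t)(sep - data);
    if (keyword_len < 1 || keyword_len > 79)
        return -1;

    /* Safe subtraction: the separator was found inside the buffer, so
     * keyword_len + 1 <= len. */
    size_t text_len = len - keyword_len - 1;

    /* The text must not contain a null byte either. */
    if (text_len > 0 && memchr(sep + 1, 0, text_len) != NULL)
        return -1;

    out->keyword = data;
    out->keyword_len = keyword_len;
    out->text = sep + 1;
    out->text_len = text_len;
    return 0;
}
```

To print the text, something like `fwrite(c.text, 1, c.text_len, stdout)` works; passing `c.text` to `printf("%s")` would read past the end of the chunk.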

Even plain text can be not quite trivial if you need to handle arbitrarily long lines, and thus need to resize a buffer dynamically. For example:

Implement tail without parameters. (Lines can be longer than any static buffer. The entire file can be larger than available memory.)
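The buffer-resizing part could look roughly like the sketch below. `read_line` is a hypothetical helper, not a full tail: a real solution would also have to keep only the last few lines around so that memory use does not grow with the file.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads one line of any length from fp into a heap buffer that grows as
 * needed. Returns a null-terminated string that the caller must free(),
 * or NULL at end of input or on allocation failure. The newline itself
 * is not stored. */
char *read_line(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int ch;
    while ((ch = fgetc(fp)) != EOF && ch != '\n') {
        if (len + 1 >= cap) {            /* keep one byte for the terminator */
            if (cap > SIZE_MAX / 2) {    /* doubling would overflow size_t */
                free(buf);
                return NULL;
            }
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {           /* buf is still valid, so free it */
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }

    if (ch == EOF && len == 0) {         /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

The usual student mistakes are all here in miniature: forgetting the byte for the terminator, assigning the result of `realloc` straight back to `buf` (which leaks on failure), and not checking the size arithmetic.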

John Deters · Answer 2 · 2013-12-17T23:04:44.563

I see lots of problems when people need to parse character data out of structs, especially when it's a buffer that is normally null-terminated, but whose contents can fill the entire buffer and leave no room for a terminator.

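short canary = 0x5678;       /* sentinel declared before the struct */
struct customer {
    char name[4];            /* "John" fills this with no room for a terminator */
    char suffix[3];          /* "Esq" fills this with no room for a terminator */
} customer;
short fencepost = 0x1234;    /* sentinel declared after the struct */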

Have them fill customer with data like "Joe" and "Jr". Ask them to output all the data in this exact format:

canary: 0x5678
name: Joe
suffix: Jr
fencepost: 0x1234

Then have them fill customer with "John" and "Esq". Their output should look like this:

canary: 0x5678
name: John
suffix: Esq
fencepost: 0x1234

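It's a common, simple task. This should teach them they can't just strcpy "Esq" into the 3-byte suffix buffer, because the terminating null byte lands outside the buffer and can kill the canary or the fencepost, depending on how the compiler lays things out. It should also require them to copy the data out of the buffer into a larger one in order to null-terminate it before printing.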
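One possible shape for a safe solution is sketched below. The helper names `set_field` and `field_to_cstr` are my own invention, and this is only one of several reasonable approaches (printing with `printf("%.*s", ...)` and an explicit length is another).

```c
#include <stddef.h>
#include <string.h>

struct customer {
    char name[4];
    char suffix[3];
};

/* Stores src in a fixed-size field. Short values are padded with zero
 * bytes; a value that fills the field exactly gets no terminator.
 * Returns -1 (and leaves the field untouched) if src does not fit. */
int set_field(char *field, size_t field_size, const char *src)
{
    size_t n = strlen(src);
    if (n > field_size)
        return -1;
    memset(field, 0, field_size);
    memcpy(field, src, n);
    return 0;
}

/* Copies a possibly unterminated field into out and terminates it.
 * out_size must be at least one byte larger than the stored value.
 * Returns -1 if out is too small. */
int field_to_cstr(const char *field, size_t field_size,
                  char *out, size_t out_size)
{
    /* Bounded search: never look past the end of the field. */
    const char *nul = memchr(field, '\0', field_size);
    size_t n = nul ? (size_t)(nul - field) : field_size;
    if (n + 1 > out_size)
        return -1;
    memcpy(out, field, n);
    out[n] = '\0';
    return 0;
}
```

With these, filling a `struct customer c` is `set_field(c.suffix, sizeof c.suffix, "Esq")`, and printing goes through a temporary buffer such as `char tmp[sizeof c.suffix + 1]`.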

I recommend you test this out first, and check the compiler settings. In Microsoft's older versions of Visual C++, a debug build would link in a debug version of the memory allocator that added fenceposts around allocated memory. It would enable the debugger to report on an overwritten memory error. I don't know if the newer versions still do that, but I know the release builds don't.

If you want to give them a "why", show how entering a name like Fred@@@@@@@@ can cause a crash, and enable a hacker to take over the program.

score 5 · Answer 3 · answered Dec 16 '13 at 16:08

It's a tough one since, AFAIK, modern Linux kernels don't have executable stacks, so they won't be able to test their programs on a normal Linux distro. (I remember having trouble testing the code listed in the Aleph One paper on my Debian machine.)

What you could do is provide them with a custom VM (like Damn Vulnerable Linux) which has the security settings disabled and tell them to test their code there.

More to the point, a solution would be to make them write a program that gets a file name as a command-line argument or as user input, and then:

opens the file
reads some configuration from it
parses it / looks for something specific
pushes the relevant part into a network socket
another process reads the data and must print some relevant parts.

Your scenario could be:

You have to write a program in C which reads airline flight schedules from a file provided by the user and pushes them via a TCP socket to another process, which then displays them.

In order to spice things up a bit, you could provide a struct in which everything has to be handled.

It's simple, since most of the code can be found online (read from file, write to socket) and the students only have to do the memory handling.
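As a rough illustration of the memory-handling part, here is a sketch of parsing one line of the schedule file into a fixed-size struct. The record layout (`struct flight` with a flight number and a destination code), the field sizes, and the function names are all assumptions made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record layout, assumed for this example only. */
struct flight {
    char number[8];   /* up to 7 characters plus terminator */
    char dest[4];     /* up to 3 characters plus terminator */
};

/* Copies the next whitespace-delimited token from *p into dst and
 * advances *p past it. Returns -1 if there is no token or it would not
 * fit in dst_size bytes including the terminator. */
static int copy_token(const char **p, char *dst, size_t dst_size)
{
    *p += strspn(*p, " \t");
    size_t n = strcspn(*p, " \t\r\n");
    if (n == 0 || n >= dst_size)
        return -1;
    memcpy(dst, *p, n);
    dst[n] = '\0';
    *p += n;
    return 0;
}

/* Parses a line such as "BA117 JFK\n". Returns 0 on success, -1 on
 * malformed or oversized input. *out is only written on success. */
int parse_flight(const char *line, struct flight *out)
{
    struct flight tmp;
    /* Zero the record first so that unused bytes hold no stale memory
     * if the struct is later written to a socket as raw bytes. */
    memset(&tmp, 0, sizeof tmp);

    const char *p = line;
    if (copy_token(&p, tmp.number, sizeof tmp.number) != 0)
        return -1;
    if (copy_token(&p, tmp.dest, sizeof tmp.dest) != 0)
        return -1;

    p += strspn(p, " \t\r\n");
    if (*p != '\0')          /* reject trailing garbage */
        return -1;

    *out = tmp;
    return 0;
}
```

The receiving process has the mirror-image problem: it cannot assume that the bytes coming off the socket contain terminated strings, so it has to bound every read by the field size.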

*"modern linux kernels don't have executable stacks, so they won't be able to test their programs on a normal linux distro"* - I suspect this represents a misunderstanding of the question, or of buffer overruns. I'm not looking to ask students to exploit a vulnerability in an existing program. Rather, I want to ask them to write secure code -- or code that will hopefully be secure. (Moreover, a non-executable stack is not a magic silver bullet that makes buffer overruns non-exploitable.) — D.W., Dec 18 '13 at 08:58
I fully agree, and I'm sorry for the bad wording of the answer. What I wanted to say is that I guess that students would like to test themselves if/how they can get a shell from their program or generally exploit it and providing them with an easy way to do this would give them a good bonus experience :-). (A professor back in the university did something similar and it turned out that many students went for the bonus) — ndp, Dec 18 '13 at 09:30

score 4 · Answer 4 · answered Dec 23 '13 at 09:22

Depending on your platform constraints, I would suggest having them implement some DCOM.

Implementing DCOM calls involves a lot of different problematic areas - network calls, interface handling, marshalling, distributed reference counting and object lifetime management, shared memory safety, and more - if done right, it also includes some low-level ACL checking and such. Lots of little bits and complex structures flying about.

I have never seen any DCOM code - either calling client or server - NOT have significant bugs (unless implemented in some "safe" language, like VB, and even then it is common).
