
I have a question regarding this unsafe C program.

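#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char text[32];
    static int some_value = -72;
    if (argc < 2)
        return 1; /* no parameter supplied */
    strcpy(text, argv[1]); /* copy the parameter into the array "text" (unchecked: overflows if argv[1] is longer than 31 characters) */
    printf("This is how you print correctly:\n");
    printf("%s", text);
    printf("This is how not to print:\n");
    printf(text); /* deliberately vulnerable: user input used as the format string */
    printf("some_value @ %p = %d [0x%08x]", (void *)&some_value, some_value, (unsigned int)some_value);
    return(0);
}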

From this code we can see that printf(text) is potentially unsafe, because a malicious user can pass a malicious string into it. Take, for instance, a user who enters %s as input, which results in %s being stored in a memory location on the stack. Let's assume %s is stored at memory address 5054, and that the number 5054 is stored on the stack so the processor knows where %s is located.

Now, when a compiler runs printf("%s", text), it looks on the stack and goes to address 5054. But what about printf(text)? How does it access address 5054 to print %s, which might then result in a segmentation fault?

Anders
weejing
  • Just want to mention that, from a security viewpoint, one should consider using C only if you really have to, even if you are really good at it. http://security.stackexchange.com/questions/55723/are-there-secure-languages – Simply G. Apr 21 '16 at 06:47
  • This [StackOverflow question](http://stackoverflow.com/questions/7459630/how-can-a-format-string-vulnerability-be-exploited) has some more details about your example. – Sjoerd Apr 21 '16 at 10:15

1 Answer


Now, when a compiler runs printf("%s", text), it looks on the stack and goes to address 5054. But what about printf(text)? How does it access address 5054 to print "%s", which might then result in a segmentation fault?

It doesn't. In the case where the user enters "%s" as the input, there will be two strings at different locations, both containing "%s": the string constant from the second printf, and the text char array containing the user's input.

Here's what's happening in more detail:

printf("%s", text);

In this case, the first argument is a string constant. So the compiler creates a string containing "%s" and stores it with the program's other string constants somewhere in memory (let's say at address 5054). When this printf is called, the address of the string constant (5054) is placed on the stack, along with the address of the text array. (The order in which arguments are pushed depends on the calling convention, but for C on 32-bit x86 it's usually right to left. Many 64-bit conventions pass the first few arguments in registers instead, but the principle is the same.) Then the return address is placed on the stack or in a special register (so the CPU can return from the function), and the CPU jumps to the address of the printf function. printf parses the format string, and when it finds the %s it fetches the second argument (the address of the text array) and prints the string stored there.
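A quick way to convince yourself that the user's "%s" is just data in this call: the sketch below formats untrusted input through the constant format "%s" and checks that the result is identical to the input. The helper name renders_literally is hypothetical, and snprintf into a buffer stands in for printing to stdout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats untrusted input through the constant format
 * "%s" (the safe pattern from the question) and reports whether the result
 * is byte-for-byte identical to the input, i.e. no '%' was interpreted.
 * Inputs longer than 255 characters get truncated and so report 0. */
static int renders_literally(const char *untrusted)
{
    char out[256];
    assert(untrusted != NULL);
    snprintf(out, sizeof out, "%s", untrusted); /* safe: the format is a constant */
    return strcmp(out, untrusted) == 0;
}
```

When you just want to write the text unchanged, fputs(text, stdout) also works and involves no format string at all.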

printf(text);

In this example, only the address of the text array is pushed onto the stack. Then the return address is saved and the CPU jumps to the address of the printf function, as before. The address of the string constant (5054) is not placed on the stack, because there is no string constant in the list of arguments to the function.

In this case, printf parses the format string, and when it finds the %s that the user entered, it looks on the stack for the second argument. But there isn't one, so whatever happens to be on the stack where the second argument would have been is read, treated as the address of a string, and dereferenced in order to print it. If that value isn't a valid address, a segmentation fault is likely. (Formally, the C standard says the behavior is undefined when there are insufficient arguments for the format.)
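To make the mismatch concrete, here is a rough sketch of a hypothetical helper, count_conversions (not part of the C library), that counts how many arguments a format string would make printf try to fetch. It is a simplification: it treats every % that isn't part of %% as one conversion and ignores the extra arguments consumed by * widths and precisions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the conversion specifications in a
 * printf-style format string, i.e. how many variadic arguments printf
 * would try to read. Simplified: "%%" is a literal percent sign and is
 * skipped; '*' widths/precisions (which read extra arguments) are NOT counted. */
static size_t count_conversions(const char *fmt)
{
    size_t count = 0;
    assert(fmt != NULL);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {  /* "%%" prints '%' and reads no argument */
            p++;
            continue;
        }
        if (p[1] == '\0')   /* lone trailing '%': not a valid conversion */
            break;
        count++;            /* "%s", "%d", "%08x" each read one argument */
    }
    return count;
}
```

For the safe call printf("%s", text), the format has one conversion and one argument is supplied. For printf(text) with text set to "%s%s%s", count_conversions would report 3 while no extra arguments were passed, so printf goes looking for three values that were never there.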

printf is a variadic function, which means it can take a variable number of arguments. It relies on parsing the format string at runtime to determine how many arguments (should) have been passed to it; nothing checks that the caller actually passed that many. This is what makes it vulnerable to malicious data: when an attacker controls the format string, its conversion specifiers need not match the number of arguments that were actually passed.
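To illustrate how a variadic function discovers its arguments, here is a toy formatter, mini_format (a hypothetical name, and nothing like a real printf implementation), that supports only %s, %d and %%. It pulls each argument with va_arg purely because the format string says one should be there; nothing tells it how many arguments the caller actually passed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Toy formatter (hypothetical, for illustration only): supports %s, %d and %%.
 * Writes at most size-1 characters to buf and always NUL-terminates it. */
static void mini_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;

    assert(buf != NULL && size > 0 && fmt != NULL);
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0' && len + 1 < size; p++) {
        if (*p != '%' || p[1] == '\0') {
            buf[len++] = *p;              /* ordinary character */
            continue;
        }
        p++;                              /* move to the conversion character */
        if (*p == 's') {
            /* The format says "a char * comes next", so fetch one. If the
             * caller passed none, this reads garbage: undefined behavior. */
            const char *s = va_arg(ap, char *);
            while (*s != '\0' && len + 1 < size)
                buf[len++] = *s++;
        } else if (*p == 'd') {
            char num[24];
            snprintf(num, sizeof num, "%d", va_arg(ap, int));
            for (const char *q = num; *q != '\0' && len + 1 < size; q++)
                buf[len++] = *q;
        } else {
            buf[len++] = *p;              /* "%%" (or unknown): emit the char */
        }
    }
    va_end(ap);
    buf[len] = '\0';
}
```

Calling mini_format(buf, sizeof buf, user_input) with user input containing %s would have the same flaw as printf(text): va_arg would fetch an argument that was never passed.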

techraf
samgak