Now when the compiled program runs printf("%s", text);, it would look into the stack and go to the address 5054. However, what about printf(text)? How does it access the address 5054 to print "%s", which after printing might result in a segmentation fault?
It doesn't. If the user enters "%s" as the input, there will be two strings at different locations, both containing "%s": the string constant from the second printf, and the text char array containing the user's input.
Here's what's happening in more detail:
printf("%s", text);
In this case, the first argument is a string constant. So, the compiler creates a string containing "%s" and adds it to a table of string constants somewhere in memory (let's say at address 5054). When this printf is called, the address of the string constant (5054) is placed on the stack, along with the address of the text array. (The order in which they are pushed depends on the calling convention, but it's usually right to left. Many 64-bit calling conventions pass the first few arguments in registers instead of on the stack, but the principle is the same.) Then the current execution address is placed on the stack or into a special register (so the CPU can return from the function) and the CPU jumps to the address of the printf function. The printf function evaluates the formatting string, and when it finds the %s it looks on the stack for the second argument (the address of the text string) and prints it out.
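To make the "evaluates the formatting string, then looks for the next argument" step concrete, here is a minimal sketch of a printf-like function. The name mini_format and its buffer-based interface are made up for illustration; it handles only %s and %%, and it is not how any real C library implements printf.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a libc function): a tiny printf-like formatter
 * that understands only %s and %%. It writes into `out` (capacity `cap`,
 * always NUL-terminated) and returns the number of characters stored. */
size_t mini_format(char *out, size_t cap, const char *fmt, ...)
{
    assert(out != NULL && cap > 0 && fmt != NULL);

    va_list args;
    va_start(args, fmt);            /* start reading after the named parameter */

    size_t n = 0;
    for (const char *p = fmt; *p != '\0' && n + 1 < cap; p++) {
        if (*p != '%') {
            out[n++] = *p;          /* ordinary character: copy it through */
        } else if (p[1] == 's') {
            /* %s: only now is the next argument fetched. If the caller
             * passed none, this read is undefined behavior. */
            const char *s = va_arg(args, char *);
            size_t len = strlen(s);
            if (len > cap - 1 - n)
                len = cap - 1 - n;  /* truncate to the space that is left */
            memcpy(out + n, s, len);
            n += len;
            p++;                    /* skip the 's' */
        } else if (p[1] == '%') {
            out[n++] = '%';         /* %% produces a literal percent sign */
            p++;
        } else {
            out[n++] = *p;          /* anything else is copied as-is here */
        }
    }

    va_end(args);
    out[n] = '\0';
    return n;
}
```

Calling mini_format(buf, sizeof buf, "%s", text) fetches exactly one argument, because the format contains exactly one %s. The function has no other way of knowing how many arguments were passed.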
printf(text);
In this example, only the address of the text argument is pushed onto the stack. Then the execution address is saved and the CPU jumps to the address of the printf function, as before. The address of the string constant (5054) is not placed on the stack, because there is no string constant in the list of arguments to the function.
In this case, the printf function evaluates the formatting string, and when it finds the %s that the user entered, it looks on the stack for the second argument. There isn't one, so whatever happens to be at that position is read, treated as the address of a string, and dereferenced in order to print it. This is undefined behavior, and a segmentation fault is a likely result.
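The usual way to avoid this is to never let user input occupy the format-string position. The sketch below shows two common options. The helper names format_user_text and print_user_text are invented for this example, while snprintf and fputs are the standard library functions.

```c
#include <stdio.h>

/* Hypothetical helper: copy user-controlled text into `out` without ever
 * letting it act as a format string. Returns snprintf's result: the length
 * the full text needs, or a negative value on error. */
int format_user_text(char *out, size_t cap, const char *text)
{
    /* The format string is the constant "%s"; `text` is only data, so any
     * '%' characters inside it are copied literally. */
    return snprintf(out, cap, "%s", text);
}

/* Hypothetical helper: print user-controlled text to stdout. fputs does no
 * format parsing at all. Returns a non-negative value on success, EOF on
 * error. */
int print_user_text(const char *text)
{
    return fputs(text, stdout);
}
```

With either helper, an input of "%s" is simply printed as the two characters % and s. GCC and Clang can also warn about the printf(text) pattern when options such as -Wformat -Wformat-security are enabled, although the exact diagnostics depend on the compiler version.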
printf is a variadic function, which means it can take a variable number of arguments. It relies on runtime parsing of the formatting string to determine how many arguments (should) have been passed to it. This is what makes it vulnerable to malicious data: nothing checks that the contents of the formatting string match the number of arguments that were actually passed.
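As a rough illustration of that mismatch, the following sketch counts how many arguments a format string would make a printf-like function fetch. The function count_expected_args is hypothetical and deliberately simplified: it treats every conversion as consuming exactly one argument and ignores cases such as a * width or precision, which consume an extra one in real printf.

```c
#include <stddef.h>

/* Hypothetical helper: count how many arguments a format string would make
 * a printf-like function fetch. Simplified: "%%" consumes nothing, and every
 * other '%' is assumed to consume exactly one argument. */
size_t count_expected_args(const char *fmt)
{
    size_t count = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {
            p++;                /* literal percent sign, no argument */
        } else if (p[1] != '\0') {
            count++;            /* a conversion such as %s, %d or %x */
        }
    }
    return count;
}
```

If the user types "%s %d %x", count_expected_args returns 3, yet printf(text) passes no additional arguments at all. printf itself has no way to detect that difference.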