Could someone please point me to a link with some information on multibyte character exploits for MySQL? A friend brought them to my attention, but I've not been able to find much information on the Internet.
2 Answers
Summary. Yes, the issue is that, in some character encodings (like UTF-8), a single character is represented as multiple bytes. One way that some programmers try to prevent SQL injection is to escape all single quotes in untrusted input before inserting it into their SQL query. However, many standard quote-escaping functions are ignorant of the character encoding that the database will use and process their input as a sequence of bytes, oblivious to the fact that a single character might span several bytes. This means that the quote-escaping function interprets the string differently than the database will. As a result, there are some cases where the quote-escaping function might fail to escape portions of the string that the database will interpret as a multi-byte encoding of a single quote, or might inadvertently break up a multi-byte character encoding in a way that introduces a single quote where one was not previously present. Thus, multi-byte character exploits give attackers a way to do SQL injection attacks even when the programmer thought they were adequately escaping their inputs to the database.
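To make the mismatch concrete, here is a minimal Python sketch. The `naive_escape` function is a hypothetical stand-in for a byte-wise, encoding-unaware escaper (it is not any real library's function), and Python's `gbk` codec stands in for a database that interprets the bytes as GBK. The backslash the escaper inserts gets absorbed as the second byte of a two-byte GBK character, so the quote ends up unescaped from the database's point of view.

```python
def naive_escape(raw: bytes) -> bytes:
    """Hypothetical byte-wise escaper: prefixes ', " and \\ with a backslash.

    It knows nothing about character encodings; it only looks at bytes.
    """
    out = bytearray()
    for b in raw:
        if b in (0x27, 0x22, 0x5C):  # single quote, double quote, backslash
            out.append(0x5C)
        out.append(b)
    return bytes(out)


# Attacker input: a GBK lead byte (0xBF) followed by a single quote.
untrusted = b"\xbf' OR 1=1 -- "

# The escaper sees the bytes BF 27 and produces BF 5C 27.
escaped = naive_escape(untrusted)

# A GBK-aware reader treats BF 5C as ONE character, so the backslash is gone
# and the quote that follows is no longer escaped.
as_seen_by_db = escaped.decode("gbk")

print(escaped[:3].hex())
print("backslash survives:", "\\" in as_seen_by_db)
```

The same input is harmless if the escaper and the database agree on the encoding, which is why the outcome depends so heavily on configuration.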
The impact. If you use prepared/parametrized statements for all database queries, you are safe. Multi-byte attacks will fail. (Barring bugs in the database and the library, of course. But empirically, those seem to be rare.)
However, if you try to escape untrusted inputs and then form a SQL query dynamically using string concatenation, you may be vulnerable to multi-byte attacks. Whether you are in fact vulnerable depends upon specific details of the escaping function you use, the database you use, the character encoding that you're using with the database, and possibly other factors. It can be hard to predict whether multi-byte attacks will succeed. As a result, forming SQL queries using string concatenation is fragile and not recommended.
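As a rough illustration of the difference between the two approaches, the sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption made only so the example runs without a server). The table, column names, and input are made up. The concatenated query here is not escaped at all, so it shows the plain injection that escaping is meant to stop and that multi-byte tricks can bring back; the parametrized query never mixes the input into the SQL text.

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a1"), ("bob", "b2")],
)

untrusted = "nobody' OR '1'='1"

# Fragile: the input becomes part of the SQL text itself.
concatenated = "SELECT name FROM users WHERE name = '" + untrusted + "'"
leaked = conn.execute(concatenated).fetchall()

# Parametrized: the input is passed separately and is only ever a value.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (untrusted,)
).fetchall()

print(len(leaked), "rows via concatenation;", len(safe), "rows via parameters")
```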
Technical details. If you'd like to read about the details of the attacks, I can provide you with a number of links that explain the attacks in great detail. There are several attacks:
Basic attacks on, e.g., UTF-8 and other character encodings by eating up extra backslashes/quotes introduced by the quoting function: see, e.g., here.
Sneaky attacks on, e.g., GBK, that work by tricking the quoting function into introducing an extra quote for you: see, e.g., Chris Shiflett's blog, here, or here.
Attacks on, e.g., UTF-8, that conceal the presence of a quote by using an invalid non-canonical (over-long) encoding of the single quote: see, e.g., here. Basically, the normal way of encoding a single quote has it fit into a single-byte sequence (namely, 0x27). However, there are also multi-byte sequences that the database might decode as a single quote, and that do not contain the 0x27 byte or any other suspicious byte value. As a result, standard quote-escaping functions may fail to escape those quotes.
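The over-long case in the last item can be sketched in a few lines of Python. The `lax_decode_2byte` function is a hypothetical lenient decoder written for illustration; it applies the usual two-byte UTF-8 bit layout but skips the check that rejects non-canonical forms. Python's own strict decoder is shown for comparison and refuses the same bytes.

```python
def lax_decode_2byte(seq: bytes) -> str:
    """Hypothetical lenient decoder for one 2-byte UTF-8 sequence.

    It applies the 110xxxxx 10xxxxxx bit layout but does NOT reject
    over-long encodings, which a conforming decoder must do.
    """
    b0, b1 = seq
    if b0 & 0xE0 != 0xC0 or b1 & 0xC0 != 0x80:
        raise ValueError("not a 2-byte UTF-8 sequence")
    return chr(((b0 & 0x1F) << 6) | (b1 & 0x3F))


# Over-long encoding of U+0027 (single quote): C0 A7 instead of 27.
overlong_quote = b"\xc0\xa7"

# A byte-wise escaper looking for 0x27 finds nothing to escape.
contains_quote_byte = 0x27 in overlong_quote

# The lenient decoder nevertheless yields a single quote.
decoded_lax = lax_decode_2byte(overlong_quote)

# A strict decoder rejects the sequence outright.
try:
    overlong_quote.decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True

print(contains_quote_byte, repr(decoded_lax), strict_rejects)
```

Whether a given database or library behaves like the lenient decoder or the strict one is exactly the kind of detail that makes these attacks hard to predict.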
+1 great discussion. But I like referring to this phenomenon as "byte consumption". I posted another example of this that you might enjoy. – rook Jan 03 '12 at 17:04
@Rook: It doesn't have to be 'consumption', the attack can also *introduce* quotes, see example 2. – tdammers Jan 04 '12 at 06:49
@tdammers I am sure this is just a simple misunderstanding. Multi-byte attacks must consume one or more characters, even if the character being consumed is an escape character. (I didn't invent this term, I just think it's appropriate.) – rook Jan 04 '12 at 08:06
Multi-byte attacks are not limited to SQL injection. In a general sense, multi-byte attacks lead to a "byte consumption" condition in which the attacker removes control characters. This is the opposite of the classic ' or 1=1--, in which the attacker introduces the single-quote control character. For MySQL there is mysql_real_escape_string(), which is designed to take care of character encoding problems. Parametrized query libraries like PDO will automatically use this function. MySQLi actually sends the parameters of the query as a separate element within a struct, which avoids the problem entirely.
If an HTML page is rendered as Shift-JIS then it is possible to consume control characters to obtain XSS. An excellent example of this was provided in "The Tangled Web" (fantastic book!) on page 207:
<img src="http://fuzzybunnies.com/[0xE0]">
...this is still a part of the markup...
...but the server doesn't know...
" onload="alert('this will execute!')"
<div>
...page content continues...
</div>
In this case the 0xE0 is a special byte that signifies the start of a 3-byte symbol. When the browser renders this HTML, the following "> will be consumed and turned into a single Shift-JIS symbol. If the attacker controls the following input by means of another variable, then he can introduce an event handler to obtain code execution.
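The consumption step can be modelled with a toy decoder. This is only a sketch of the behaviour described above, not a real Shift-JIS implementation: `toy_lenient_decode`, its `lead` and `width` parameters, and the sample page bytes are all hypothetical. It treats 0xE0 as the start of a 3-byte symbol and swallows the two bytes that follow, whatever they are.

```python
def toy_lenient_decode(data: bytes, lead: int = 0xE0, width: int = 3) -> str:
    """Toy model: every `lead` byte swallows the next `width - 1` bytes.

    The swallowed sequence is emitted as a single placeholder symbol.
    All other bytes are treated as plain ASCII.
    """
    out = []
    i = 0
    while i < len(data):
        if data[i] == lead:
            out.append("\ufffd")  # one symbol replaces the whole sequence
            i += width
        else:
            out.append(chr(data[i]))
            i += 1
    return "".join(out)


page = (
    b'<img src="http://fuzzybunnies.com/\xe0">\n'
    b"...still markup...\n"
    b'" onload="alert(1)"\n'
)

decoded = toy_lenient_decode(page)

# Find where the src attribute value now ends: at the NEXT double quote,
# because the original closing quote was swallowed together with '>'.
start = decoded.index('src="') + len('src="')
end = decoded.index('"', start)
src_value = decoded[start:end]
after_src = decoded[end + 1:]

print(repr(src_value))
print(repr(after_src))
```

In this model the text that the server believed was outside the tag ends up inside the src value, and the attacker-supplied onload lands in attribute position.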