80

Raw SQL

When you're writing SQL -- for anything that takes human input really, a lot of things have been done to avoid the injection.

Everyone that's heard of SQL injection knows that (I'm going to use PHP as a sample) doing something like this isn't safe:

$sql = "SELECT * FROM `users` 
.  WHERE `userName` = "{$_POST["username"]}" 
.  AND `pass` = "{$_POST["pass"]}";";

Magic

Then of course, someone came out with the idea of using "magic escape quotes" to deal with program input that wasn't sanitized correctly and directly put into sql as a result of bad practices. This didn't really solve the issue with SQL injection, but it did mean all user input got mangled up.

Adding slashes

So, some people turned off magic quotes. Then, they parsed user input before the point of SQL through addslashes() which in theory escapes all the quotes and your hacker can't do ' OR 1=1, but even the documentation for addslashes it's self say that you shouldn't use addslashes, it says use the database-specific function such as mysql_real_escape_string(), but this is still said to not be enough by some.

Adding slashes specific to Database

So, we can't use DBMS specific *_real_escape_string, we can't use add slashes, the "magic quotes" thing caused lots of issues, and the web is full of short worded quotes such as:

"A dedicated hacker will find a way to jump through your quote-escaping loops, just use the DBAL prepared statements" - John Q any programmer

Okay, so that scared me enough to use prepare statements and a DBAL. It didn't really explain anything, but it sounds good because I've heard it a lot.

Prepared statements

So now we're using PDO, or a DBAL from a framework, or something else that wraps all our sql and makes sure someone can't run an sql injection.

My question is basically a "why not?", not a "what should I use?". The web's full of people telling you to use this or use that or whatever, but no explanations of why these things had to happen.

Direct questions

Pointed questions (reminder, I'm asking about SQL, PHP was an example language because of it's bad rep around SQL, concepts are universal):

  1. Why can't we escape all user input using "magic" ?
  2. Why wasn't addslashes "good enough"?
  3. Whats wrong with using DB-specific escape functions, and why was it better than addslashes?
  4. Why are prepared statements with frameworks and PDO being hailed as the gold standard of SQL? Why are they better? Why can't I do an SQL injection with these, where as I COULD have with the previously mentioned means? Can a programmer not somehow manage to still screw this up? What should they look out for?
  5. Any other concerns I haven't brought up?
Simon
  • 3,182
  • 4
  • 26
  • 38
Incognito
  • 5,204
  • 5
  • 27
  • 31

7 Answers7

54

Why can't we escape all user input using "magic"?

At the time the magic is applied, it is unknown where the data will end up. So magic quotes are destroying data that, unless it is written unescaped to a database.

It may just be used in the HTML response sent back to the client. Think of a form that has not beem filled in completely and is therefore shown again to the user. With magic quotes, the data entered on the first attempt will now be SQL escaped, which is meaningless on an HTML page. Even worse: on the second submission the data is SQL escaped again.

Why wasn't addslashes "good enough"?

It has issues with multibyte characters:

' 27
\ 5c
뼧 bf 5c

Those are two bytes, but only one Unicode character.

Since addslashes does not know anything about Unicode, it converts the input bf 27 to bf 5c 27. If this is read by a Unicode-aware program, it is seen as 뼧'. Boom.

There is a good explanation of this issue at http://shiflett.org/blog/2006/jan/addslashes-versus-mysql-real-escape-string

Whats wrong with using DB-specific escape functions, and why was it better than addslashes?

They are better because they ensure that the escaping interprets the data in the same way the database does (see the last question).

From a security point of view, they are okay if you use them for every single database input. But they have the risk that you may forget either of them somewhere.

Edit: As getahobby added: Or that you use xxx_escape_strings for numbers without adding quotation marks around them in the SQL statement and without ensuring that they are actual numbers by casting or converting the input to the appropriate data type /Edit

From a software development perspective they are bad because they make it a lot harder to add support for other SQL database server software.

Why are prepared statements with frameworks and PDO being hailed as the gold standard of SQL? Why are they better?

PDO is mostly a good thing for software design reasons. It makes it a lot easier to support other database server software. It has an object orientated interface which abstracts many of the little database specific incompatibilities.

Why can't I do an SQL injection with these [prepared statements], where as I COULD have with the previously mentioned means?

The "constant query with variable parameters" part of prepared statements is what is important here. The database driver will escape all the parameters automatically without the developer having to think about it.

Parametrized queries are often easier to read that normal queries with escaped parameters for syntactic reasons. Depending on the environment, they may be a little faster.

Always using prepared statements with parameters is something that can be validated by static code analysis tools. A missing call to xxx_escape_string is not spotted that easily and reliably.

Can a programmer not somehow manage to still screw this up? What should they look out for?

"Prepared statements" imply that the are constant. Dynamically generating prepared statements - especially with user input - still have all the injection issues.

Nathan Osman
  • 282
  • 1
  • 10
Hendrik Brummermann
  • 27,118
  • 6
  • 79
  • 121
  • This in very in depth and not realistic. 99.9% of web applications are not affected by language encoding problems. ~.1% of sql injection is found with static code analysis. There are far better techniques employed by professionals. From experience i can tell you that the issues I covered are **far more common**. – rook May 07 '11 at 20:53
  • 7
    I disagree @Rook, Hendrik's answer is well realistic. The PDO mantra is mostly unnecessary since many web applications are built to use a specific DB until their End Of Life. Hence the developer might as well have used the relevant functions instead of over-complicating an application to use 2% of the features of an extension. – Christian May 08 '11 at 07:55
  • There are only 4 language sets that have problems with `addslashes()`, gbk being one of them. – rook May 08 '11 at 20:32
  • 4
    @Rook - once again, you are resorting to insults rather than answering questions. This is not acceptable. You need to either sort out your communication style and behaviour, or we will have to resort to suspension. Please take this as a polite warning. – Rory Alsop Sep 29 '11 at 20:44
  • Regarding the latest question "Can a programmer not somehow manage to still screw this up?", I just wanted to link to a [recent SQL injection issue](http://drupal.stackexchange.com/questions/133795/what-kind-of-attacks-does-the-patch-for-sa-core-2014-005-drupal-7-32-prevent) that affected the Drupal CMS content, bypassing PDO protection. – WhiteWinterWolf Dec 18 '14 at 16:05
13

[1.] Why can't we escape all user input using "magic" ?

Because magic quotes are broken on two counts:

  1. Not all $_(REQUEST/GET/POST) data goes into the DB. And for you to make any sense of it, you have to de-escape the input.
  2. They're the addslashes equivalent (see my note on [2.]), hence they're not actually secure.

[2.] Why wasn't addslashes "good enough"?

There are many ways to get out of addslashes restriction, therefore, it's fundamanetally flawed, however you try to use it.

[3.] Whats wrong with using DB-specific escape functions, and why was it better than addslashes?

Nothing really. That is simply the mantra on why PDO/prepared are "better". They're way better than addslashes since they take care of many factors, including encoding. Here's a little secret; PDO uses them internally, but not addslashes...

[4.] Why are prepared statements with frameworks and PDO being hailed as the gold standard of SQL? Why are they better? Why can't I do an SQL injection with these, where as I COULD have with the previously mentioned means?

They're not the gold standard, they're not "more" secure than DB-specific functions and you can do SQL injection with these as well. And no, you can't do SQL injection with properly-sanitized input.

As with all technologies out there, developers tend to develop religious(fanatical?) tendencies for anything "new". That is why you get seasoned Zend Certified Engineers(TM) advising -no- forcing you to switch to prepared statements.

The reality, though, is that it all depends on knowing what you're doing. If you're loading an article by its numeric ID, you don't need any kind of escaping, even less PDO. You simply need to ensure that ID is indeed an integer.

From a more realistic perspective, the general rule is not to use PDO/prepared statements, but to use the right functions, which is to say, you must use mysql_real_escape_string() instead of addslashes() or mysql_escape_string().

[5.] Can a programmer not somehow manage to still screw this up? What should they look out for?

Of course! But it isn't "what to look out for" but rather "know what you're doing".

You see, eval($_GET['code'])* is entirely legit when used in the right place (such as a local PHP Test Runner).

If you use an ORM to run a conditional query, you're entering code directly, hence there may be a chance of getting SQL injection if you don't do it right (depends on the ORM in question). (Hypothetical) Example:

// update() sets the value of a column given a condition
//
//                         .--escaping is automatic here
//                         |                           .--hypothetical SQL injection
//                         v                           v
Db()->update('name',$_GET['newname'])->where(' id = '.$_GET['id']);

(* I can only imagine the countless professionals banging their head over this)

Christian
  • 343
  • 1
  • 9
  • 1
    +1. especially for pointing out the pitfall of APIs mixing parametrized queries with dynamic parts. – Hendrik Brummermann May 08 '11 at 13:19
  • A useful technique to store web requests or encoded data is to encode it to base64 and store it as a string. It doesn't prevent SQL injection or bad data, but it at least makes sure nothing that was input will be used as is. – Drunken Code Monkey Jul 05 '16 at 22:53
9

At the end of the day escaping of input must be done in order to protect your queries form SQL Injection. Parametrized query libraries like PDO and ADODB use escaping, all they do is enforce this practice in a way that is implicitly secure. By constrast magic_quotes_gpc does NOT, this is fundamentally flawed approach to the problem.

1) Escaping can be problematic if you don't understand what your doing. This type of issue is still exploitable if magic_quotes_gpc is enabled.

mysql_query("select * from users where id=".addslashes($_GET[id]));

PoC Exploit: http://localhost/vuln.php?id=sleep(30)

patch:

mysql_query("select * from users where id='".addslashes($_GET[id])."'");

Lesson: You don't need quote marks to exploit sql injection.

2) Lets say you only escape quote marks:

mysql_query("select * from users where name='".htmlspecialchars($_GET[name],ENT_QUOTES)."' and id='".htmlspecialchars($_GET[id],ENT_QUOTES)."'");

PoC Exploit: http://localhost/vuln.php?name=\&id=+or+sleep(30)/*

Patch:

mysql_query("select * from users where name='".addslashes($_GET[name])."' and id='".addslashes($_GET[id])."'");

Lesson: Backslashes are equally as dangerous as quote marks.

3) What about language encoding?

mysql_query("SET CHARACTER SET 'gbk'");
mysql_query("select * from user where name='".addslashes($_GET[name])."'");

PoC Exploit:http://localhost/vuln.php?name=%bf%27=sleep(30)/*

Patch:

mysql_query("SET CHARACTER SET 'gbk'");
mysql_query("select * from user where name='".mysql_real_escape_string($_GET[name])."'");

Conclusion:

Escaping is absolutely required in order to prevent sql injection in an application. In PHP PDO and ADOB are great parametrized query libraries that enforce the escaping of user input. The problem is that many programmers don't understand what escaping is. Having a library to enforce this secuirty policy is the best method of defense, because you don't need to understand escaping rules.

rook
  • 46,916
  • 10
  • 92
  • 181
  • 3
    "Escaping is absolutely required in order to prevent sql injection in an application", this is simply wrong. A far less error prone approach than using escaping is to use parametrized queries with a constant query part and all the untrusted data in parameters. Since the data is never part of the SQL query string, no escaping is needed and the database just does the right thing. /// Please do not suggest to use addslashes because a PHP application can be vulnerable to the charset attack even without explicitly setting the charset because of MySQL default configurations for example. – Hendrik Brummermann May 07 '11 at 20:41
  • To take the mysqli driver as example: The function [mysqlnd_stmt_execute_store_params](http://codesearch.google.com/codesearch/p?hl=de#ceArhxvuL0U/Oem/php/ext/mysqlnd/mysqlnd_ps_codec.c&q=mysqlnd_stmt_execute_store_params&sa=N&cd=1&ct=rc) is sending the parameters to the server as binary data outside the query string. – Hendrik Brummermann May 07 '11 at 21:43
  • @Hendrik Brummermann What top level parametrized query function (http://php.net/manual/en/book.mysql.php) is being supported by this function? I think you are mistaken about the functionality of this code. I know its a misnomer to say "escaping is bad and that you should use PDO instead". – rook May 07 '11 at 22:19
  • 2
    The associated set of high level query functions that end up calling that internal driver function are [prepare](http://www.php.net/manual/en/mysqli.prepare.php), [bind_param](http://www.php.net/manual/en/mysqli-stmt.bind-param.php) and [execute](http://www.php.net/manual/en/mysqli-stmt.execute.php). I use the escaping way myself in most cases, but using parametrized queries is a valid alternative. And while there are some frameworks that substitute the parameters into the SQL statement (with proper escaping), the database functions transmit them along side the query. – Hendrik Brummermann May 07 '11 at 22:31
  • @Hendrik Brummermann I am very skeptical of this. Mainly becuase the [mysqli code](http://codesearch.google.com/codesearch/p?hl=de#ceArhxvuL0U/Oem/php/ext/mysqli/mysqli.c&q=mysqlnd_stmt_execute_store_params&d=3) doesn't even [call that function](http://codesearch.google.com/codesearch?q=mysqlnd_stmt_execute_store_params&exact_package=http%3A%2F%2Fsvn.osgeo.org%2Fmapguide%2Ftrunk%2FMgDev&hl=de) – rook May 08 '11 at 00:33
  • 1
    "Having a library to enforce this secuirty policy is the best method of defense, because you don't need to understand escaping rules" - And that, my friends, is why we fail in security. The truth is the programmer needs to know what s/he's doing. **Not knowing is a security issue in itself.** Using PDO etc is like using OSX, you know you're secure until there's enough of you to make hackers rethink their strategy - and when that virus goes out, it's Windows 98 all over again. – Christian May 08 '11 at 07:52
  • @Christian Sciberras I agree. Unfortunately the people that really understand these issues are very uncommon and are there for very expensive. – rook May 08 '11 at 20:33
  • @Rock, your google search within that file has no hit because the call is nested across different files: mysqlnd_stmt_execute_store_params in mysql_ps_codec.c is called by [mysqlnd_stmt_execute_generate_request](http://goo.gl/QlQHk) in mysql_ps.c which is called by the exported method [mysqlnd_stmt_execute](http://goo.gl/c9GY4). [mysqli_mysqlnd.h](http://goo.gl/9e42S) and [mysqlnd_libmysql_compat.h](http://goo.gl/OnUa7) make the mapping from the mysql functions. – Hendrik Brummermann May 08 '11 at 22:21
  • @Hendrik Brummermann And the mysqli code doesn't use any of that noise. If you look at the mysqli code its building its own MY_STMT struct where by the individual parameters *can be* transmitted separately from the raw query. So you are partially correct, there is another option than escaping for mysql, but most parametrized query libraries don't use it. Further more there is no difference in safety between these two approaches, however the MY_STMT struct will use a little less bandwidth/cpu time. – rook May 08 '11 at 22:37
5

About the DB-specific escape functions. There is many scenarios under which SQL injection may occur, even though mysql_real_escape_string() has been used. LIMIT and ORDER BY are really tricky!

Those examples are vulnerable to SQL injection :

$offset = isset($_GET['o']) ? $_GET['o'] : 0;
$offset = mysql_real_escape_string($offset);
$sql = "SELECT userid, username FROM sql_injection_test LIMIT $offset, 10";

or

$order = isset($_GET['o']) ? $_GET['o'] : 'userid';
$order = mysql_real_escape_string($order);
$sql = "SELECT userid, username FROM sql_injection_test ORDER BY `$order`";

In both case you can't use ' to protect the encapsulation.

source : The Unexpected SQL Injection (When Escaping Is Not Enough) <-- Read this

Cut Copy
  • 109
  • 1
  • 5
4

Filters can be bypassed and data that goes into SQL statements can come from more places than just user input. The entry points into a SQL influenced source/sink can be stored (higher-order), as one example, but clearly HTTP GET parameters and POST parameters from HTML forms are not the only ways of user input.

Parameterized queries do not necessarily make your SQL statements free from injections. They also require variable binding, removal of string operations/concatenation, and the avoidance of certain security issues with a large range of SQL clauses.

Most developers also forget to check their third-party components and other dependencies for SQL injection issues, and sometimes don't realize that these inclusions could make their own code vulnerable.

The best thing to do is to hire an application security consulting company to prepare your application(s) for production readiness, and to train your appdev team(s) on building application security controls into their process and development technology.

atdre
  • 18,885
  • 6
  • 58
  • 107
3

To expand on Hendrick's answer. It is easy to think of real_escape_string as a magic bullet when in reality it is not. I have seen people use that function when they want to use intval or cast the input to an integer instead. You can still get injected if you use the escape string in the wrong context.

getahobby
  • 175
  • 3
-3

Since the point is to help you to learn how to write decent code and not just copy a recipe I will leave some hints.

  • If we leave encoding stuff, there are many ways to inject sql and not all of them require you to add " or ' or ‘, therefore Addslashes is basically a lame solution used by lame and unskilled developers.... (The ones you should not be reading code from).

  • This one doesn't happen a lot but, there is actually a vector where the attacker can exploit a prepared statement, in some db engines. It's not common but you can remember that prepared statement uses a statement but with table metatdata column type,size....

  • Other usual lame error is handling searches in queries with likes..... If you find out how it works you can do some intersting things....

Do your research, never trust an input, never use anything other than a prepared statement, ignore anyone that tells you to use escape fu...

And you should be safe, remember there is no point in getting anything other than what you are expecting, so strong typed api is your friend ;).

Rory Alsop
  • 61,367
  • 12
  • 115
  • 320
xlinex
  • 1
  • 3
    you need to clean this answer - spelling, grammar, odd formatting - please put a little more time into your answers – schroeder Oct 08 '14 at 23:09