27

Why are stored procedures and prepared statements the preferred modern methods for preventing SQL Injection over mysql_real_escape_string() function?

eckes
  • 962
  • 8
  • 19
Damien Pham
  • 287
  • 3
  • 3
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackexchange.com/rooms/107110/discussion-on-question-by-damien-pham-why-are-stored-procedures-and-prepared-sta). – Rory Alsop Apr 23 '20 at 19:17

12 Answers12

67

The problem of SQL injection is essentially the mixing of logic and data, where what should be data is treated as logic. Prepared statements and parameterized queries actually solve this issue at the source by completely separating logic from data, so injection becomes nearly impossible. The string escape function doesn't actually solve the problem, but it tries to prevent harm by escaping certain characters. With a string escape function, no matter how good it is, there's always the worry of a possible edge case that could allow for an unescaped character to make it through.

Another important point, as mentioned in @Sebastian's comment, is that it is much easier to be consistent when writing an application using prepared statements. Using prepared statements is architecturally different from the plain old way; instead of string concatenation, you build statements by binding parameters (e.g. PDOStatement::bindParam for PHP), followed by a separate execution statement (PDOStatement::execute). But with mysql_real_escape_string(), in addition to performing your query, you need to remember to call that function first. If your application has 1000 endpoints that perform SQL queries but you forget to call mysql_real_escape_string() on even one of them, or do so improperly, your entire database could be left wide open.

multithr3at3d
  • 12,355
  • 3
  • 29
  • 42
  • 13
    Indeed. Even without worrying about SQLi, prepared queries represent a proper separation of concerns. – Conor Mancone Apr 21 '20 at 09:03
  • 15
    The other point is that all it takes is forgetting to call `mysql_real_escape_string()` once, and your DB is wide open. Using sprocs and prepared statements is usually architecturally an all-or-nothing proposition - either you do it everywhere, or nowhere. – Sebastian Lenartowicz Apr 21 '20 at 13:10
  • 2
    But that is also one reason why you should use a wrapper function for your queries, rather than directly execute queries against the DB. – Kate Apr 21 '20 at 14:57
  • @SebastianLenartowicz A bad developer littering the code with `mysql_real_escape_string` will have the DB wide open if forgetting to call it once. It's absolutely possible and not even complicated to write an abstraction layer which has a similar interface to PDO, using named placeholders for values, and a single `mysql_real_escape_string` called when necessary. – GodsBoss Apr 21 '20 at 20:23
  • 4
    It should be noted that stored procedures per se are not a protection from SQLi, because: 1. while it is less natural in them, they still can build queries by string concatenation inside and then execute them and then they are open to the same kind of problems and 2. you still have to issue the sql statement to call them and that query itself can be vulnerable to injection if you don't use parameter binding. – Jan Hudec Apr 22 '20 at 07:09
  • 9
    @GodsBoss Unless what you're doing is truly trivial, trying to write your own DB abstraction layer is typically an exercise in missed edge cases and "oh, it doesn't already do this; let's hack it in". The standard libraries/third-party ORMs exist for a reason - so you don't have to write your own and worry forever whether you missed a case. – Sebastian Lenartowicz Apr 22 '20 at 12:51
  • 2
    I agree, that's why there is so many famous ORM (like Hibernate or Doctrine for isntance) in various languages : they are using parametrized queries, are often well maintened by people who know what they're doing. In most cases, using well these ORM functions covers 99.99% of the work around SQLi.Ofc, there's always a risk, that everything is not totally sanitized, to be tricked by a young soviet hacker who discovered that using specific character set can be interpreted differently and generate an input alteration (totally random example), or having an old database version which is vulnerable – Alex Apr 22 '20 at 12:59
  • I don't know about PHP but I know in Java a prepared statement can allow a repeated query to only need to be compiled once which can substantial improve runtimes - in many cases it's a micro-optimization but in others it's the difference between a smooth server and a hung server. – corsiKa Apr 22 '20 at 20:33
26

I think the main question here is why string escaping is not as good as the other methods.

The answer is that some edge cases allow injections to slip through even though they are escaped. Stack Overflow has several examples here.

halfer
  • 821
  • 1
  • 7
  • 12
15

While you can stay safe protecting against SQLi escaping user input, it's important to note that it may not always be enough. In this terrible example, quotes are never needed to perform a successful SQL Injection, despite the escape:

<?php
/*
  This is an example of bad practice due to several reasons.
  This code shall never be used in production.
*/

$post_id = mysql_real_escape_string($_GET['id']);

$qry = mysql_query("SELECT title,text FROM posts WHERE id=$post_id");

$row = mysql_fetch_array($qry);

echo '<h1>'.$row['title'].'</h1>';
echo $row['text'];

Now what would happen if one vists .php?id=-1+UNION+ALL+SELECT+1,version()? Let's see how the query goes:

SELECT title,text FROM posts WHERE id=-1 UNION ALL SELECT 1,version()

Sure there are other ways to fix it (i.e. using quotes and escaping or int-casting), but using prepared statements is a way to stay less prone to missing these things and letting the driver to care about how to include user input in the query (even because, although this example looks so easy to be fixed, there are multi-level SQL injections that consists on, basically, saving parts of SQL query to the database knowing that data from the database will be used in the future as part of another query).

Better safe than sorry.

11

Because you're doing the same amount of work for better security

A common trope mentioned against PHP is mysqli_real_escape_string() (look at how long it is! Can't they be consistent with nomenclature?), but most people don't realize that the PHP API is merely calling the MySQL API. And what is it doing?

This function creates a legal SQL string for use in an SQL statement

In other words, when you're using this function, you're asking MySQL to sanitize the value so it is "safe" to use in SQL. You're already asking the database to do work for you in this regard. The whole process looks like this

  1. Escape the untrusted data
  2. Assemble the final SQL query (i.e. put the escaped data into the SQL)
  3. Parse the query
  4. Execute the query

When you use a prepared statement, you're telling your database that it needs to do these in a different order

  1. Parse the query with placeholders
  2. Send the data to fill in the placeholders, specifying the data type as we go along
  3. Execute the query

Because we're sending the data separately, we're doing the same amount of work as escaping, but we get the benefit of not having untrusted data in our SQL. As such, the engine can never confuse provided data with SQL instructions.

There's also a hidden benefit here. What you might not realize is that once step #1 is complete, you can perform steps 2 and 3 over and over, provided they all need to perform the same query, just with different data.

$prep = $mysqli->prepare('INSERT INTO visits_log(visitor_name, visitor_id) VALUES(?, ?)');
$prep->bind_param('si', $visitor_name, $visitor_id); // this function passes by reference
foreach($_POST['visitor_list'] as $visitor_id => $visitor_name) $prep->execute();

This query benefits from only having to loop over the data and send it over and over, rather than adding the overhead of parsing every time.

Machavity
  • 3,766
  • 1
  • 14
  • 29
  • 1
    I consider this a failure of wire protocol design, not keeping code and data separate all the way down. OTOH, using bind parameters, the same client library API could have its backend silently modified to use a new and better future wire protocol that *does* keep things out-of-band rather than trying to perform escaping. – Charles Duffy Apr 21 '20 at 17:47
  • 1
    Nice answer. However, a lot of database drivers implement prepared statements by doing escaping client side. I'm not saying this is a good thing, but you can see it's common in practice if you look at the source code for various Python DBAPI connectors. – paj28 Apr 22 '20 at 06:19
  • There is also the benefit of plan caching in DBs like Oracle for statements executed with different values but same structure (beware bind peeking). – eckes Apr 22 '20 at 19:15
  • 1
    @paj28 PDO (which covers all databases) does do simulated prepare, but mysqli does not – Machavity Apr 22 '20 at 20:24
8

One good reason not to use mysql_real_escape_string: it is deprecated

mysql_real_escape_string

(PHP 4 >= 4.3.0, PHP 5)

mysql_real_escape_string — Escapes special characters in a string for use in an SQL statement

Warning

This extension was deprecated in PHP 5.5.0, and it was removed in PHP 7.0.0. Instead, the MySQLi or PDO_MySQL extension should be used. See also MySQL: choosing an API guide and related FAQ for more information. Alternatives to this function include:

mysqli_real_escape_string()
PDO::quote()

Source: mysql_real_escape_string

The doc also explains that the escaping is dependent on the character set:

The character set must be set either at the server level, or with the API function mysql_set_charset() for it to affect mysql_real_escape_string(). See the concepts section on character sets for more information.

I should also add that stored procedures are not always 'safe', for example when they contain dynamic SQL... they do not protect against poor coding practices. But indeed you should be using parameterized statements, whether you want to use stored procedures is up to you.

Kate
  • 6,967
  • 20
  • 23
  • 2
    "Because it's deprecated" is kinda just circular reasoning, as the OP already asks why it was deprecated... – user1686 Apr 22 '20 at 16:20
  • 1
    This does not answer why prepared statements are preferred. The entire PHP mysql library was deprecated in favor of mysqli or PDO, not just `mysql_real_escape_string`. mysqli and PDO have exactly equivalent functions. They're all calling [the same underlying C function](https://dev.mysql.com/doc/refman/8.0/en/mysql-real-escape-string.html). – Schwern Apr 22 '20 at 18:42
6

Some good answers already, and I going to provide a few further clarifications:

Escaping

mysql_real_escape_string can be used securely. The advice with escaping is that you use the escaping function suitable for your database. You need a slightly different function for each database. Most of the subtle security problems come from people using their own sanitiser (e.g. just ' -> '') and not realising corner cases. You do still need to use mysql_real_escape_string correctly, putting the value inside quotes, but that (the principle, not the quotes) is true for most defences, including prepared statements.

Prepared statements

Prepared statements are a very good way to stop SQL injection. However, you can still get it wrong, I have occasionally seen people dynamically build an SQL statement and construct a prepared statement from that. They were most upset when we told them this was still insecure, as we had told them to use prepared statements! Also, the code you end up with using prepared statements is a little bit unpleasant. Not majorly so, but just slightly aesthetically jarring.

Some database connectors implement prepared statements using escaping internally (I think psychopg for Python/Postgres does this). With other databases (Oracle is one) the protocol has specific support for parameters, so they are kept completely separate from the query.

Stored procedures

Stored procedures do not necessarily stop SQL injection. Some connectors let you invoke them directly like a prepared statement. But they are often called from SQL, where you still need to safely pass data into SQL. It's also possible that a stored procedure has SQL injection internally, so it's dangerous even when called with a prepared statement. I think this advice mostly stems from confusion. Security people are typically not DBAs and are vague on the difference between prepared statements, parameterized queries (another name for prepared statements) and stored procedures (something different). And this advice has persisted as there is some security benefit to stored procedures. They let you enforce business logic at the DB layer, either to allow secure direct access to the DB or as a defence in depth mechanism.

Type safe query syntax

The way I recommend to stop SQL injection is to use a type safe query mechanism. There are many of these, including most ORMs (Hibernate, SQLAlchemy, etc.) as well as some non-ORM libraries/features like LINQ or jOOQ. These protect against SQL injection, provide nice aesthetic code, and also it's harder to use them wrong and end up with SQL injection. Personally, I'm a big fan of ORMs, but not everyone is. For security, the key point is not that you use an ORM, but that you use a type safe query system.

Ok, I hope that clears things up for you. Good question.

paj28
  • 32,736
  • 8
  • 92
  • 130
  • 3
    Another way that prepared statements can go wrong is that there are cases where it is needed but not applicable. For instance, a prepared statement won't allow you to parameterize the name of a table or column. If you have to change the table/column name based on user input then prepared statements won't help you. You have to fall back on properly sanitizing user input (preferably with a whitelist of allowed values) and inserting data directly into the query. It's an easy edge-case to miss when working with prepared statements. – Conor Mancone Apr 21 '20 at 18:25
  • @ConorMancone - Good point. You may find this [SQLAlchemy issue](https://github.com/sqlalchemy/sqlalchemy/issues/4481) interesting. – paj28 Apr 21 '20 at 20:22
  • What do you mean by "the code you end up with using prepared statements is a little bit unpleasant"? – Schwern Apr 23 '20 at 23:07
  • @Schwern - Just that it doesn't look nice, especially with complex queries. Instead of introducing data where it's used, you have a ? as a placeholder and a later line that attaches the data. Less bad with named params. – paj28 Apr 24 '20 at 13:29
3

An additional, non-InfoSec argument, is that parameters potentially allow for faster queries and less memory usage.

Using parameters, the data and query are separate, which especially matters when inserting very large strings and binary fields (the entire string needs to get processed by the escape function (which extends it and might require the entire string to get copied), and then parsed again by the SQL engine, while a parameter just needs to get passed and not parsed).

Some database connectors also allow chunking with parameters, which avoids having to have the entire string in memory at once. Chunking using string concatenation is never an option, since the data is part of the SQL string and thus the SQL string can't be parsed without it.

As there's no benefit for using string concatenation with an escape function over using parameters, and there are multiple for using parameters over string concatenation, it's logical we're pushing towards parameters. This has lead to escape functions becoming deprecated, which introduces potential security issues in the future, which reinforces the need to not use them.

Erik A
  • 259
  • 1
  • 6
1

Both stored procedures and prepared statements limit the potential scope using parameters. call GetUserWithName ('jrmoreno'); offers less opportunity to mess with the query, not no opportunity. Note that parameterized stored procedures via prepared statements offer even less call GetUserWithName (@userName); than plain stored procedures.

Also, Using stored procedures open you up to a different kind of sql injection: dynamic sql inside the stored procedure.

As for why to use prepared statements, prepared statements are a binary protocol where the parameters have fixed types and sizes. When it is processed by the server it is impossible for a number to be anything but a number or for a string to extend beyond it’s boundaries and be treated as more sql commands (as long as that string isn’t used to create dynamic sql inside the sql engine).

Escaping a string just can’t give you the same level of assurance.

jmoreno
  • 496
  • 2
  • 9
  • It should be noted that `call GetUserWithName('$name');` with `$name` obtained from user input is still risky. What if `$name = "jmoreno'); drop database; --"`? – Hagen von Eitzen Apr 23 '20 at 17:55
1

I can have a policy to always use stored procedures and/or prepared statements with whatever database I am writing software for on a given day. My c# code (for example) then looks mutch the same regardless of the database my current employer chooses, hence it is easy to spot when I am not using prepared statements etc.

Prepared statements are infect not important, what is imports never using string operators to create the SQL to send to the database.

mysql_real_escape_string() is a spacial case that is only the option for one database, hence way learn something that may not be useful for the next database I have to code against?

Ian Ringrose
  • 641
  • 1
  • 4
  • 9
1

Stored procedures are superior not only because they are a working defense against some attacks, including injection. They are superior because you get more doing less, and more securely, with a de-facto "proof" of security. Whereas merely escaping strings is far from providing that "proof".

How so?

A query string, even if data is properly escaped, is a piece of text. Text is obnoxious by nature. The text gets parsed which is time consuming and can go wrong in many ways, and then the database engine does something. What does it do? You cannot ever be 100% certain.

You might forget to call the string escape function, or the string escape function might be faulty in a way that can be exploited. Unlikely, but even on a 100% correctly escaped query, it is in principle possible to do something different. Oh sure, there's still access control and such to limit the possible damages, but whatever. The fact remains that in principle, you are throwing a string at the database engine which is not really much different from sending executable code, and then you expect it to do something, which is hopefully exactly what you want.

Stored procedures, on the other hand, encapsulate exactly one specified operation, never anything different. Send what you want, the actual operation that will happen is already defined.
Stored procedures are, in principle, parsed and optimized exactly once, and (possibly, not necessarily, but at least in theory) compiled to a particularly fast representation (e.g. some form of bytecode).

So... you forgot to escape a string? Who cares. The string escape function doesn't work properly? Who cares. The stored procedure can never do anything that you didn't tell it to do, at an earlier time. The worst thing to happen is that someone trying to exploit the server manages to insert some garbage as his username or such. Who cares.

So, you get much better security, and the server needs to do less work (i.e. queries run faster), too. You win on both ends (which is a rare thing to have since usually you have a trade-off).

Plus, from a purely "aesthetic" point of view, not that it matters in practice, things are the way they should be. The database is only ever handed data and it executes some kind of program on that behalf, on its own discretion.

Damon
  • 5,001
  • 1
  • 19
  • 26
0

At a strategic level, the problem is to do with vigilance. When we write code, if we need to do something extra to ensure prevention of a security (or safety, or any other form of quality/performance) problem, then we need to be vigilant in actually doing it.

This is, I believe, the main problem with using an 'escape' function.

What compounds the problem is that, because the 'escape' function is performed outside the query construction/evaluation itself, there is no easy or efficient way of being able to audit all the queries regarding the presence or absence of escape.

To avoid this weakness one has to turn the escape/sql composition into a function -- which is essentially what a prepared statement is, or to use an abstract layer to one's sql altogether (eg. some sort of ORM).

As an SQL monkey, neither options are wonderful. Especially when dealing with complex SQL structures (eg 3-field primary key tables with several left joins, including using the same table twice, aliased: not unusual for tree-based structures).

I'm not talking theoretically: this is practical experience. We started off using escape mechanisms, and then got caught out - fortunately by a pen. test agency - who found a missing escape (the case in hand was the value of a session cookie which hadn't been escaped). They demonstrated it by injecting "select sleep(10);" - a brilliant way of testing for/demonstrating injections.

(Going slightly off-topic it is nigh on impossible for the dev. team to write injection tests for their own code. We tend to see things only from one position. This is why the company always recommends third party, fully independent pen. test agencies.)

So, yes. escape() as a standalone is a very weak way of ensuring injection protection. You always want to build in the protection as a part of the query construction, because it is important that you do not depend upon vigilance - but rather are forced -by default- to ensure that sql injection attacks are accounted for.

Konchog
  • 605
  • 1
  • 5
  • 9
0

You need to understand that mysql_real_escape_string() is a not a meant for protection against SQL injection. It is merely a hack that is meant to reduce the harm and limit the potential attack vectors. Nothing in the name or in the official documentation for mysql_real_escape_string() says this function is to be used to prevent SQL injection.

SQL injection happens when you allow variable data to be part of your SQL string. When building SQL with dynamic data it is difficult to ensure the SQL will not be broken. A rogue input could break the SQL and cause harm to your data, or leak the information.

SQL should be constant, just like the rest of your code. Allowing variable input in SQL is vulnerable to similar attacks as eval() with variable input. You can never be sure what is being executed.

To prevent this you must ensure the SQL is constant and doesn't change with the input. The solution to this are placeholders. You parameterized the data and use placeholders in places where the literals will go. No escaping is needed. Then you prepare such statement and pass the data separately during execution. The SQL is always the same and the data has no way of affecting the SQL and changing its behaviour.

Stored procedures are not a safe way to prevent SQL injections. Similar to escaping it only limits the attack vectors.

Dharman
  • 101
  • 3