I'm no techie and would like your expertise in understanding this. I recently read a detailed article on SQLi for a research paper.
It strikes me as odd. Why do so many data breaches still happen through SQL injection? Is there no fix?
There is no general fix for SQLi because there is no fix for human stupidity. There are established techniques which are easy to use and which fix the problem (especially parameter binding), but one still has to use them. And many developers are simply not aware of security problems. Most care that the application works at all and don't care much about security, especially if it makes things (even slightly) more complex and comes with additional costs like testing.
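For the non-techies following along, parameter binding looks something like this in PHP (the language most of this thread's examples use) with PDO. This is a minimal sketch; the table, column, and connection details are invented for illustration:

```php
<?php
// Connect via PDO (DSN and credentials are placeholders).
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');

// UNSAFE: user input concatenated straight into the SQL string.
// $pdo->query("SELECT * FROM users WHERE name = '" . $_GET['name'] . "'");

// SAFE: the query structure is fixed, and the value is bound separately,
// so the database never interprets user input as SQL.
$stmt = $pdo->prepare('SELECT * FROM users WHERE name = :name');
$stmt->execute([':name' => $_GET['name']]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```

The point is that the bound value can contain quotes, semicolons, or anything else without changing the structure of the query.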
This kind of problem is not restricted to SQLi; you'll find it with buffer overflows, certificate checking, XSS, CSRF, and so on. It is more expensive to do secure programming because of the additional testing and the additional expertise needed by the (thus more expensive) developer. And as long as the market prefers it cheap and does not care much about security, you get the cheap and insecure solutions. And while security by design helps a lot to make things better, developers often work around that design because they don't understand it and it just gets in their way.
Because it's not a problem.
When was the last time a company with a SQL injection vulnerability got hauled up in court and slapped with a big fine for being reckless with user data, and the directors warned, fined, or locked up for negligence?
When was the last time a company lost a big contract because their company website login page didn't validate passwords properly?
When was the last time a qualified regulator/auditor from a professional organisation had to approve and sign off a public facing computer system before it could be put into use?
You would think that "people will die" would be a good enough reason to make buildings with fireproof materials, alarms, and good escape routes. It wasn't. We had to introduce regulation to force non-flammable building materials, fire-safe designs with fire breaks, and fire alarms.
You might think "people will die" would be a good enough reason to make everyone care about building structural design. It isn't. It just isn't. We have to have regulation to have qualified engineers sign off on building designs, that they be designed and built for specific uses, and when things fail, society takes the engineers to court.
You would think that "people will die" would be a good enough reason to make food processing clean and safe, but it wasn't.
SQL injection is less obvious, less publicly visible, has a less severe impact, and sits in a completely unregulated industry.
Even to companies which do care about it, they can't usefully advertise "No known SQL injection vulnerabilities in our code" as a marketing bullet point anyone cares about. It's not the sort of question customers ask salespeople. It's not a competitive advantage for them, it's a cost, an overhead. Protecting against it makes them less competitive, slower moving, doing more work for the same functionality.
The incentives are all aligned for it to keep existing. So it keeps existing.
Make SQL injection a problem for companies, and they will make it go away.
[Edit: But there's an EU regulation that websites have to warn you if they use cookies. And they do. So regulating public facing computer systems to make them more annoying can come into effect - even if the current regulation is pretty useless.]
SQL injection is still around because the software world still doesn't understand that programmatic generation of tree-structured values (like queries or markup) should be done by constructing syntax trees as first-class objects, not by concatenating strings that represent fragments of a language.
There has been a bit of progress in recent years with the increasing availability of query builder tools like LINQ to SQL or SQLAlchemy, but that's on the programming language side. Relational databases still don't offer a standard, compelling alternative interface that's not fundamentally based on sending queries as strings.
Prepared statements with query parameters are barely an improvement, because they're only easy to use if the structure of your queries—which tables are joined, what filtering conditions, what columns to project—is fixed. When you have an application that needs to construct query text at runtime, prepared query parameters are a big pain to use.
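To illustrate both the pain point and the usual workaround: when the filters vary at runtime, you can still keep every value bound and restrict identifiers to a whitelist. A rough PHP sketch, assuming the $pdo handle from the earlier example and an invented schema:

```php
<?php
// Hypothetical dynamic filtering: column names come from a whitelist,
// values are always bound. Only the query *structure* varies.
$allowed = ['name', 'email', 'city'];           // identifier whitelist
$filters = ['city' => 'Oslo', 'name' => 'Doe']; // e.g. derived from user input

$conditions = [];
$params = [];
foreach ($filters as $column => $value) {
    if (!in_array($column, $allowed, true)) {
        continue; // never splice unvalidated identifiers into SQL
    }
    $conditions[] = "$column = ?";
    $params[] = $value;
}

$sql = 'SELECT * FROM users';
if ($conditions) {
    $sql .= ' WHERE ' . implode(' AND ', $conditions);
}
$stmt = $pdo->prepare($sql);
$stmt->execute($params);
```

It works, but it's exactly the kind of boilerplate that tempts developers back into plain string concatenation.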
So if a standardized, non-textual, tree-structured protocol could be constructed for describing and communicating queries to the database, and it was designed to be easier to use than textual queries, then that would solve the problem in the long term. But the problem won't go away until the industry adopts something where the path of least resistance is safe. As long as we insist on unsafe-by-default systems where writing safe code takes unnecessary effort, problems will be with us. (Think of all the buffer overflows that don't exist at all in memory-managed languages!)
Note that the same fundamental problem as SQL injection plagues the Web under the name of cross-site scripting, which is really just JavaScript injection into dynamic HTML pages. A very common pattern is JavaScript programmers who, instead of working with the DOM by treating it as a tree of objects, resort to the `innerHTML` property, setting it to HTML text built by naïve string concatenation. A lot of XSS vulnerabilities would never have existed if the `innerHTML` property had never been put into browsers' DOM implementations.
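Although that example is client-side JavaScript, the same escape-on-output principle applies server-side, where the rest of this thread's examples live. A minimal PHP sketch (the field name is invented):

```php
<?php
$comment = $_POST['comment'] ?? '';

// UNSAFE: if $comment contains <script>...</script>, it runs in the
// victim's browser.
// echo "<div class=\"comment\">$comment</div>";

// SAFE: escape on output so the browser treats the input as text,
// not markup.
echo '<div class="comment">'
   . htmlspecialchars($comment, ENT_QUOTES, 'UTF-8')
   . '</div>';
```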
Also, for folks who haven't seen Tony Hoare's talk on null pointers ("Null References: The Billion Dollar Mistake"), it's orthogonal (null pointers, not SQL injection) yet incredibly relevant.
When testing, it is very easy to test for what you expect to happen. For example, when filling in a "name" field in a database you will probably choose something you are familiar with, like "John Doe". This works, and your application seems to work fine.
Then, one day, someone names their child `Robert'); DROP TABLE Students; --` (little Bobby Tables).
Of course, you don't test your app for names like that, so the security hole that such a name exposes slips through all your testing.
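To spell out what goes wrong: with naive string building, the quote in the name closes the SQL string literal and the rest of the name becomes SQL. A sketch of the expansion, with the table name taken from the comic:

```php
<?php
$name = "Robert'); DROP TABLE Students; --";

// Naive concatenation:
$sql = "INSERT INTO Students (name) VALUES ('" . $name . "');";

// The database now receives:
//   INSERT INTO Students (name) VALUES ('Robert'); DROP TABLE Students; --');
// Whether the DROP actually executes depends on the driver (many refuse
// multiple statements per call), but the input has clearly escaped the
// string literal, which is the vulnerability.
```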
There is an interesting comment here: The Unfalsifiability of Security Claims
It's easy to prove a security hole exists (you just test for a name like the above). It's also easy to prove that a particular hole has been fixed. It's hard to prove no other security holes exist.
Steffen makes good points in his answer, but I'd like to add to it. The why, I think, can be broken into the following topics: developer education, developer churn, delivery pressure, the proliferation of languages, and management priorities. So let's break those down.
There's a lot of emphasis on user education these days. Teach users how to maintain strong passwords. Teach users how to identify phishing. Teach users how to... you get the idea. Some enterprises do have training programs for developers as well; a lot probably do, but I can only speak to my professional experience, and I haven't worked at a lot of enterprises ;). Those training programs can be incomplete or lack the depth of knowledge needed. That's not to disparage the hard work that goes into building them, but to say that, just like in a school environment, different people learn differently. And unless you have a continuing education program for developers, it's hard to communicate "use parameterized queries, and here's how to do it in PHP, Java, Python, Ruby, Scala, NodeJS, ...". It's hard work developing, delivering, and maintaining developer programs that effectively reach the audience.
Above, one of the things I alluded to was reaching the audience effectively across different learning types. One of the reasons for that is that a lot of enterprises have a high churn rate for developers, because the developers are contractors who get shifted from project to project at different companies. And companies are not all at the same security maturity: one company may not have a security program at all, while another may have an excellent one, and the developer is suddenly bombarded with new information that will be required of them for all of six months before they move to another company. It's sad, but it happens.
Project delivery on schedule, or even ahead of schedule. The quickest path to completing the project usually, sadly, isn't completing the project with security controls. It's getting it done in the most broken way that still works. We know that it'll cause more work, more time, and more money later when it comes time to maintain the project and fix problems, but management just wants the project out.
Another item I touched on is developing security training programs for a myriad of programming languages. A lot of enterprises don't have one or two set languages, so developers like to (or are encouraged to) try out the new hotness, including new languages and frameworks. This means security programs must continually evolve.
And here we are at management. In seemingly every public breach, there were controls that could have been implemented, that aren't that hard, but that were missed. The push to deliver products first and worry about security second keeps coming back to bite product companies, despite lesson after lesson after lesson. Management must push from the top to take the time to build in security at the beginning. They must understand that otherwise more work, more time, and more money will be spent fixing problems, maintaining the product, and paying fines. But today's cost-benefit analyses point to product delivery as the priority, not the fines or maintenance work. Those equations must change, and that comes down, in part, to education (wooo, full circle) at the MBA level. Business managers must be taught that to be successful in a landscape of ever-increasing breaches, security must be front and center.
The why, despite SQLi being nearly 20 years old, comes down to several reasons. As security practitioners, we can only do so much to educate and raise awareness of what happens when security is not considered an integral part of the SDLC.
I agree with a lot of the answers, but one very important point isn't made: code doesn't magically fix itself, and there is a lot of code out there which is 17 years old. I have seen many companies write clean and safe new code whilst the application could still be attacked in some of its older sections. And worst of all: fixing old code is expensive, because it requires developers to delve into code that was written in a different era with different coding styles and different technologies. Sometimes fixing old code to not cause SQL injections requires one to recreate entire libraries that were bought back in the day (this is something I had to do a couple of years ago).
Not to say that all new code is free of SQL injections, but I personally haven't seen any professionally written new code in the past 4 or 5 years which contained them. (The only exception being where developers have to do a quick and dirty fix in old code and use the same style/technology in which the rest of the code is written.)
I believe it's because many developers learn just enough to get the job done, for some value of "done". They learn how to build SQL code, often from outdated online tutorials, and then when the code "works" to the extent that they can say "I can put stuff in the database, and I can generate the page of results", then they're satisfied.
Consider the proverbial guy who props a ladder against the wall over a stairwell. Why's he doing that? Why doesn't he have proper scaffolding? Because he's getting the job done. Put the ladder up against the wall over the stairs, and it works just fine. Until it doesn't.
Same thing with `INSERT INTO users VALUES($_POST['user'])`. It works just fine. Until it doesn't.
The other thing is that they're not aware of the dangers. With the guy on the unstable ladder, we understand gravity and falling. With building SQL statements from unvalidated outside data, they don't know what can be done.
I spoke to a web developer user group last month, and of the 15 devs in the audience, two had heard of SQL injection.
I think the main reason is that developer training doesn't start with best practices; it starts with language understanding. Thus, new programmers, believing they have been given the tools to create something, proceed to write queries the way they've been taught. The next and most dangerous step is to let someone develop anything without review, giving them continued opportunity to make more poor choices without knowing anything is wrong, and to form further habits that ignore industry-wide accepted best practices. So, to sum it up: poorly trained programmers operating in an environment that does not value anything but the end product.
It has nothing to do with intelligence or "human stupidity". There is a systematic approach that has been well defined over the years and it is negligent for anyone who produces software to ignore that process in the name of faster or cheaper implementation. Perhaps some day the legal ramifications of this behavior will enable us to have more controls in place like the medical or construction industries where failure to comply with these rules and accepted practices will result in a loss of license or other penalty.
Why have SQL injection vulnerabilities not gone extinct yet? Metaphorically speaking, for the same reason that car crashes are still around: from the very first car in 1895 to the most innovative, modern self-driving cars of today, a Tesla Model S (on autopilot) or a Google self-driving car still crashes from time to time.
The cars are created (and controlled) by humans, humans make mistakes.
Websites and (web) applications are built by human programmers. They tend to make mistakes in the security design, or to break something that was secure with a "quick-dirty-fix" that actually introduces a new vulnerability, for example because the time/budget for developing the fix was limited, or because the developer had a terrible hangover when he wrote it.
Is it always caused by developers? Essentially yes, but not always by the first-line developer. For example, a local supermarket asked a web development company to create a website for them. The developers rent some shared hosting space from a hosting company to host the site on and they install WordPress and some useful plugins.
Now the developers at the web development company don't necessarily have to make a mistake themselves for the site to end up with a SQL injection vulnerability. What could go wrong here? A few examples: the hosting company may fail to patch its shared servers, the installed WordPress version may ship with a vulnerability, or one of the plugins may contain an injection flaw.
Now the question that is raised: who is responsible? The supermarket, the web development company, the hosting company, the WordPress community, the WordPress plugin developers, or the attacker who misused the vulnerability, rhetorically speaking? This isn't a statement; it's exaggerated, and just the sort of questions that are likely to be asked in case something goes wrong.
Often the above discussion (questions about responsibility, albeit slightly exaggerated) is also a risk factor, since some developers tend to have a "that's not my code" attitude. You can imagine how complicated that makes the situation sometimes.
Firstly, no one writes security requirements properly; they say something like "The product shall be secure", which is in no way testable.
Secondly, professional developers are not stupid, and to say so is rather disingenuous. They are all likely to have university degrees and to have been solving problems we haven't even begun to look at... The problem is that they have never been taught what it is to develop software securely. This starts at school, then university, and then whatever job they take, where any training is "on-the-job" because software firms are too scared to train developers in case they leave.
Developers are also under increasing pressure to do more work in less time, they are busy fixing one issue and moving on to the next, there is little time to reflect as the next problem comes along.
Developers are not incentivised to test beyond what they are developing, if they find an issue, they are likely to be the developer to fix it. The developer mantra here is "Do not test what you are not prepared to fix"
Thirdly, testers are also not trained to find security vulnerabilities, for much the same reasons as software developers. In fact, a great deal of testing (in my opinion) simply repeats the testing the development team has already done.
Fourthly, time to market is a huge factor: if you are out there first, you are making money. Developing securely is seen as having a big impact on the speed of development. I mean really, who has time for a threat model! ;)
Finally, it's not just SQL injection: buffer overflows have been known about since the 1960s, and you can still stumble over them with alarming regularity.
Yes, anthropologically, humans are stupid.
Yes, politically, the incentive structure does not sufficiently penalize vulnerable applications.
Yes, the process is flawed: code is written in a hurry; bad/old code is not always thrown away.
And, yes, technically, keeping data and code separate takes more effort than mixing them, which is the default.
But there's a more positive view (let's ignore the 99% of SQLi vulnerabilities that the answers above explain). SQLi still exists on extremely well-designed and carefully developed websites because we are awesome. Hackers rule. You only need to look at the hundreds of attack vectors and thousands of SQLi payloads that have been developed over the last seventeen years to regain some faith in the human race. Every year brings new techniques presented at DEFCON, BlackHat, RSA, and IEEE S&P. The bug bounty programs of Facebook, Google, and the like have all had to pay out at least once for a critical SQLi.
Partly, it's because of the complexity and number of layers in our systems, each mutating data in newer and more interesting ways. We increasingly need more done, faster, using fewer resources. And as long as we cannot feasibly test all paths into the system, no one is going to certify a solution to injection problems.
Because such security issues are not covered during most 3-year degree programs and equivalent studies, and many developers followed such a track (including myself). Given how wide the field is, 3 years is not even enough to cover the core study program, so things like security get dropped.
It is unfortunate, but since some of the new developers won't ever try to learn new things by themselves, those people will always write SQLi-prone code until a more educated colleague points the issue out (or until an actual SQLi happens).
During my studies (many years ago), our teachers always told us to use PreparedStatements when creating manual SQL queries, because it is "best practice", but they did not say why. This is better than nothing, but still pretty sad. I'm not sure those teachers even knew themselves.
We learned how to display stuff on a JSP, but not what cross-site scripting is.
I am "lucky" to be a passionate dev with some time in my hands, so I learned all these things by myself a long time ago, but I'm sure that many developers just do their 8-hours-a-day (for legitimate reasons by the way), and as long as nobody shows them what is wrong, it won't change.
If you use prepared statements correctly, SQL injection is not possible.
"If the original statement template is not derived from external input, SQL injection cannot occur."
https://en.m.wikipedia.org/wiki/Prepared_statement
Unfortunately people usually don't use prepared statements correctly, if at all.
SQL injection would be a thing of the past if they did.
And yes, PHP/MySQL have had a prepared statement implementation for a very long time, over 10 years if memory serves...
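For the PHP/MySQL case, correct usage looks roughly like this with mysqli (connection details are placeholders, and get_result() assumes the mysqlnd driver):

```php
<?php
$db = new mysqli('localhost', 'user', 'secret', 'app');

// Correct use: a placeholder in the template, the value bound afterwards.
$stmt = $db->prepare('SELECT id, name FROM users WHERE email = ?');
$stmt->bind_param('s', $_POST['email']); // 's' marks a string parameter
$stmt->execute();
$result = $stmt->get_result();
while ($row = $result->fetch_assoc()) {
    // ... use $row ...
}
```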
The other answers have pointed to almost all the reasons. But there is something else, which I think is the most dangerous security concern of all. Developers attempt to add more and more features to technologies, and sometimes deviate from the actual purpose of the technology. A little like how a client-side scripting language ended up being used for server-side coding, or was allowed access to remote resources as well as client-side local storage. Instead of layering these as separate technologies, they were all put into one big honeypot. Looking into some of the advanced SQL injections, we can see how they have played a part in the steady ascent of SQLi attacks.
With SQL, though, I guess the biggest mistake was the coupling of commands and parameters. It's a little like calling `run(value, printf)` instead of `printf(value)`.
Oh, and one last thing: while it's quite easy to convert between different types of databases, the changes required in the server-side code are mammoth.
Someone should abstract over the different types of databases and make it easier to switch between them. Say, a PHP plugin which takes QL commands and the database type as input, and maybe a whitelisted filter to sanitize the input.
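Hypothetically, such a layer might expose an interface along these lines. Every name here (QueryLayer, PreparedQuery, run) is invented for illustration; no such plugin is being described:

```php
<?php
// Hypothetical interface for the abstraction layer described above.
interface PreparedQuery {
    /** Bind parameters and execute; raw strings are never spliced in. */
    public function run(array $params): array;
}

interface QueryLayer {
    /** Prepare a query template for a given database type. */
    public function prepare(string $dbType, string $template): PreparedQuery;
}

// Usage sketch:
// $q    = $layer->prepare('mysql', 'SELECT * FROM products WHERE id = :id');
// $rows = $q->run([':id' => $_GET['id']]);
```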
Personally, I think this is a specific case of a more general problem in programming: IDEs and languages are overly permissive. We give our developers immense power in the name of flexibility and efficiency. The result is "what can happen will happen", and security lapses are inevitable.
PDO (or other "safe" methods) is, by itself, no more secure than mysql_ (or other "unsafe" methods). It makes it easier to write safe code, but it is even simpler to just concatenate unescaped user-provided strings into the query and not bother with the parameters.
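To make that concrete, here is a sketch of both styles with PDO ($pdo is assumed to be an open PDO connection). Both "work"; the first is exactly as injectable as the old mysql_ API:

```php
<?php
// Still vulnerable, despite using PDO:
$stmt = $pdo->query("SELECT * FROM users WHERE name = '" . $_GET['name'] . "'");

// Safe: same API family, but with a bound parameter.
$stmt = $pdo->prepare('SELECT * FROM users WHERE name = ?');
$stmt->execute([$_GET['name']]);
```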