10

Over the years I've heard various estimates of the average number of exploitable bugs per thousand lines of code, a common figure being one exploitable bug per thousand lines. A Google search turns up some much lower figures, like 0.020 and 0.048, but also much higher ones, like 5 to 50. All these numbers are for code that hasn't been reviewed or tested for security.

Have any serious empirical studies been done on this subject? Such a study could be based on well-reviewed open source software, checking how many security holes have been reported over the years. If not, where do these numbers come from?

Andrei Botalov
David Wachtfogel
  • There is a DARPA study looking at IPS systems and similar. Look at Mudge's 2011 DEF CON talk. – ewanm89 Oct 04 '12 at 22:52
  • Defcon or Blackhat? I was at Mudge's Blackhat 2011 keynote in which he gave the apocryphal "one exploitable bug per thousand lines of code" but I don't recall him backing this statement with substantial research. – David Wachtfogel Oct 05 '12 at 05:04
  • I think he gave the same speech at both, though it wasn't part of the keynote at DEF CON. In any case, the archives for both are public. As for the research, one should go and check the slides and notes from the archives. – ewanm89 Oct 05 '12 at 10:17
  • This largely uncited Wikipedia article seems to think there is, but the lack of references is a little concerning: http://en.wikipedia.org/wiki/Source_lines_of_code#Relation_with_security_faults – Andy Smith Oct 05 '12 at 11:55
  • In truth, if you found an answer for this global statistic, it would really be meaningless, absent any context. New programmers, or senior? Big companies, or small? Focus on secure coding? Technology and languages? SDL frameworks? and more. What would make more sense, and easier to do (after some prep work), is calculate this number *for your own organization*. – AviD Oct 12 '12 at 13:43

2 Answers

9

Any number you get is going to be fairly meaningless -- some factors to consider:

  • Programming Language - Some languages let you do very unsafe things; e.g., C makes you allocate memory directly, allows raw pointer arithmetic, and uses null-terminated strings, so it introduces many potential security flaws that safer (but slightly slower) languages like Ruby/Python do not allow. (A minimal sketch of one such flaw follows this list.)

  • Type of Application - If a non-malicious programmer writes a relatively complex Angry Birds-type game in Java (not using unsafe modules), there's a very good chance there aren't any "exploitable" bugs, especially after testing, with the possible exception of being able to crash the program. A web application in PHP written by amateurs has a good chance of having various exploitable flaws (SQL injection, cross-site scripting, bad session management, weak hashing, remote file inclusion, etc.).

  • Programmer expertise at writing secure code - If you hire a high school student with no past experience to code up some web application, there's a reasonable chance there will be major flaws.
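
To make the first bullet concrete, here is a minimal sketch of the kind of flaw C permits (my own illustration, not code from any project discussed here): strcpy performs no bounds check, so input longer than 15 characters overruns the stack buffer, where a memory-safe language would raise an error instead.

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical example: a fixed-size buffer filled from
     * attacker-controlled input with no length check. */
    static void greet(const char *name) {
        char buf[16];
        strcpy(buf, name);       /* overflows if strlen(name) >= 16 */
        printf("hello, %s\n", buf);
    }

    int main(int argc, char **argv) {
        if (argc > 1)
            greet(argv[1]);      /* attacker-controlled input */
        return 0;
    }

Pass a long first argument and the program corrupts its own stack; that single unchecked call is one "exploitable" bug in about ten lines.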

Furthermore, counting the number of "exploitable" bugs is not a straightforward task either; if finding bugs were straightforward, they'd be removed in code review. Many bugs arise only from subtle race conditions or complex interactions among programs/libraries (see the sketch below).
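
As an illustration of the race-condition point (again a made-up sketch, not from any project mentioned here): two threads increment a shared counter with no synchronization. The code looks correct line by line, yet updates are silently lost under particular interleavings, which is exactly the kind of bug that survives casual review.

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;             /* shared, unprotected state */

    static void *bump(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            counter++;                   /* read-modify-write: not atomic */
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, bump, NULL);
        pthread_create(&b, NULL, bump, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        /* Expected 2000000; under contention the total is usually less. */
        printf("counter = %ld\n", counter);
        return 0;
    }

Compile with -pthread and run it a few times: the printed total is usually below 2000000, and nothing in a line-by-line review flags the bug.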

However, if you take open-source projects, it's fairly easy to find a count of LoC at ohloh.net and a count of "exploitable" vulnerabilities at cvedetails.com (I arbitrarily defined 'exploitable' as a CVSS score over 7). I randomly decided to look at some web browsers, programming languages, and web frameworks and found:

Web browsers, open source programming languages, and web frameworks (per-project tables of LoC and exploitable-CVE counts for each category):

So, for these specific major programming projects (likely written by expert programmers), I found rates of major exploitable vulnerabilities of 0.003 to 0.08 per 1000 LoC (or 1 per 12,500 to 300,000 LoC). I wouldn't necessarily extrapolate to non-major open source projects.
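
For concreteness, the arithmetic behind those rates is just the CVE count divided by thousands of LoC. A tiny sketch, with made-up placeholder figures rather than real data from ohloh.net or cvedetails.com:

    #include <stdio.h>

    int main(void) {
        /* Placeholder figures for illustration only; substitute real
         * LoC counts from ohloh.net and CVSS > 7 CVE counts from
         * cvedetails.com for the project you care about. */
        double loc = 5000000.0;      /* hypothetical project size */
        double exploitable = 150.0;  /* hypothetical exploitable CVEs */

        printf("%.3f exploitable vulns per 1000 LoC\n",
               exploitable / (loc / 1000.0));
        printf("i.e. 1 per %.0f LoC\n", loc / exploitable);
        return 0;
    }

With these placeholder numbers it prints 0.030 per 1000 LoC, i.e. 1 per about 33,000 LoC, which sits inside the range above.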

dr jimbob
  • Great answer. Looking at these numbers and the numbers from the Discretix log and SecurityWeek article cited in the question, it seems that the rate is somewhere between two and eight exploitable bugs per 100K LoC. The fact that django and python have far fewer bugs reported may be an indication that they haven't been reviewed enough. – David Wachtfogel Oct 06 '12 at 17:00
  • @DavidWachtfogel django has a LoC count similar to Ruby on Rails, which suggests they should have been reviewed similarly well, but I did not find robust non-FOSS usage statistics, so it's hard to judge the reliability of the numbers. Do you have usage statistics for django vs. rails? – Arne Babenhauserheide Mar 11 '15 at 10:36
2

As someone who security-tests web apps for fun and profit, I find the rate of security defects per thousand lines in common open source web apps to be way higher than the 0.08 figure quoted. Presumably the issue is that CVEs record only security defects that were found and reported via the relevant channels. You need metrics for code that has undergone systematic review, so that at least the low-hanging security defects have been detected; otherwise what you are measuring is some fraction of the testing effort.

user126215