
As with any tool purchase, part of the outcome depends on how good the evaluation criteria are, so it is important to understand the criteria people might use when assessing security static analysis tools.

Obviously the weighting on each criterion would be down to the individual company's priorities, but the list could be reasonably generic.

Some criteria which might apply are:

  • Cost. A couple of components for this would be software licenses (up-front and annual) and hardware costs for running the software (assuming it's not SaaS).

  • Scaling. The ability of the tool to scale to larger environments. Specific points might be around sharing data between developers, integration with bug tracking and source code control systems, and so on.

  • Capability. Language coverage, false positive/negative rates.

  • Customization. A key area for this kind of software is the ability to customize it to the specific codebase being evaluated (e.g. to take account of bespoke input validation libraries), along with the ability to customize reports and other outputs.

Rory McCune
  • I am aware this question is over 8 years old, but I can't stop myself from asking: **What is the actual question?** If this were posted today by a 1 Rep user, I would probably flag it as "Unclear what you are asking". –  May 06 '19 at 14:52
  • The clue is in your opening sentence :) The site was in early days in 2011 and the guidelines that exist now for "what does a good question look like" weren't in place. That said, I think the answer from @avid on the topic contains a lot of good information, so it serves its purpose. – Rory McCune May 06 '19 at 16:07
  • I disagree, I point to this question often as an example of what "a good shopping question" would look like, i.e. I need to buy "some class of product"; what are the criteria I need to know about to consider which is best for me? – AviD May 07 '19 at 09:17

4 Answers


Here are the things on my list that I use for my clients (including some of those that you've mentioned):

  • Scan model - how does the scan work, what forms of analysis does it perform, and does it scan raw source code or compiled binaries?
  • Coverage (according to what the org requires today, and expects to use in the future)
    • Language
    • Architecture (e.g. some tools are great for web apps, but not so much for rich clients or even Windows Services / daemons)
    • Framework (e.g. support for ASP.NET but not MVC, or Java but not Spring)
    • Standard Libraries (e.g. recognizing Hibernate, log4j, or AntiXSS, to name a few problematic ones) as are needed
  • Scan performance. In this I include both:
    • Speed
    • Scalability to large codebases (including LoC, subsystems/projects, and external references)
  • Completeness - i.e. rules/scripts included in the scan, enough to provide confidence in the output, or in other words to minimize false negatives.
    • Note that this is both the list of provided rules (e.g. checks for "SQL Injection");
    • AND the quality of those checks, i.e. actually catch SQL Injections. (As an example, a certain leader in the field, which shall remain nameless, can be easily confused by many forms of codeflow, thus missing basic injection flaws.)
  • Accuracy of results. This applies to the results which ARE reported (as opposed to those missing, covered in "Completeness"). Poor accuracy can be seen in:
    • High numbers of false positives
    • Duplicates
    • Miscategorizations
    • Misprioritizations (e.g. labeling a very low impact bug as "High Risk")
    • Irrelevant results (i.e. a code flaw which is accurate, but completely irrelevant to the application or architecture; e.g. finding XSS in a non-HTML, non-HTTP, and non-interactive app, or SQL Injection on a client application).
  • Customizability - as you said, customizing sources, sinks, and filters; also reports; but also, no less so, customizing rules/scripts and adding custom logic (not just source->filter->sink). Note that many tools allow you to "customize" their rules, but this is really limited only to adding source/sink/filters. (See also "Generation" below.)
  • Sustainability / repeatability - by this I refer to handling of repeat scans. How does it handle changes? Issues previously marked as false positives? Do I get comparisons?
  • Deployment model, e.g. usually a combination of:
    • Cloud based
    • single auditor station
    • shared server, accessed via remote
    • web access
    • developer plugin
    • build server pluggability
    • (and of course ability to set a different policy for each)
  • Usability. This includes:
    • UI (including hotkeys etc)
    • auditor filtering capabilities
    • providing enough context via highlighting enough of the codeflow
    • static text, such as explanations, descriptions, remediation advice, links to external sources, IDs in OWASP/CWE, etc.
    • additional user features - e.g. can I add in a manual result (not found by automatic scan)?
  • Reporting - both flexible per-project reports as needed (don't forget detailed output for devs and high-level summaries for managers) and aggregated cross-project reporting. Also comparisons, etc.
  • Security (in the case of a server-model) - protecting the data, permissions management, etc. (much like any server app...)
  • Cost. This includes at least:
    • hardware
    • license - any or all of the following:
      • per seat - auditor
      • per seat - developer
      • per seat - reports user
      • per server
      • per project
      • per lines of code
      • site license
    • maintenance
    • services - e.g. deployment, tuning, custom integration, etc.
    • training (both the trainer, and the auditor/developers' time)
  • Integration (see also scan model above) with :
    • source control
    • bug tracker
    • development environment (IDE)
    • build server / CI / CD
    • automation
  • "Generation" - SAST tools can be informally categorized based on the following:
    • Generation 1 SAST: Scans the text of the source code for specific patterns, ala grep. Might also be based on context. (Many of the older tools would fit here, eg FindBugs and RATS. SemGrep is an excellent example of this.)
    • Generation 2 SAST: Compiles the code internally and traces various code paths, e.g. data flows and control flow. Typically based on source-filter-sink logic only. (Most commercial SAST tools are here).
    • Generation 3 SAST: Builds a full AST (Abstract Syntax Tree) and queries this tree dynamically, allowing discovery of complex logic beyond just flows. Some tools might allow you to customize these queries, or implement your own queries to enforce specific logic. (GitHub's CodeQL is one example, Checkmarx is another).
    • Generation 4 SAST: AI-based magical discovery of anything wrong (I'm not really sure how this actually work, so it's hard to relate without snark :-) ).
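
To make the "generation" distinction a bit more concrete, here is a deliberately tiny, illustrative sketch (in Python, chosen purely as an example language and not taken from any particular tool) contrasting a generation-1 style text match with a very rough generation-3 style syntax-tree query for the same risky construct. Real engines are far more sophisticated; the regex and the choice of execute() as a sink are just assumptions for the example.

    import ast
    import re
    import sys

    # Generation-1 style: grep-like text matching. Flags any line that looks
    # like string formatting inside an execute() call, with no understanding
    # of the language's syntax.
    GEN1_PATTERN = re.compile(r"execute\(.*(%|\+|\.format\()")

    def gen1_scan(source):
        findings = []
        for lineno, line in enumerate(source.splitlines(), start=1):
            if GEN1_PATTERN.search(line):
                findings.append((lineno, "possible SQL built by string formatting"))
        return findings

    # Generation-3 style (very loosely): parse the code into a syntax tree and
    # query the tree, so only execute() calls whose first argument is actually
    # built dynamically get flagged, regardless of how the line is laid out.
    def gen3_scan(source):
        findings = []
        for node in ast.walk(ast.parse(source)):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr == "execute"
                    and node.args):
                arg = node.args[0]
                dynamic = isinstance(arg, (ast.BinOp, ast.JoinedStr)) or (
                    isinstance(arg, ast.Call)
                    and isinstance(arg.func, ast.Attribute)
                    and arg.func.attr == "format")
                if dynamic:
                    findings.append((node.lineno, "execute() called with a dynamically built string"))
        return findings

    if __name__ == "__main__":
        code = open(sys.argv[1]).read()
        for lineno, message in sorted(gen1_scan(code) + gen3_scan(code)):
            print("line %d: %s" % (lineno, message))

Even the syntax-tree version misses anything that crosses a function boundary, which is roughly where the data-flow tracing of generation-2/3 tools takes over.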

Looking over this, I think it's pretty much in order of preference - starting from basic requirements, through applicability, quality, ease of deployment, and efficiency, down to nice-to-haves...

AviD
  • @Rory Although it should be noted that often it can be summed up in: "If it covers my language [and forget frameworks etc], just go for the cheapest license!" – AviD Sep 21 '11 at 13:33
  • 1
    @AviD heh indeed, I know that sensation. Having a list like this can hopefully help off-set some of that as it provides a clearer picture of the benefits of a good solution over a just ok one. – Rory McCune Sep 21 '11 at 14:38

Here is the most important thing to know about how to evaluate a static analysis tool:

Try it on your own code.

I'll repeat that again. Try it on your own code. You need to run a trial, where you use it to analyze some representative code of yours, and then you analyze its output.

The reason is that static analysis tools vary significantly in effectiveness, and their effectiveness depends upon what kind of code tends to get written in your company. Therefore, the tool that's best for your company may not be the same as what's best for another company down the road.

You can't go by a feature list. Just because a tool says it supports Java doesn't mean it will be any good at analyzing Java code -- or any good at analyzing your Java code and finding problems that you care about.

Most static analysis vendors will gladly help you set up a free trial so you can try their tool on your own code -- so take them up on their offer.

Gary McGraw and John Steven have written a good article on how to choose a security static analysis tool. In addition to hitting the point that you need to try the tools on your own code to see which is best, they also point out that you should take into account how well the tool can be customized for your environment and needs, and budget for this cost.
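
One practical note on analyzing the trial output: many SAST tools can export their findings in the SARIF JSON format, which makes it straightforward to tally results from different tools run against the same codebase. A minimal sketch along those lines (the file names are placeholders, and only the common runs[].results[] fields of SARIF are assumed):

    import json
    from collections import Counter

    def summarize_sarif(path):
        """Count findings per (severity level, rule id) in a SARIF file."""
        with open(path) as fh:
            sarif = json.load(fh)
        counts = Counter()
        for run in sarif.get("runs", []):
            for result in run.get("results", []):
                level = result.get("level", "warning")  # e.g. error / warning / note
                rule = result.get("ruleId", "unknown-rule")
                counts[(level, rule)] += 1
        return counts

    # Compare two tools' trial scans of the same code (placeholder file names).
    for report in ("tool_a.sarif", "tool_b.sarif"):
        print(report)
        for (level, rule), count in summarize_sarif(report).most_common(10):
            print("  %4d  %-8s  %s" % (count, level, rule))

The counts themselves prove nothing about quality - the point of the exercise, as above, is to then read through the findings and judge how many are real and how many you actually care about.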

D.W.

A long list of criteria is as likely to distract you as help you come up with a good solution.

Take, for example, the issue of "false positives". It's an inherent problem with such tools. The long-term solution is learning how to live with them. It means that your coders are going to have to learn to code around the static analysis tool: learn what causes it to trigger a false positive, and write the code in a different way so that the false positive isn't triggered. It's a technique familiar to those who use lint, or those who try to compile their code warning-free: you tweak the code until the false positive stops triggering.
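
For a concrete illustration of what "coding around the tool" looks like in practice: most analyzers support inline suppression of findings that have been reviewed and judged to be false positives. The snippet below assumes a Python codebase scanned with Bandit (a tool not mentioned in this thread), whose # nosec annotation suppresses a specific finding; other tools have comparable mechanisms such as // NOLINT or @SuppressWarnings.

    import subprocess

    def restart_web_server():
        # Bandit flags subprocess calls with shell=True as a possible command
        # injection sink (check B602). The command here is a fixed string with
        # no user input, so after review the finding is suppressed in place.
        subprocess.run("systemctl restart nginx", shell=True)  # nosec B602

    def restart_service(name):
        # The other option described above: rewrite the code so the analyzer
        # no longer triggers at all (argument list, no shell involved).
        subprocess.run(["systemctl", "restart", name])

Whether to suppress or to rewrite in a given case is, of course, the judgement call being debated in the comments below.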

The biggest criterion is understanding the problem you are trying to solve. There is an enormous benefit to making your programmers go through the step of running a static analyzer once, to remove the biggest problems in your code and, frankly, to learn what they should already know about programming so that they stop making those mistakes. But the marginal value of continuously running static analyzers is much lower, and the marginal cost is much higher.

Robert David Graham
  • I disagree strongly with your solution for false positives. This should not be a foregone conclusion, and is not inherent in all tools - at least, not as much in the *good* tools. Coding around the tool is a very bad idea, and indeed will greatly devalue using the tool again. – AviD Oct 04 '11 at 22:58
  • I've used a wide range of "good" tools on a wide range of code. Just like "compiler warnings" and "lint", it's inherent. – Robert David Graham Oct 09 '11 at 10:02
  • ...and it would depend on how you define "false positive". Static analyzers find a lot of dangerous coding practices that aren't necessarily a problem now, but which become one later. I.e. if some length check is removed at a distant place in the code, you might now have a real buffer overflow in this part. – Robert David Graham Oct 09 '11 at 10:05
  • 1
    but what I was saying was that if it is inherent in the tool, *by defintion* it's not a GOOD tool. On the other hand your point of how to define FP is an excellent one - this could be dependant on external factors, business context, etc - and that should be supported by the tool to create and manage "exceptions". Though I think your example was not a good one, since in that situation you dont *want* the programmers to "code around the tool" - you want them to fix it, albeit with a lower priority (since it really isnt exploitable, but it IS bad practice - which may *become* exploitable – AviD Oct 09 '11 at 10:26

I've been a big fan of Gimpel's PC-Lint over the years for C++ code. The two biggest factors for me were language coverage (nobody else really had it at the time) and "livability". Living with static analysis is kind of a subjective thing, as you know. Gimpel has a chapter in their manual called "Living with Lint" that does a good job of talking through the various ups and downs. Making a tool livable means being able to customize the warnings and errors that are emitted, but in a way that doesn't drive developers crazy working with it.

Related to scalability, I have had trouble with analysis tools being unable to cope with third-party code - libraries, etc. - so on a big project, that's certainly a consideration as well.

Steve Dispensa