3

I have been searching for an answer as to how you should treat false positives in Fortify scans.

For a long time, if something was determined to be a false positive, I would document the reasoning behind why that issue was a false positive and suppress the issue.

One of my colleagues interviewed a former Fortify employee and was told that you should never suppress issues as it could prevent particular new findings from being displayed.

Now, I have personally run scans with suppressed issues and have found that this is not the case. Even with suppressed issues, new findings in the same category are still reported.

I found this post, and one of the contributors points out that if vulnerabilities are being relayed outside of the security team, you should suppress them.

What is the best practice here? Is the way I have been doing it the right way?

schroeder
developer_117

3 Answers

0

In general, you should not suppress issues unless you are really sure. In the case of Fortify, I don't consider it a good tool for C/C++ projects, for example, but I know it works quite well for other languages. You should analyze the issue and decide whether or not to suppress it, with a minimum of two people evaluating the risk of fixing or not fixing it.

Another approach would be to use a second tool and check whether both of them flag the same line of code; that is another indicator.
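
For illustration, here is a rough Python sketch of that cross-check, assuming you can get both tools' results into SARIF (many scanners export it natively, and converters exist for others). The file names are placeholders:

    import json

    def sarif_locations(path):
        """Collect (file, line) pairs for every result in a SARIF 2.1.0 export."""
        with open(path) as f:
            sarif = json.load(f)
        locations = set()
        for run in sarif.get("runs", []):
            for result in run.get("results", []):
                for loc in result.get("locations", []):
                    phys = loc.get("physicalLocation", {})
                    uri = phys.get("artifactLocation", {}).get("uri")
                    line = phys.get("region", {}).get("startLine")
                    if uri and line:
                        locations.add((uri, line))
        return locations

    # Findings flagged by both tools on the same file and line deserve priority.
    fortify = sarif_locations("fortify.sarif")
    other = sarif_locations("other_tool.sarif")
    print("Flagged by both tools:")
    for uri, line in sorted(fortify & other):
        print(f"  {uri}:{line}")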

camp0
  • Thanks, @camp0. Yes, we typically have two people validate whether a finding is a false positive before suppressing it. We are scanning a C# code base, so for the most part it does OK. But you wouldn't see a problem with suppressing issues if two people validated the finding and it was documented? – developer_117 Jun 25 '19 at 08:38
  • My suggestion would be to use another tool and see whether it complains about the same part of the code; if two tools give you the same warning, you probably have something there that needs to be analyzed and fixed. The criteria for suppression depend on the code base, occurrences, how often the code is executed, and many other factors. – camp0 Jun 25 '19 at 08:43
  • Our findings are also monitored by our management team, so it is not just our development team that sees the results. I could see it if it were just a security team dealing with the results, but many different eyes are on the findings. – developer_117 Jun 25 '19 at 08:43
  • 1
    The development team should evaluate that risk, from my point of view. Your management team is probably just checking the Fortify alerts and has no idea what is behind the curtains. – camp0 Jun 25 '19 at 08:45
  • Currently, the code base has a Fortify SCA scan, a Burp Suite scan, and then WebInspect. For the most part, the combination of Fortify and Burp seems to capture all findings, and WebInspect typically turns up random findings that are also usually false positives, all unrelated. – developer_117 Jun 25 '19 at 08:49
  • Agreed about "no idea what is behind the curtains", but they monitor the reports nonetheless. From their standpoint, I know they are more concerned with the metrics. I guess I am just trying to get them on the right page about what they should be expecting, because they just want to see 0. – developer_117 Jun 25 '19 at 08:55
0

The best practice when "triaging/auditing" findings is to tag them. One of the default tags is "not an issue".
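
If you also need to report on findings outside of Audit Workbench or SSC (for example, for management metrics), the FPR itself can be inspected programmatically. A rough sketch only: an FPR is a ZIP archive, and recent SCA versions keep the scan results in audit.fvdl inside it; the element names used below are my reading of the FVDL format, so verify them against an FPR from your own SCA version.

    import zipfile
    import xml.etree.ElementTree as ET
    from collections import Counter

    def local(tag):
        """Strip the XML namespace from an element tag."""
        return tag.split("}")[-1]

    def category_counts(fpr_path):
        # An FPR is a ZIP archive; audit.fvdl holds the raw scan results.
        with zipfile.ZipFile(fpr_path) as fpr:
            root = ET.fromstring(fpr.read("audit.fvdl"))
        counts = Counter()
        for vuln in root.iter():
            if local(vuln.tag) != "Vulnerability":
                continue
            for child in vuln.iter():
                if local(child.tag) == "Type":   # ClassInfo/Type holds the category
                    counts[child.text] += 1
                    break
        return counts

    for category, count in category_counts("results.fpr").most_common():
        print(f"{count:5d}  {category}")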

schroeder
0

Most appsec missions are graded on fixing app vulns, not finding them.

If Fortify SCA can be put into a pipeline, it can also be hooked up to fix issues automatically (although care must be taken to avoid situations like the Debian OpenSSL PRNG vulnerability, which did not exist until a "fix" prompted by a code-analysis warning introduced it).
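
A rough sketch of the pipeline step itself, for orientation. The -b/-clean/-scan/-f sequence is the standard sourceanalyzer workflow; the build ID, the msbuild translation step, and the output name are placeholders for your environment:

    import subprocess

    BUILD_ID = "my-app"  # hypothetical build ID; use your project's convention

    def run(cmd):
        """Run a pipeline step, failing the build if the command fails."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Clean, translate, then scan. The translation step (here, wrapping msbuild
    # for a C# solution) depends entirely on your toolchain.
    run(["sourceanalyzer", "-b", BUILD_ID, "-clean"])
    run(["sourceanalyzer", "-b", BUILD_ID, "msbuild", "MySolution.sln", "/t:Rebuild"])
    run(["sourceanalyzer", "-b", BUILD_ID, "-scan", "-f", "results.fpr"])

From there, a gating or auto-fix step can pick up results.fpr; how far you let it run unattended is exactly the judgment call the Debian example warns about.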

Once you are fixing issues automatically (not every issue can be handled this way, so focus on the always-true-positive categories with standardized, high-fidelity remediations that can be code generated), you can turn your attention to trivial true positives. These are issues that just never need to be fixed: they are real, but you don't care because you don't have the time or energy to fix them.
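
As a deliberately small illustration of what a "standardized remediation that can be code generated" might look like: hardening cookie settings in an ASP.NET web.config, which is one of the few fixes that really is mechanical. The category choice and the decision to auto-fix are my assumptions, and any such change should still go through code review:

    import xml.etree.ElementTree as ET

    # Mechanical fix for cookie-security findings against an ASP.NET web.config
    # (httpOnlyCookies/requireSSL on <httpCookies>). Note that ElementTree drops
    # comments and formatting; a real codemod would use a formatting-preserving
    # editor.
    def harden_cookies(web_config_path):
        tree = ET.parse(web_config_path)
        system_web = tree.getroot().find("system.web")
        if system_web is None:
            return False
        cookies = system_web.find("httpCookies")
        if cookies is None:
            cookies = ET.SubElement(system_web, "httpCookies")
        cookies.set("httpOnlyCookies", "true")
        cookies.set("requireSSL", "true")
        tree.write(web_config_path, encoding="utf-8", xml_declaration=True)
        return True

    if harden_cookies("web.config"):
        print("web.config updated; commit as an auto-generated fix for review")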

After you deal with trivial true positives, focus on false negatives. These are more elusive than false positives and will help you understand your false-positive problem better. To avoid false negatives, tune the rules so that the named sources, passthrus, and sinks fit your app portfolio, and vice versa. This could require renaming functions, variables, methods, classes, and the like, or it could mean structuring JSON and/or XML rules files that link the right sources to the right sinks. It could also mean eliminating code indirection, such as unnecessary Dependency Injection or similar patterns, that keeps the analyzer from seeing each app as a single, in-place architecture.

Did you discover new findings by eliminating false negatives whose fixes can be automated? Good, then automate those fixes as well. Even better, you can get app developers to write new unit tests (or component or system tests, depending on the layer) that assert the behavior of each defect's fix, and this can happen well before the code is scanned by Fortify SCA.
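
As one sketch of what the "tuning in" step above could look like in practice: a hypothetical helper that compares an inventory of the sources and sinks your rules already name against wrapper methods in the code base that probably should be covered. The file name, JSON shape, and naming heuristic are all invented for illustration:

    import json
    import pathlib
    import re

    # Inventory of functions your rules already treat as taint sources or sinks.
    # "taint-rules.json" with "sources"/"sinks" lists is my own convention.
    rules = json.loads(pathlib.Path("taint-rules.json").read_text())
    known = set(rules.get("sources", [])) | set(rules.get("sinks", []))

    # Naive heuristic for a C# code base: method names that suggest they read
    # external input and therefore look like uncovered taint sources.
    candidate = re.compile(
        r"\b(?:public|internal)\s+\w+\s+((?:Read|Get|Parse)\w*Input\w*)\s*\(")

    for path in pathlib.Path("src").rglob("*.cs"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            match = candidate.search(line)
            if match and match.group(1) not in known:
                print(f"{path}:{lineno}: possible uncovered taint source: {match.group(1)}")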

Finally, you are left with false positives, which can also be tuned out (most of them probably already were, automatically, by concentrating on tuning false negatives into true positives).

One of the best ways to determine whether a finding category or subtype needs to be manually escalated to an app developer (or an appsec analyst, vendor, etc.) is to leverage a supervised machine-learning algorithm such as BlazingText. You'll need to train-test split your data; sometimes there are existing data sets available for this purpose, and other times it's something you have to build yourself. You may even want to use unsupervised machine learning for that early process, such as clustering with a variety of text-mining techniques (i.e., you'd be text mining the words, their structure, and their relation to OWASP and your own application security standards, penetration tests, and other data); TF-IDF, LDA, and HMM come to mind, although there are many iterations and plays on these.

BlazingText is more of a word2vec algorithm, so ultimately you are looking for it to determine a path towards automatic fixing, automatic escalation, et al. GPUs, or moving the algorithms to RNNs, could possibly provide improvements, but perhaps at a significant cost (not in time, but in GPU usage and power). Supervised BlazingText is sort of the sweet spot at the moment, but understanding and evaluating your model(s) is part of any machine-learning process.
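
As a simplified, local stand-in for the BlazingText approach, here is what the escalate-or-not classifier could look like with TF-IDF and logistic regression in scikit-learn. The CSV and its columns are hypothetical stand-ins for your own labelled triage history:

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    # "triage_history.csv" and its columns are hypothetical: finding_text holds the
    # category, abstract, and file path; escalate is 1 if a human had to act.
    data = pd.read_csv("triage_history.csv")
    X_train, X_test, y_train, y_test = train_test_split(
        data["finding_text"], data["escalate"],
        test_size=0.2, random_state=42, stratify=data["escalate"])

    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),
        LogisticRegression(max_iter=1000, class_weight="balanced"))
    model.fit(X_train, y_train)

    # Evaluate on the held-out set before trusting it to route anything automatically.
    print(classification_report(y_test, model.predict(X_test)))

Whatever model you use, check its precision and recall on held-out data before letting it route or suppress anything on its own.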

atdre