
On August 4, 2016, DARPA announced the winner of its Cyber Grand Challenge (DARPA Cyber Grand Challenge). The contest was described as

designed to accelerate the development of advanced, autonomous systems that can detect, evaluate, and patch software vulnerabilities before adversaries have a chance to exploit them. The seven competing teams in today’s final event were composed of whitehat hackers, academics, and private sector cyber systems experts.

They described the actual challenge as:

For almost 10 hours, competitors played the classic cybersecurity exercise of Capture the Flag in a specially created computer testbed laden with an array of bugs hidden inside custom, never-before-analyzed software. The machines were challenged to find and patch within seconds—not the usual months—flawed code that was vulnerable to being hacked, and find their opponents’ weaknesses before the defending systems did.

The winning system, Mayhem, was to be formally invited to participate in the DEF CON Capture the Flag competition, "marking the first time a machine will be allowed to play in that historically all human tournament."

I have read and re-read the material at DARPA, and I still can't believe that an automated system found anything on the level of the April 2014 Heartbleed bug. I wonder whether the vulnerabilities were more on the level of published Microsoft security notices or recommended updates (in other words, whether the "bug finder" was basically more of an automatic patch installer).

Does anyone know what the technical level of the DARPA challenge test bed was in relation to Heartbleed or similar real-world vulnerabilities? I suppose I will find out when Mayhem actually competes against humans in the CTF competition, but I am doubtful at the moment.

Dalton Bentley

1 Answer


The DARPA test bed for the CGC (Cyber Grand Challenge) was a modified Linux platform called DECREE. Its binaries (the compiled machine code for the system) used only seven system calls (e.g., terminate, receive, fdwait, allocate). Within the system code as a whole were DARPA-crafted Challenge Binaries (CBs) containing one or more vulnerabilities exposed to fuzzing, i.e., vulnerable to some crafted input.
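
To make the flavor of those bugs concrete, here is a minimal hypothetical sketch in C (not an actual Challenge Binary, just an illustration of the pattern) of the kind of input-triggered memory-safety flaw the CBs contained:

```c
/* Hypothetical sketch (not a real DARPA Challenge Binary): a service
 * routine with an input-triggered memory-safety bug of the kind the
 * CBs contained. The attacker-controlled length is never checked
 * against the destination buffer, so crafted input smashes the stack. */
#include <string.h>

void handle_request(const unsigned char *msg, size_t msg_len) {
    unsigned char local[64];

    /* BUG: msg_len comes straight from the input and may exceed 64. */
    memcpy(local, msg, msg_len);

    /* ... process the copied request in `local` ... */
    (void)local;
}
```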

One of the early examples of fuzz testing was Steve Capps's "Monkey," which tested for bugs in MacPaint by feeding random events to the code. The name alludes to the idea that "a thousand monkeys at a thousand typewriters will eventually type out the entire works of Shakespeare," i.e., if you send enough random or protocol-crafted input to a program, you will eventually find a way to crash it. Fuzz testing will typically not find security threats that do not cause program crashes, e.g., spyware, many viruses, worms, Trojans, and keyloggers. However, it will find exploits based on unexpected program input (or code errors related to program input), so it may well have been able to find the Heartbleed bug, which was caused by improper input validation in OpenSSL's implementation of the TLS (Transport Layer Security) protocol.
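
As a deliberately naive sketch of the idea (not how Murphy or any production fuzzer actually works), a fuzzer can be as little as a loop that throws random bytes at a target routine, such as the hypothetical handle_request() above, and watches for crashes:

```c
/* Minimal fuzz-testing sketch (illustrative only): feed random byte
 * buffers to a target routine and watch for crashes. Each trial runs
 * in a forked child so a segfault kills only that child, not the
 * fuzzer itself. Link against a target such as handle_request() above. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

extern void handle_request(const unsigned char *msg, size_t msg_len);

int main(void) {
    unsigned char buf[256];
    srand(1234);                          /* fixed seed so crashes reproduce */

    for (int trial = 0; trial < 100000; trial++) {
        size_t len = (size_t)(rand() % (int)sizeof(buf));
        for (size_t i = 0; i < len; i++)
            buf[i] = (unsigned char)(rand() % 256);

        pid_t pid = fork();
        if (pid == 0) {                   /* child: run the target and exit */
            handle_request(buf, len);
            _exit(0);
        }
        int status = 0;
        waitpid(pid, &status, 0);
        if (WIFSIGNALED(status)) {        /* crash (e.g. SIGSEGV) == a finding */
            printf("crash on trial %d (signal %d), input length %zu\n",
                   trial, WTERMSIG(status), len);
            break;
        }
    }
    return 0;
}
```

Production fuzzers add coverage feedback, input mutation, and protocol awareness on top of this basic loop, but the core idea is the same.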

The winning team, Mayhem, used a symbolic executor (also called Mayhem) in conjunction with a directed fuzzer (Murphy). A symbolic executor will be familiar to those of you who have worked with symbolic debuggers in integrated development environments: it is basically an interpreter that steps through the code and assigns and tracks symbolic values for inputs through the various portions of the executing code. In other words, the Mayhem team was able to analyze portions of the CBs (Challenge Binaries) with the symbolic executor to detect code vulnerabilities (or to analyze inputs suggested by Murphy). This was a requirement of the DARPA challenge: find a vulnerability and demonstrate it with a POV (Proof of Vulnerability). Unfortunately, symbolic execution does not scale very well, since the number of paths in a program grows exponentially and can become infinite with unbounded loops, so heuristics and path prioritization (enter the Murphy fuzzer) are needed to partially address this path-explosion problem (computational techniques such as parallelization also help).
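
A toy C example (again only an illustration, not Mayhem's internals) shows why the two techniques complement each other: blind random input almost never satisfies a multi-byte equality check, while a symbolic executor collects the branch constraints along a path and asks a solver for concrete bytes that reach the buggy code:

```c
/* Illustration of why symbolic execution complements blind fuzzing:
 * random input almost never hits this 4-byte "magic" check, but a
 * symbolic executor treats input[] as symbolic, records the branch
 * constraints, and asks its solver for bytes that reach the crash. */
#include <stdlib.h>

void process(const unsigned char *input, size_t len) {
    if (len < 4)
        return;
    /* Path constraint a symbolic executor would hand to its solver:
     * input[0]=='C' && input[1]=='G' && input[2]=='C' && input[3]=='!' */
    if (input[0] == 'C' && input[1] == 'G' &&
        input[2] == 'C' && input[3] == '!') {
        abort();  /* stands in for the buggy path a POV must reach */
    }
    /* Every additional branch like the one above multiplies the number
     * of paths to explore, which is the path-explosion problem. */
}
```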

The DARPA CGC required the teams not only to locate vulnerabilities in the CBs (Challenge Binaries), but also to document each one with a POV (Proof of Vulnerability) and to repair it with a Repair Binary (RB) patch. Each patch was then tested, both for correct operation with the vulnerability removed and for its literal performance impact on the relevant code (execution time). This is all fairly amazing considering that a human was not allowed to contribute to any of what I just described.
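
As a rough source-level sketch of what such a repair amounts to (hypothetical; the actual RBs were patched binaries produced with no human in the loop), the fix for the earlier handle_request() flaw is just a bounds check, and because the added work is a single comparison the performance penalty on the patched code is small:

```c
/* Hypothetical source-level view of a Repair Binary style fix: the
 * same routine as before, now rejecting over-long input so the POV
 * no longer triggers, at the cost of one extra length comparison. */
#include <string.h>

void handle_request_patched(const unsigned char *msg, size_t msg_len) {
    unsigned char local[64];

    /* Patch: validate the attacker-controlled length before copying. */
    if (msg_len > sizeof(local))
        return;                 /* or truncate / signal an error */

    memcpy(local, msg, msg_len);
    /* ... process the request exactly as before ... */
    (void)local;
}
```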

You can read more at the winning Mayhem team's blog (Unleashing Mayhem!), obtain DARPA's Grand Challenge rules and other documents (Darpa Grand Challenge related documents), read about fuzzing at Fuzz Testing, and read about symbolic execution at Symbolic Execution.

Dalton Bentley
  • Great write up! You just saved me an hour of reading :-) Welcome to Security Stack Exchange, I hope you'll have fun here. – paj28 Aug 08 '16 at 20:11
  • @paj28 I'm glad you believe I took the ball you tossed and ran with it to a good outcome. I enjoy this SE community and drop by frequently (though I'm slowing down a little generally). – Dalton Bentley Aug 08 '16 at 20:38