
I am writing a research paper on tracking hackers, and in particular on how to use keystroke timings to build a profile of a hacker.

I want to combine the keystroke timings that I capture in ttylog with other data from sessions, like IP address, type of attack, any 1337-speak used, or anything else I see that makes a person/session unique.
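One way to combine those signals is to flatten each session into a single feature record. The sketch below assumes you have already extracted the inter-keystroke gaps and the typed command lines for a session. The `SessionProfile` class, the `leet_ratio` heuristic and its character set are hypothetical illustrations, not part of Kippo or any library.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Dict, List

# Hypothetical set of digits/symbols commonly swapped in for letters in 1337-speak.
LEET_CHARS = set("013457@$")


def leet_ratio(tokens: List[str]) -> float:
    """Crude score: share of tokens with letters that also contain leet substitutes."""
    wordish = [t for t in tokens if any(c.isalpha() for c in t)]
    if not wordish:
        return 0.0
    leety = [t for t in wordish if any(c in LEET_CHARS for c in t)]
    return len(leety) / len(wordish)


@dataclass
class SessionProfile:
    src_ip: str             # source address as logged by the honeypot
    attack_type: str        # your own label, e.g. "ssh-bruteforce" (hypothetical)
    intervals: List[float]  # inter-keystroke gaps in seconds
    commands: List[str]     # command lines typed during the session

    def features(self) -> Dict[str, object]:
        """Flatten the session into one feature dictionary for later clustering."""
        gaps = self.intervals or [0.0]
        tokens = [tok for cmd in self.commands for tok in cmd.split()]
        return {
            "src_ip": self.src_ip,
            "attack_type": self.attack_type,
            "median_gap": median(gaps),
            "mean_gap": mean(gaps),
            "leet_ratio": leet_ratio(tokens),
            "n_commands": len(self.commands),
        }
```

The leetspeak score is deliberately simple: a token such as `file1` also counts, so treat it as a noisy signal to be weighed alongside the timing features.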

I need some real-world data to work with. Do you know where I can get data from real-world attacks?

I have Kippo running which is great for getting keystrokes and the timings related to each keystroke.
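Since Kippo's ttylog files are the source of the timings, here is a minimal sketch of reading them in Python. It assumes the record layout written by Kippo's `ttylog` module: a 24-byte little-endian header packed as `<iLiiLL` (operation, tty, data length, direction, seconds, microseconds) followed by the data bytes, with operation 3 meaning a write and direction 1 meaning attacker input. Check those constants against the Kippo version you run; `read_input_events` and `inter_key_intervals` are hypothetical helper names.

```python
import struct

# Assumed Kippo ttylog record header (verify for your version):
# op, tty, data length, direction, seconds, microseconds; little-endian, 24 bytes.
HEADER = struct.Struct("<iLiiLL")
OP_WRITE = 3    # assumed: record carries terminal data
TYPE_INPUT = 1  # assumed: data typed by the attacker


def read_input_events(raw: bytes):
    """Return (timestamp_seconds, data) pairs for attacker-input records."""
    events = []
    offset = 0
    while offset + HEADER.size <= len(raw):
        op, _tty, length, direction, sec, usec = HEADER.unpack_from(raw, offset)
        offset += HEADER.size
        data = raw[offset:offset + length]
        offset += length
        if op == OP_WRITE and direction == TYPE_INPUT:
            events.append((sec + usec / 1_000_000, data))
    return events


def inter_key_intervals(events):
    """Gaps in seconds between consecutive input records."""
    times = [t for t, _ in events]
    return [b - a for a, b in zip(times, times[1:])]
```

Typical use would be `read_input_events(open(path, "rb").read())` on one session's log. A single input record can hold several characters (a paste, or keystrokes the network delivered together), so gaps between records only approximate gaps between keystrokes.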

user13959
  • I'm not sure I understand what data you're seeking. Are you looking for keystroke logs _with timing_ of real attacks? Wouldn't that imply that you're relying on the hacker sitting at a machine and keyboard that you control? How can you capture keystroke timing without controlling the keyboard? – MCW Oct 12 '12 at 11:05
  • Kippo is probably good for this - which I see you are already using! It's probably difficult to get people running honeynets to give you all their data (though someone on here may prove me wrong!), but you may have a better shot asking them to run your analysis tool over their data and send you the (anonymised) results. – Andy Smith Oct 12 '12 at 11:05
  • You probably really mean attackers where you say hackers. A simple s/hacker/attacker/g should do the trick :) – adric Oct 12 '12 at 14:39

2 Answers


Your idea of fingerprinting is very similar to wireless signals intelligence in WWII. Both sides used to have whole departments whose role was to learn the code style, or "fist" of the opposing side's wireless operators. By tracking these profiles and using radio direction finding they gained a surprising amount of information about troop and vessel movements, staff assignments, etc.

You're thinking of doing the same thing: learning the nuances of how particular crackers operate. Things like typing cadence, frequently repeated typing mistakes, etc., could be used to learn a particular cracker's "fist". I think this is a good idea in some ways, but maybe not good enough to pursue:

  • Most attacks are scripted. Even when a top cracker is doing the hacking, the work before and after a successful exploit is usually scripted, so you'll have to wade through hundreds of attacks to find one fingerprinting opportunity.
  • Data sources: you'll have a hard time gathering enough data to do any actual fingerprinting. You would need far more data sources than a simple research project can provide: at least a dozen honeypots and a very large database to work from, plus some complex modeling to interpret the data.
  • Network latency and jitter are common on the internet, especially when traffic comes from areas with poor connectivity. Those areas also happen to be the source of many attacks, so your results could be skewed significantly. Is that pause followed by a flurry of typing the hacker's style, or simply network lag?
  • Verifiability of results: How can you prove your fingerprinting methods are in any way successful? How will you show that the patterns you find actually demonstrate a single attacker? They aren't going to come out and say, "yeah, that was me!"
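The "fist" features mentioned above, typing cadence and frequently repeated typing mistakes, can be approximated from a list of `(timestamp, character)` keystrokes. This sketch uses the mean latency per character pair (digraph) for cadence, and counts which character was deleted right after being typed as a proxy for repeated mistakes. The function name and the set of backspace characters are assumptions; on honeypot data these timings also carry the network jitter described in the list.

```python
from collections import Counter, defaultdict
from statistics import mean

# DEL (0x7f) and BS (0x08); which one a client sends for Backspace varies.
BACKSPACES = {"\x7f", "\x08"}


def fist_features(keystrokes):
    """Cadence and correction features from (timestamp_seconds, char) keystrokes."""
    digraph_gaps = defaultdict(list)
    mistyped = Counter()
    for (t1, c1), (t2, c2) in zip(keystrokes, keystrokes[1:]):
        if c1 in BACKSPACES:
            continue                  # gaps after a correction are not ordinary cadence
        if c2 in BACKSPACES:
            mistyped[c1] += 1         # the typist deleted c1 right after typing it
        else:
            digraph_gaps[c1 + c2].append(t2 - t1)
    n_backspaces = sum(1 for _, c in keystrokes if c in BACKSPACES)
    return {
        "digraph_mean": {dg: mean(g) for dg, g in digraph_gaps.items()},
        "backspace_rate": n_backspaces / len(keystrokes) if keystrokes else 0.0,
        "mistyped": dict(mistyped),
    }
```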

My suggestion is to try this on a small scale where you can control some of the factors. Get many volunteers (plus some scripts) to follow set scripts of commands in a terminal window and see if you can write algorithms that reliably identify the typist. Then introduce packet latency and jitter to see whether your algorithms can cope, and work on that. Once that works, you can take your algorithms to the internet and see whether they hold up in the wild; without that controlled baseline you won't have any idea how reliable they are.
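The controlled experiment could start as small as this sketch: enroll each volunteer as a table of mean digraph latencies, identify an unknown sample by its nearest profile, then repeat the identification after adding simulated network delay. Nearest-profile matching on digraph means is a stand-in for whatever classifier you settle on, and the uniform random delay in `add_jitter` is a simplifying assumption; real networks also batch several keystrokes into one packet, which this does not model. All names here are hypothetical.

```python
import random
from statistics import mean


def digraph_profile(keystrokes):
    """Mean latency per character pair from (timestamp_seconds, char) keystrokes."""
    gaps = {}
    for (t1, c1), (t2, c2) in zip(keystrokes, keystrokes[1:]):
        gaps.setdefault(c1 + c2, []).append(t2 - t1)
    return {dg: mean(g) for dg, g in gaps.items()}


def distance(p, q):
    """Mean absolute latency difference over the digraphs both profiles share."""
    shared = p.keys() & q.keys()
    if not shared:
        return float("inf")
    return sum(abs(p[d] - q[d]) for d in shared) / len(shared)


def identify(sample, enrolled):
    """Return the enrolled typist whose profile is closest to the sample profile."""
    return min(enrolled, key=lambda name: distance(sample, enrolled[name]))


def add_jitter(keystrokes, max_delay, rng: random.Random):
    """Delay each keystroke by a random amount, keeping arrival order (as TCP does)."""
    out, last = [], float("-inf")
    for t, c in keystrokes:
        arrival = max(t + rng.uniform(0, max_delay), last)
        out.append((arrival, c))
        last = arrival
    return out
```

Raising `max_delay` step by step shows how quickly identification accuracy degrades as latency grows relative to the typists' own timing differences.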

GdD
  • +1 Great answer! Not only did you provide excellent historical background you suggested a simple experiment to validate his hypothesis. – adric Oct 12 '12 at 20:09

You could set up a honeypot and add your additional logging systems. This will produce results, although probably not the results you are looking for. In the real world, most compromises are carried out by bots, so keystroke timings don't apply.
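Because most sessions come from bots, a cheap first-pass filter can set aside obviously scripted input before any fingerprinting. This sketch assumes input records shaped as `(timestamp, bytes)` pairs, such as those read from a ttylog, and treats a session as scripted when most records carry a whole command at once or when the median gap between records is faster than a person types. Both thresholds are made-up placeholders to tune on hand-labelled sessions, and a human who pastes commands will be flagged too.

```python
from statistics import median

# Hypothetical thresholds; tune them on sessions you have labelled by hand.
MIN_HUMAN_GAP = 0.03         # seconds; sustained faster input is unlikely from a person
MAX_MULTI_BYTE_SHARE = 0.5   # share of records allowed to carry more than one character


def worth_fingerprinting(records):
    """True if input looks like a person typing character by character."""
    if len(records) < 2:
        return False  # too little input to profile
    # Whole command lines arriving in one record suggest a script or a paste.
    multi = sum(1 for _, data in records if len(data.strip()) > 1)
    if multi / len(records) > MAX_MULTI_BYTE_SHARE:
        return False
    gaps = [b[0] - a[0] for a, b in zip(records, records[1:])]
    # Sustained sub-human gaps suggest automation rather than a person.
    return median(gaps) >= MIN_HUMAN_GAP
```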

rook
  • Yeah, but once in a while it's not a script, and sometimes [the results are hilarious](http://www.youtube.com/watch?v=oJagxe-Gvpw) – Jeff Ferland Oct 12 '12 at 07:17