How can an online hack be traced back to the perpetrators?

Question

What sort of evidence might be used to link, say, the Sony hack to North Korea? While I am curious about what was used in this particular case, my question is meant more generally: what sorts of things might act as a tip-off in situations like this?

Things I speculate:

Using the user agent string from a web request as a sort of fingerprint. This seems to carry relatively little information and would be fairly easy to spoof.
Seeing the language of the compilation environment or of the comments (the most publicized bit of evidence in the Sony hack). This again gives very little information and is easy to fake.
Tracing an IP. Even if this is done successfully, it seems like a competent hacker would be using some form of redirection to hide themselves. (I found one site claiming that the Sony hack had an IP traceable to SE Asia, but that's still pretty broad and circumstantial.)
Finding a distinctive sequence of commands in some malware that links it to some previous malware (e.g. how antivirus often works). However, it's turtles all the way down, and I'm unclear how the original malware could have been linked to North Korea (or anywhere else).

I've seen this closely related question, but the only concrete tidbit from its answers is that if you can see the wireless networks available to a hacker's computer, you could use that information to localize it. Getting that information seems fairly unlikely, though.

I also see these tangentially related questions that essentially say "yup, tracking people is hard".

There's this question about investigating hacks, but its answer only deals with the series of steps in recovering from the hack.

That said, I've seen only one reasonably credible news source question the FBI's and the US government's conclusion that North Korea was responsible. Thus, I assume I must be missing some reasonable ways in which the hack could have been traced.

I totally understand that the public probably doesn't have all the details on the hack (and possibly never will). I'm looking instead for what sorts of things would serve as convincing evidence and might reasonably have been found.

The attackers are *suspected* to be from North Korea because their motive was to take down a movie insulting their leader, and that government's response was along the lines of "we didn't do it, but we consider it the right thing", which is rather suspicious. — , Dec 21 '14 at 19:07
Sure, but when Obama has directly promised that the US “will respond proportionally” to North Korea, you have to hope they have more than circumstantial reason to suspect North Korea. Also, read my question: I'm not assuming that the Sony hack is by North Korea. I'm also not strictly even asking about the Sony hack. I'm wondering how a hack like this could feasibly be traced at all. — Cannoliopsida, Dec 21 '14 at 19:13

stochastic · Accepted Answer · 2014-12-21T20:47:41.077

Your points are good ones, and this is the interesting thing about tracing hacks: it is easy to cast suspicion, but very difficult to conclusively prove where an attack came from.

Some things that narrow the field:

The language in which a program was written can often be determined, as you suggest. This is of course suggestive, but not definitive.
As you also suggest, motive is a consideration. Perhaps a bigger consideration is resources. As usual, the physical resource constraints of the real world do a lot to narrow down possibilities. This is why "following the money" is such a valuable tool in investigations. It takes a lot of time and effort to do a really good job of hacking a network. In this regard, it's fairly easy to distinguish between a well-funded, well-organized group and a script monkey messing around. There are only so many people with motive AND resources. Again, not definitive, but helpful.

There are several ways in which hackers might do something stupid:

leaving their real name in a message/log/etc...
using their real login credentials to some known service
leaving metadata in a request that they send that can be used to identify them. You allude to this when you talk about the user agent string. In fact, there are many more things that a server can use to fingerprint a user. This all presumes a web browser, though, and there are many more ways to carry out an attack.
the metadata-in-the-request problem is only really a problem if (1) the metadata gets logged, and (2) the hacker isn't careful enough to erase the log.
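To make the fingerprinting idea concrete, here is a minimal Python sketch, assuming the attacked server logged request headers. The fields in `FINGERPRINT_FIELDS` and the helper names are hypothetical choices for illustration; real fingerprinting draws on many more signals, and every one of these headers can be forged by a careful attacker.

```python
import hashlib
import json

# Hypothetical choice of request attributes; real fingerprinting schemes use
# many more signals (TLS parameters, screen size, installed fonts, and so on).
FINGERPRINT_FIELDS = ("User-Agent", "Accept-Language", "Accept-Encoding", "Accept")


def request_fingerprint(headers: dict[str, str]) -> str:
    """Hash a canonicalized subset of request headers into a short ID."""
    # HTTP header names are case-insensitive, so normalize before selecting.
    normalized = {k.lower(): v.strip() for k, v in headers.items()}
    selected = {f: normalized.get(f.lower(), "") for f in FINGERPRINT_FIELDS}
    canonical = json.dumps(selected, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def matching_entries(log: list[dict], suspect_headers: dict[str, str]) -> list[dict]:
    """Return logged requests whose fingerprint matches the suspect's."""
    target = request_fingerprint(suspect_headers)
    return [entry for entry in log if request_fingerprint(entry["headers"]) == target]
```

A match only says that two requests presented the same combination of values; it links sessions to each other, not to a person, and is worthless if the log was never kept or was erased.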

Barring the hackers doing something stupid, I would argue that tracing an IP address down to its source is the only real way to de-anonymize the attackers: other things help but are circumstantial. Some things that make this hard:

clever and well-equipped hackers will indeed use multiple layers of VPN software to protect themselves. VPN accounts are often purchased as anonymously as possible, making it difficult and time-consuming (but by no means impossible) to trace back to the original IP address.
things like server logs will often preserve the IP address of the final hop in this VPN chain, which at least gives you a starting place, but clever, careful, and well-equipped hackers often doctor these logs, making it difficult even to get started.
If you don't get to watch the attackers as they are actually working, tracing back by asking the VPN service to cough up the originating IP address may not be feasible, as they simply may not have the logs, or may have deleted them.
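The hop-by-hop problem can be sketched as a toy model in Python (all names and addresses are hypothetical): each relay's logs reveal only the address that connected to it, so the trace advances one hop per log obtained and stops at the first relay whose logs are gone.

```python
from typing import Optional


def trace_back(
    first_hop: str,
    hop_logs: dict[str, Optional[str]],
    max_hops: int = 32,
) -> tuple[list[str], bool]:
    """Follow a chain of relays backwards, one log at a time.

    hop_logs has an entry for every address known to be a relay. Its value
    is the address that connected to that relay, as recorded in the relay's
    own logs, or None if the relay kept no logs (or deleted them).
    Returns the addresses visited and whether the trace reached an address
    that is not a known relay (the apparent origin).
    """
    path = [first_hop]
    current = first_hop
    for _ in range(max_hops):  # guard against loops in the recorded chain
        if current not in hop_logs:
            # Not a known relay: the best available guess at the origin.
            return path, True
        previous = hop_logs[current]
        if previous is None:
            # This relay's logs are unavailable, so the trail goes cold here.
            return path, False
        path.append(previous)
        current = previous
    return path, False
```

Even a trace that "succeeds" only ends at an apparent origin, which may itself be a compromised machine rather than the attacker's own.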

Factors that mitigate the difficulty of tracing IP addresses:

there is no practical way to complete a TCP connection from a spoofed IP address, as setting one up requires a three-way handshake: the client sends a SYN packet, the server replies with a SYN-ACK packet, and the client answers with an ACK packet that acknowledges the server's sequence number. The client cannot send that final ACK without seeing the SYN-ACK, so it must provide a real IP address to the server (one that actually lets packets get back to the client, even if that is only the last VPN hop). This means that you can always at least get a start if you watch the attack happening.
if you can get timestamped logs of the right traffic data, there are correlation attacks that you can use to de-anonymize someone behind a VPN redirection service, or even multiple such redirection services. The Tor Project has details about many of these attacks, and has had recent issues along these lines. Think of the packets coming from the attacker and the corresponding packets arriving at the attacked network (through multiple layers of VPN). The time delay through the intervening networks will be mostly the same for those packets, and some statistical analysis will then let you match them up with high confidence (see here, for instance). Clearly, the difficulty here is obtaining the requisite timestamped packet logs. It is also possible for one of the intervening networks to wait a random amount of time before forwarding packets, which would break this correlation attack. It is my understanding that the wait time would have to be pretty long and would significantly degrade the usability of the network, which is why the Tor network explicitly doesn't do this.
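The handshake argument can be illustrated with a toy Python model of the server side. This is a sketch, not real TCP: only sequence numbers are modelled, and `ToyServer` is a hypothetical name. The point is that the final ACK must acknowledge the server's randomly chosen initial sequence number, which a client that forged its source address never receives. The exceptions are attackers who can see traffic on the path to the forged address, and servers with predictable sequence numbers (a weakness of some older TCP stacks).

```python
import secrets

MOD = 2**32  # TCP sequence numbers are 32-bit and wrap around


class ToyServer:
    """Toy model of the server side of the TCP three-way handshake."""

    def __init__(self) -> None:
        self.pending: dict[str, int] = {}  # claimed address -> server ISN awaiting ACK
        self.established: set[str] = set()

    def on_syn(self, src: str, client_isn: int) -> tuple[str, int, int]:
        # The SYN-ACK goes to whatever address the SYN *claims* to come from.
        server_isn = secrets.randbits(32)
        self.pending[src] = server_isn
        return ("SYN-ACK", server_isn, (client_isn + 1) % MOD)

    def on_ack(self, src: str, ack_number: int) -> bool:
        # Only a client that actually received the SYN-ACK knows server_isn + 1.
        expected = self.pending.get(src)
        if expected is not None and ack_number == (expected + 1) % MOD:
            del self.pending[src]
            self.established.add(src)
            return True
        return False
```

A genuine client reads `server_isn` from the SYN-ACK and completes the connection; a blind spoofer claiming someone else's address has to guess it, with roughly a one-in-2^32 chance per attempt.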

