2

I was reading about how Tor works. It says that if an attacker is able to see both ends of the communication channel, then Tor fails (and so do other anonymity networks).

How and why does this attack work?

Arminius
Ugnes
  • I think you should specify what exactly in that link you're confused about. As your question is now I don't quite understand it. – RoraΖ Jan 05 '17 at 19:40
  • @RoraΖ The question makes sense to me. OP is asking how correlation attacks work. – Arminius Jan 05 '17 at 19:50
  • But what about the explanation is unclear? The article says "the research community knows no practical low-latency design that can reliably stop the attacker from correlating volume and timing information on the two sides." Correlation attacks are exactly what they sound like, you see a packet enter one side, moments later a packet comes out the other side, you have a pretty good idea which packet it was going in. After a few thousand packets you approach 100% certainty. Anonymity, busted. Note that it only "breaks" ability to hide who was talking to what site, encryption can still be used. – Jeff Meden Jan 05 '17 at 20:05

2 Answers

5

This is called an end-to-end confirmation attack.

The idea is simple: instead of attempting to decrypt the content of packets, an attacker who manages to observe both ends of the communication channel looks for patterns in the traffic, matching the data entering the network against the data leaving it in order to deanonymize users. This can be done by correlating the volume of transmitted data or by comparing the times at which packets are transmitted. For example, a user streaming a video produces a different timing and volume pattern than someone browsing a website.
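To make the volume-and-timing idea concrete, here is a minimal sketch in Python. It is not how any real attack tool works; the traffic numbers, the 10-second window and the helper names `bin_volumes` and `pearson` are all made up for illustration. It sums the bytes seen in each time window on both sides and compares the two series with a Pearson correlation coefficient.

```python
from math import sqrt

def bin_volumes(events, start, end, width):
    """Sum bytes per time window; events are (timestamp_seconds, byte_count) pairs."""
    n_bins = int((end - start) // width) + 1
    bins = [0] * n_bins
    for ts, size in events:
        if start <= ts <= end:
            bins[int((ts - start) // width)] += size
    return bins

def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

# Hypothetical observations over one minute (seconds, bytes).
exit_side = [(1, 5000), (12, 14000), (25, 2000), (47, 9000)]      # seen at the website
candidate_a = [(2, 6000), (13, 15000), (26, 3000), (48, 10000)]   # seen at an entry relay
candidate_b = [(5, 8000), (33, 8000), (38, 7000), (55, 1000)]     # another user's traffic

WINDOW = 10  # seconds per bin; an arbitrary choice for this sketch
exit_bins = bin_volumes(exit_side, 0, 59, WINDOW)
score_a = pearson(exit_bins, bin_volumes(candidate_a, 0, 59, WINDOW))
score_b = pearson(exit_bins, bin_volumes(candidate_b, 0, 59, WINDOW))

print(f"candidate A: {score_a:.3f}, candidate B: {score_b:.3f}")
```

With this toy data, candidate A scores close to 1 and candidate B scores below zero. Real traffic is far noisier, so an attacker would need many more observations before a score like this means anything.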

Correlation attacks are a hard-to-solve problem in low-latency anonymity networks like Tor, and the Tor Project explicitly stated in a blog post that Tor is not designed to protect against them:

The Tor design doesn't try to protect against an attacker who can see or measure both traffic going into the Tor network and also traffic coming out of the Tor network. [...]

The way we generally explain it is that Tor tries to protect against traffic analysis, where an attacker tries to learn whom to investigate, but Tor can't protect against traffic confirmation (also known as end-to-end correlation), where an attacker tries to confirm a hypothesis by monitoring the right locations in the network and then doing the math.

There's quite a lot of research suggesting that correlation attacks are still a major threat to Tor users. For example, this paper from 2013 analyzed realistic correlation attack scenarios, concluding:

The results show that Tor faces even greater risks from traffic correlation than previous studies suggested. An adversary that provides no more bandwidth than some volunteers do today can deanonymize any given user within three months of regular Tor use with over 50% probability and within six months with over 80% probability.

The thesis "Defending End-to-End Confirmation Attacks against the Tor Network" (2015) contains some more recent correlation experiments on the live Tor network, with the insight that "end-to-end confirmation attacks can be successfully applied against the current size Tor network". The author also proposes a concrete defense technique based on dummy traffic that is supposedly "simple, easy to implement and deploy as well as usable", but to the best of my knowledge it hasn't found its way into the Tor protocol yet.

Arminius
4

This is explained in the second sentence on your link:

For example, suppose the attacker controls or watches the Tor relay you choose to enter the network, and also controls or watches the website you visit. In this case, the research community knows no practical low-latency design that can reliably stop the attacker from correlating volume and timing information on the two sides.

(formatting mine)

This means that when you visit a website in this scenario, the website returns some content, which is transmitted to you over the Tor network. The attacker can match the time and amount of data that was sent with the time and amount of data received on your end, such as:

11:30:11 Server sent 5kb
11:30:12 Your node received 6kb

11:33:17 Server sent 14kb
11:33:18 Your node received 15kb

Thus, after collecting enough of this information, the attacker can decide with high probability whether you are using the site at this particular moment. And by controlling the entry node, he also knows your IP address, so the deanonymization is complete.
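As a rough illustration of that matching step, here is a small Python sketch using the log lines above. The 2-second delay window, the 2 kb overhead allowance, the `other_node_log` data and the function name `match_fraction` are my own assumptions for the example, not anything Tor or a real attacker is known to use.

```python
from datetime import datetime, timedelta

def parse(ts):
    """Parse an HH:MM:SS timestamp like the ones in the log above."""
    return datetime.strptime(ts, "%H:%M:%S")

def match_fraction(sent, received, max_delay=timedelta(seconds=2), max_overhead_kb=2):
    """Fraction of 'sent' events that have a plausible matching 'received' event.

    A receive matches a send if it arrives within max_delay afterwards and is
    at most max_overhead_kb larger (both thresholds are assumed values).
    """
    matched = 0
    unused = list(received)
    for sent_time, sent_kb in sent:
        for candidate in unused:
            recv_time, recv_kb = candidate
            delay = recv_time - sent_time
            if timedelta(0) <= delay <= max_delay and 0 <= recv_kb - sent_kb <= max_overhead_kb:
                matched += 1
                unused.remove(candidate)  # each receive can match only one send
                break
    return matched / len(sent) if sent else 0.0

# (time, size in kb) pairs taken from the example log.
server_log = [(parse("11:30:11"), 5), (parse("11:33:17"), 14)]
your_node_log = [(parse("11:30:12"), 6), (parse("11:33:18"), 15)]
# Hypothetical traffic of some unrelated user, for comparison.
other_node_log = [(parse("11:30:40"), 6), (parse("11:31:05"), 30)]

your_score = match_fraction(server_log, your_node_log)
other_score = match_fraction(server_log, other_node_log)
print(your_score, other_score)
```

Two matching events prove very little on their own; the point is that the fraction stays high for the right user as more events are collected, while it stays low for everyone else.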

George Y.
  • I know. I wanted someone to clarify more on that. Thanks for the answer and explanation. – Ugnes Jan 06 '17 at 04:39