24

If I use a Tor router to browse the regular internet, my traffic must leave the Tor network through an exit node. Apparently the exit node can see the data originally sent.

  1. Is this true?
  2. If an adversary wanted to deanonymize me, wouldn't they just have to subpoena the exit node owner or hack it?
  3. Does this mean a proxy is about as safe, since the above applies to both?
Anko
TestinginProd
  • 4
    It is quite likely that government agencies own many nodes making it therefore possible to trace you (if you use all or many of those controlled nodes in your path). But the exit node alone won't be enough (without additional data) as pointed out in the answers. – Omar Kohl Feb 27 '13 at 11:11
  • Proxy Tor is just a tool for using Tor. – Macil Nov 30 '17 at 01:39

6 Answers

40

In Tor, the user (you) chooses a random path through several nodes for their data. The first node in the path knows your IP address, but not what you send or where. The last node ("exit node") knows the target server address and sees the data (unless SSL is used, of course), but not your IP address. Every node in the path knows only the addresses of the previous and the next nodes in the path.

If a government is intent on unraveling the privacy of Tor, then its best chance is to set up and operate a lot of nodes (which, of course, will not say "provided by your friendly government"). If your computer randomly chooses a path which begins with a government-controlled node and ends with another government-controlled node, then both nodes can correlate their data pretty easily and reveal both your IP and the target server (and the sent data, if no SSL is used). Correlation is simple because while encryption hides the contents of data, it does not hide the length. If node A sees a 4138-byte request entering the Tor network from your IP, and node B sees a 4138-byte request within the next second exiting the Tor network and destined for server www.example.com, then node A and node B, by collating their data, will infer that your IP was involved in a communication with www.example.com.
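The length-and-timing correlation described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Observation` record, the `correlate` function, and the one-second window are hypothetical names and values chosen for this example, and real traffic would be far noisier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    timestamp: float  # seconds since some shared reference time
    size: int         # observed length in bytes
    endpoint: str     # client IP at the entry node, destination host at the exit node


def correlate(entry_log, exit_log, max_delay=1.0):
    """Pair entry and exit observations that have the same size and where
    the exit observation follows the entry one within max_delay seconds."""
    matches = []
    for entry in entry_log:
        for exit_obs in exit_log:
            delay = exit_obs.timestamp - entry.timestamp
            if entry.size == exit_obs.size and 0 <= delay <= max_delay:
                matches.append((entry.endpoint, exit_obs.endpoint))
    return matches


# Hypothetical logs kept by two colluding nodes.
entry_log = [
    Observation(0.0, 4138, "203.0.113.42"),
    Observation(0.2, 900, "198.51.100.7"),
]
exit_log = [
    Observation(0.6, 4138, "www.example.com"),
    Observation(5.0, 900, "other.example"),
]

print(correlate(entry_log, exit_log))
```

Only the 4138-byte pair is reported, because the 900-byte observations are too far apart in time to fall inside the window.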

It can easily be proven that if the hostile party does not eavesdrop on or hijack both the entry and exit nodes, then your privacy is maintained. But if they do, then privacy evaporates like a morning mist under the midday Sun.

Thomas Pornin
  • 1
    Tor provides sybil resistance by giving you a long-living guard (the first node). Unlike the other nodes, which change around once every 10 minutes, the guard can last for a year or more. The purpose is that an attacker who runs a lot of nodes gets a chance to own both the guard and exit _once a year_ instead of _once every ten minutes_. That way, even if every server in the world suddenly turned into a Tor node (giving nearly 100% malicious relays), you would still be safe for a year. – forest Nov 29 '17 at 05:23
  • Might be worth comparing this with a proxy, where a government need only control one node to compromise your privacy; the proxy itself. – Ajedi32 Dec 07 '17 at 15:42
  • Additionally, I would say their best chance is not to run a lot of nodes (as that is easy to notice), but rather to compromise many large DCs and networks (which they already try to do) in order to monitor honest nodes. – forest Dec 30 '17 at 11:33
9
  1. Nope, the exit node can only decrypt the message and make the request, but it is not aware of where the original host is located. The only node that knows where the person is located is the first node (the entry guard). This is due to the layered encryption Tor uses. Every node only knows the next and previous hop, but not the whole path.
  2. Nope because of 1
  3. Nope because of 1

How Tor works is described in one of the blog posts on this very website:

http://security.blogoverflow.com/2012/04/tor-exploiting-the-weakest-link/

Lucas Kauffman
  • 4
    You should not forget that many users proxy unencrypted connections (e.g. SMTP and POP3) through the Tor network. Even if the exit node does not know the source IP of the connection, the traffic content often contains hints to reliably identify the source, e.g. usernames and passwords. – jarnbjo Feb 27 '13 at 14:03
  • What do you think I explain in that blogpost ;) – Lucas Kauffman Feb 27 '13 at 14:06
  • 8
    That sort of attack has gone beyond just the proof of concept level. Back in 07 a security researcher set up 5 exit nodes and intercepted data from multiple embassies and fortune 500 companies: http://www.securityfocus.com/news/11486 – Dan Is Fiddling By Firelight Feb 27 '13 at 14:41
5

I know this was already answered, but a lot of important details were left out.

How onion routing works

Onion routing is an anonymity technique where a path is chosen randomly through a cluster of servers, such that each connection takes a different route. The specific relays for the guard, middle, and exit are chosen randomly by the Tor client. The path from guard to exit is called the circuit, and the Tor client remembers this. The guard is chosen once and remains the same for a long time (as explained below), while the middle and exit change at periodic intervals (either once every ten minutes, or when a new connection is established). The unpredictable path and the large number of relays to choose from improve anonymity significantly.

How Tor works
(source: torproject.org)

When you send data over Tor, the data is encrypted with three keys. Each layer specifies the subsequent relay to be used (as chosen by your client at random):

  • An application, like Tor Browser, requests a webpage through Tor, and tells that to the client. This request is done on your local network using the SOCKS5 protocol.
  • The Tor client encrypts data with three keys, and shares each key with a different, random relay. Encrypted in each layer is also the address of the next relay. This is then sent to the guard.
  • The guard receives data and strips the third layer, using its key. It forwards the data to the relay specified in the third layer, the middle relay.
  • The middle relay receives data and strips the second layer, using its key. It forwards the data to the relay specified in the second layer, the exit.
  • The exit receives data and strips the final (first) layer, using its key. It checks the destination site and forwards the now fully decrypted data to it.
  • The destination site receives the data and sends a response to the originating IP, the exit.

Now the traffic has been successfully sent to its destination, but it has to get back. Each Tor relay keeps in memory which relays it is communicating with, so when it gets a response from one of them, it knows where to send it. This way, the middle relay knows the guard asked it to send data to the exit, and it remembers this, so when that same exit gives it data back, it can forward it to the guard:

  • The exit receives the response, adds the destination of the previous relay (the middle), encrypts it with its key, and sends it off to the middle relay.
  • The middle relay receives this, adds the destination of the previous relay (the guard), and adds another layer of encryption using its key before sending it to the guard.
  • The guard receives this and adds a third layer of encryption with its key before giving the data to you, the Tor client.
  • The Tor client receives this and strips all layers of encryption. It then gives the response to whatever application requested it (such as Tor Browser).

This is the original concept behind onion routing. All this happens in a second or two.
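The layering steps above can be sketched as a toy Python example. This is not Tor's actual cryptography: `keystream_xor` is a deliberately simple stand-in cipher built from SHA-256 (insecure, for illustration only), and the keys, hop names, `wrap`, and `peel` are hypothetical. It only shows how each relay removes one layer and learns nothing but the next hop.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256 based keystream.
    Applying it twice with the same key returns the original data."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(hops, payload: bytes) -> bytes:
    """Build the onion. hops is a list of (key, next_hop) ordered from
    guard to exit; the innermost (exit) layer is applied first."""
    data = payload
    for key, next_hop in reversed(hops):
        header = next_hop.encode()
        data = keystream_xor(key, len(header).to_bytes(1, "big") + header + data)
    return data


def peel(key: bytes, data: bytes):
    """Strip one layer: return (next_hop, remaining_data)."""
    plain = keystream_xor(key, data)
    header_len = plain[0]
    return plain[1:1 + header_len].decode(), plain[1 + header_len:]


# Hypothetical keys shared between the client and each relay.
hops = [
    (b"guard-key", "middle"),
    (b"middle-key", "exit"),
    (b"exit-key", "example.com"),
]
onion = wrap(hops, b"GET /foo.html")

hop1, rest = peel(b"guard-key", onion)     # the guard learns only "middle"
hop2, rest = peel(b"middle-key", rest)     # the middle learns only "exit"
hop3, request = peel(b"exit-key", rest)    # the exit learns the destination and the request
print(hop1, hop2, hop3, request)
```

The return path works the same way in reverse: each relay adds its layer, and the client, which holds all three keys, removes them all.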

onion routing diagram

Who can see what?

I actually wrote this answer because no one linked to the obligatory EFF diagram on Tor. This depicts each point of interest and what a given adversary can observe:

Tor and HTTPS

From the perspective of the relays, three things are true:

  • The guard knows who you are (your IP), but not what you are doing (your destination).
  • The exit knows what you are doing, but not who you are.
  • The middle node knows nothing about you.

The anonymity stems from the fact that no one entity can know who you are and what you are doing.

Traffic analysis attacks against Tor

In order to be deanonymized using Tor, assuming no attacks against you directly (software exploits, backdoored hardware, OPSEC failures), an entity that knows who you are, and an entity that knows what you are doing must collude. In the diagram, that adversary is labeled as the NSA. The black dotted line shows data sharing, which means that the precise timing information can be used to correlate you. This is called a traffic analysis attack, and is a risk when your adversary monitors both ends of the connection. Tor has only a limited ability to protect against that, but thankfully it is often enough, since there is so much traffic to blend in with. Consider the following timeline of events:

  • ISP1 sees 203.0.113.42 send 512 encrypted bytes (253 unencrypted) of data at t+0.
  • ISP2 sees example.com receive a 253 byte request for /foo.html at t+4.
  • ISP2 sees example.com send a 90146 byte reply at t+5.
  • ISP1 sees 203.0.113.42 receive a 90424 encrypted byte reply (90146 unencrypted) at t+9.

ISP1 is any ISP between you and the guard, and ISP2 is any ISP between the exit and the destination. If all this can be monitored, and ISP1 and ISP2 collude, then with sufficient computation, one can conclude that IP address 203.0.113.42 accessed example.com/foo.html. Tor makes this harder in a few ways. First, persistent guards reduce the chance that an adversary will be able to observe steps 1 and 4 by adding a large number of malicious guards to the network. Second, Tor sends traffic in cells of 512 bytes each (or at least it used to; cells are 514 bytes now), so step 1 would involve sending 512 bytes, yet step 2 would still show 253 bytes received. Third, the number of hops Tor goes through increases jitter in latency. Because of this, each subsequent timestamp will differ by a small but random amount of time. This makes it hard to distinguish your connection from other connections which transfer a similar amount of data at a similar time.
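To illustrate the effect of fixed-size cells on what an observer sees, here is a minimal Python sketch. It assumes, as a simplification, that the entire 512-byte cell carries payload; real cells also carry headers, so this does not reproduce Tor's actual overhead or the exact figures in the timeline. The function name `bytes_on_wire` is hypothetical.

```python
import math

CELL_SIZE = 512  # cell size used in the timeline above


def bytes_on_wire(payload_len: int, cell_size: int = CELL_SIZE) -> int:
    """Round a payload up to a whole number of fixed-size cells
    (simplified: ignores per-cell headers)."""
    cells = max(1, math.ceil(payload_len / cell_size))
    return cells * cell_size


# Requests of different lengths look identical to an observer before the guard.
for length in (100, 253, 400):
    print(length, "->", bytes_on_wire(length))
```

All three requests appear as 512 bytes on the client side, while the observer at the destination still sees the true length, which weakens exact-length matching.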

There have been many academic attacks against Tor which rely on traffic analysis, but they always assume a small world where latencies are all fixed and deterministic. These are the attacks that tend to be reported on in the media, despite not applying to the actual Tor network in a world where every network is full of noise.

Traffic analysis attacks against a proxy

This kind of attack is difficult to pull off against Tor because an adversary may not have access to both ISP1 and ISP2. Even if they do, the infrastructure of one of them may not be sufficient to record high-resolution timestamps (for example, due to reduced granularity NetFlow records), and their internal clocks may differ slightly. With a proxy, however, this attack is far easier to pull off. This is an issue even if you completely trust the proxy provider. Consider this alternative timeline of events, where ISP1 represents the ISP of the proxy service itself:

  • ISP1 sees 203.0.113.42 send 253 bytes of data at t+0.
  • ISP1 sees the proxy server send a 253 byte request to example.com for /foo.html at t+1.
  • ISP1 sees example.com send a 90146 byte reply at t+2.
  • ISP1 sees 203.0.113.42 receive a 90146 byte reply at t+3.

With this information all in the hands of ISP1, it becomes quite easy to conclude that 203.0.113.42 requested example.com/foo.html. There is no padding, and virtually no jitter (since the delay is only as long as it takes the proxy service to forward the request internally). Because of this, this single ISP knows both who you are and what you are doing, and all it has to do is connect the fact that they come from the same person. Simple. This is the main technical downside to proxies, even when their often sketchy nature and history of poor honesty is ignored.
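The proxy timeline above can be replayed in a short Python sketch to show how little work the single observer has to do. The log format, the `proxy.example` address, the `link_flows` function, and the two-second window are all assumptions made for this illustration.

```python
PROXY = "proxy.example"  # hypothetical address of the proxy server


def link_flows(log, proxy=PROXY, max_delay=2.0):
    """Given (time, source, destination, size) records seen by one ISP,
    link each flow entering the proxy to a flow of identical size
    leaving it shortly afterwards."""
    inbound = [(t, src, size) for t, src, dst, size in log if dst == proxy]
    outbound = [(t, dst, size) for t, src, dst, size in log if src == proxy]
    links = []
    for t_in, sender, size_in in inbound:
        for t_out, receiver, size_out in outbound:
            if size_in == size_out and 0 <= t_out - t_in <= max_delay:
                links.append((sender, receiver))
    return links


# The four events from the timeline, as seen by ISP1.
log = [
    (0, "203.0.113.42", PROXY, 253),
    (1, PROXY, "example.com", 253),
    (2, "example.com", PROXY, 90146),
    (3, PROXY, "203.0.113.42", 90146),
]

print(link_flows(log))
```

Both the request and the reply are linked, because without padding the sizes match exactly and the delays are short and predictable.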

forest
5

I would like to point you to this answer here, which describes in simple terms how Tor works.

The exit node has no idea where the data originates from; this is the entire point of Tor. The key to Tor is the multiple layers of encryption the data goes through as it travels through the Tor network.

Using a proxy on the other hand, involves a direct connection between your machine and the proxy server. Compromising a proxy can pretty quickly reveal who is connected to it.

1

Adding to the other answers:

  • Use of a proxy helps you hide your IP address and provides a certain level of anonymity, but it does not make you untraceable. You can be tracked using the logs generated by the proxy.
  • A better approach would be to use a proxy switching tool that keeps changing the proxy server through which your traffic is routed at fixed time intervals. One such tool is Proxy switcher.
  • This way you keep switching from one proxy to another around the world. Most of these proxies are up for a very small interval of time hence tracing becomes very difficult.
  • There are websites like anonymizer.com and vtunnel.com which let you access blocked websites (blocked by your administrator or Chinese government ;) ) through a proxy.
Shurmajee
-2

Adding to other answers as well:

if a government agency or a hacker wanted to figure out who I was, wouldn't they just have to subpoena the exit node owner or hack it?

A government agency would probably find you, no kidding. In the end, the traffic going out of the node can betray you. End service providers (Gmail, Facebook) cooperate with them and will give out account information based on the IP address. Simple.

This implies that if you don't follow best practices when using Tor, it may be useless. It also implies that if I have your Gmail account details, I know who you are, and if the same source IP does something I suspect is illegal, I have only a few people to investigate (those sharing the exit node's IP).

Aki
  • 2
    This is simply not true. Service providers will have no access to your real IP, so they can't give it out. Of course, you can still be stupid enough to share your real name and location with those service providers yourself, or you can log in with your real IP; in that case they can forward it if they want. – Dorus Feb 27 '13 at 13:53
  • 1
    [It is Tor](https://www.torproject.org/docs/faq.html.en#WhyCalledTor), not TOR, and definitely not ToR. – forest Nov 29 '17 at 18:30