
I want to do a project on onion routing for my school graduation test.

I want to show how onion routing could be compromised if both the entry relay and the exit relay of a circuit are accessible to a third party. Of course, this assumes that someone has access to both relays, but I only need to show the principle for demonstration purposes.

How can you see which relay the client is using, and which relay the server is using, and how could you extract information from these relays?

schroeder
  • Do you mean "how can a ***third party*** see which relay the client is using?" – schroeder Nov 21 '21 at 14:54
  • Look at the image at https://images.theconversation.com/files/158813/original/image-20170228-13104-ylxylj.png and think about what would happen if the entry guard and the exit relay were compromised by the middle relay. – mti2935 Nov 21 '21 at 15:34
  • No, for demonstration purposes I would like to provide the information about the client's relay to a third party, so I can compare it to the output of the server's relay and get the IP address. – mantissa Nov 21 '21 at 19:11
  • Think about what would happen if the entry guard and the exit relay were compromised by the middle relay. At that point, the attacker has control of all three nodes (the entry node, the middle node, and the exit node). So, the attacker is in a position to see the client's IP address, and the IP address of the server that they are connecting to. This basically strips the user of the anonymity that they hoped to achieve using TOR. – mti2935 Nov 21 '21 at 20:06
  • But this is an unrealistic scenario. I've read that it is sufficient to control the exit node and the entry node in order to recover the IP address. But thank you, if nothing else helps I will use this. – mantissa Nov 21 '21 at 20:35
  • Even if the attacker does not control the middle node, the attack can still succeed. In that case: At the entry node, the attacker sees the IP address of the client, and sees the IP address of the middle node that the TOR connection is routed through. Then, at the exit node, the attacker looks for a connection from the middle node, and sees the IP address of the server that the client is connecting to. At that point, the attacker knows the IP address of the client, and the IP address of the server that the client is connecting to through TOR. – mti2935 Nov 22 '21 at 13:36
  • How would I access this information that is going through the nodes? – mantissa Nov 22 '21 at 14:03
  • Presumably, the client is connecting the server using a secure protocol, e.g. https. If that's the case, the client's requests and the server's responses cannot be intercepted unless the exit node mounts a MITM or similar type of attack. See https://security.stackexchange.com/questions/147189/tor-exit-node-security-https-website and https://security.stackexchange.com/questions/79438/can-a-malicious-tor-exit-node-perform-a-https-man-in-the-middle-attack-to-see-mo for some interesting reading on this subject. Also, Meta: use atsign (@) to notify someone when you post. – mti2935 Nov 22 '21 at 16:29
  • @mti2935 Thank you for your solution. I will try this. – mantissa Nov 22 '21 at 17:43

1 Answer


You can use the Tor control protocol to talk to the Tor client process and access information about its current circuits. Libraries for it exist for most languages.
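
As a rough illustration, here is a minimal Python sketch that talks to the control port directly over a socket. It assumes the local Tor client has `ControlPort 9051` enabled and accepts password authentication; both are assumptions about your setup, and cookie authentication would need a different `AUTHENTICATE` argument. The function names and the sample line are made up for illustration, and the parser only handles the common `ID STATUS PATH ...` layout of a `GETINFO circuit-status` reply.

```python
import socket


def parse_circuit_line(line):
    """Parse one line of a `GETINFO circuit-status` reply into a dict."""
    parts = line.split()
    circuit = {"id": parts[0], "status": parts[1], "path": []}
    # The path is optional; when present it is the third field and each hop
    # looks like $FINGERPRINT~nickname (older Tor versions use "=" instead).
    if len(parts) > 2 and parts[2].startswith("$"):
        for hop in parts[2].split(","):
            name = hop.lstrip("$").replace("=", "~", 1)
            fingerprint, _, nickname = name.partition("~")
            circuit["path"].append((fingerprint, nickname))
    return circuit


def query_circuits(host="127.0.0.1", port=9051, password=""):
    """Ask a local Tor client for its circuits (assumed port and auth method)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        # Note: a password containing quotes would need escaping.
        sock.sendall(f'AUTHENTICATE "{password}"\r\n'.encode())
        if not sock.recv(4096).decode().startswith("250"):
            raise ConnectionError("control port rejected authentication")
        sock.sendall(b"GETINFO circuit-status\r\n")
        data = b""
        while not data.endswith(b"250 OK\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    circuits = []
    for line in data.decode().splitlines():
        # Strip the reply header; skip the terminator and status lines.
        line = line.removeprefix("250+circuit-status=")
        line = line.removeprefix("250-circuit-status=")
        if line and line not in (".", "250 OK"):
            circuits.append(parse_circuit_line(line))
    return circuits


# Hypothetical reply line, with shortened fingerprints, for illustration only.
SAMPLE_LINE = "7 BUILT $AAAA~guardnick,$BBBB~midnick,$CCCC~exitnick PURPOSE=GENERAL"

if __name__ == "__main__":
    print(parse_circuit_line(SAMPLE_LINE))
```

On a machine with a running Tor client, `query_circuits()` would return one dict per circuit; the first entry of each `path` is the guard and the last is the exit.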

On a practical note, though, what you're proposing to do - purposefully send the node list from the client to a 3rd party - isn't a realistic demonstration of an attack. You're intentionally breaking the security model in a way that doesn't mirror any real-world attack scenario. Only the client knows the identities of the nodes in the chain, and no other node in the chain has sufficient information to identify both the client and server in a particular chain - that's the whole point of Tor's onion routing model. All you're showing by giving away the node list is that the anonymity of the system fails if you give away the secret information that underpins the system's anonymity. It's a tautology, not an attack.

The identity of each node in the chain is encrypted in such a way that only the previous node can read it. The process for talking to a clearnet service works like this:

  1. The client picks a set of nodes to build a chain with.
  2. The client takes the identity of the target server and the message it wishes to send, and encrypts both using the public key of the exit node.
  3. The client takes that encrypted message, attaches the identity of the exit node, and encrypts it using the public key of the relay node.
  4. The client takes that encrypted message, attaches the identity of the relay node, and encrypts that whole thing again using the public key of the first (or "guard") node.
  5. The client sends the layered encrypted message to the first node in the chain. It has the corresponding private key for the first layer of encryption, so it can decrypt that first layer, but it doesn't have the other nodes' private keys so it can't decrypt any more data. The node now knows the identity of the next node it should send the message to, but it cannot learn the identities of further nodes in the chain. All this node knows is the identity of the client and the identity of the next node. It passes the message on.
  6. The next node in the chain - the relay node - decrypts the next layer of encryption using its private key. This tells it the identity of the next node to pass the message to. It only knows the identity of the previous node and the next node. It doesn't know anything about the client or the server, or the message. It passes the message on.
  7. The final node receives the message. It decrypts the final layer of encryption using its private key. This contains the IP of the target server, and the message to be sent to it. The exit node only knows the message and the server identity, plus the identity of the relay node. It has no information about the first node or the client. The exit node passes the message to the server and relays the response back up the chain.
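
The layering in the steps above can be sketched in a few lines of Python. This is a toy model, not Tor's actual cryptography: a hash-based XOR keystream stands in for each node's encryption, one shared key per node stands in for its key pair, and the node names, keys, and helper functions are all invented for illustration.

```python
import hashlib
import json


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with a SHA-256 counter keystream. NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def wrap(key: bytes, next_hop: str, payload: bytes) -> bytes:
    """Add one layer: encrypt (next hop, payload) under a single node's key."""
    layer = json.dumps({"next": next_hop, "data": payload.hex()}).encode()
    return keystream_xor(key, layer)


def peel(key: bytes, blob: bytes):
    """Remove one layer: returns the next hop and the still-wrapped payload."""
    layer = json.loads(keystream_xor(key, blob))
    return layer["next"], bytes.fromhex(layer["data"])


# Hypothetical per-node keys; only the client knows all three.
KEYS = {"guard": b"k-guard", "middle": b"k-middle", "exit": b"k-exit"}


def build_onion(message: bytes, server: str) -> bytes:
    """Client side: wrap for the exit first, then the relay, then the guard."""
    onion = wrap(KEYS["exit"], server, message)
    onion = wrap(KEYS["middle"], "exit", onion)
    return wrap(KEYS["guard"], "middle", onion)


def route(onion: bytes):
    """Pass the onion along the chain, recording what each node learns."""
    seen, node, blob = [], "guard", onion
    while node in KEYS:
        next_hop, blob = peel(KEYS[node], blob)
        seen.append((node, next_hop))  # each node learns only its next hop
        node = next_hop
    return seen, blob


if __name__ == "__main__":
    hops, delivered = route(build_onion(b"GET /", "example.org"))
    print(hops, delivered)
```

Each tuple in `hops` is everything a node learns about the route: the guard sees only `"middle"`, the relay sees only `"exit"`, and only the exit sees the server name and the message.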

Shown visually:

[Image: diagram of a Tor chain]

(note: this diagram depicts a connection to a clearnet server via an exit node - hidden services work differently)

This is why Tor's onion routing can provide strong anonymity - there are sufficient layers to ensure that no node in the chain can discover the identity of both the entry and exit nodes. The node that knows the identity of the client knows absolutely nothing about where the message will ultimately be routed to, and the node at the end that has access to the message knows nothing about the client. By the time the message reaches the last node, no information remains within it that can be tied to the message that was passed through the first node.

This also protects against pairs of colluding nodes, to some degree. If the first node in the chain and the last node in the chain share their private keys, they still cannot directly correlate the packets because of the intermediate layer of encryption associated with the relay node in the middle. If the first node and relay node collude, they can correlate the traffic from the client forward as far as the exit node, but no further. If the relay node and exit node collude, they can correlate the traffic from the server backward as far as the first node, but no further. Only compromising all three nodes in the chain allows you to reliably deanonymise a user. Or, alternatively, compromising the client - at which point the mechanics of the Tor network are irrelevant.

There are further protections in practice. Building a chain isn't as simple as randomly selecting a set of nodes. The first and last nodes in a chain are selected in a different way to the relay in the middle.

The first node in the chain is called the guard node, and it is specially selected from a pool of long-term-stable nodes that have been vetted by the network as not exhibiting suspicious behaviour (e.g. dropping connections selectively, modifying packets, etc.). You cannot simply spin up a relay node and immediately become the first node in someone's chain. The guard node used by a client also doesn't change every time it makes a new chain - the guard node stays static for a period of time. This is designed to help protect against collusion and correlation attacks like you describe.

The type of the final node in the chain also varies depending on the resource requested. If it's a clearnet server on the regular internet, the final node must be an exit node. Exit nodes are also vetted by the network and checked for traffic tampering - if they mess with the pages being returned, they get banned from the network. This is also why you should use HTTPS through Tor if you're talking to a clearnet server. For Tor hidden services the connection actually consists of a pair of chains mediated by a gateway node in the middle. This keeps the IP address of the Tor hidden service anonymised, so the client only knows a public identifier for the service rather than its IP address.

Another thing to keep in mind is that messages for a particular chain are not passing through these nodes in isolation. Many other messages from many other chains are being routed through guard nodes, relays, exit nodes, and gateways at any one point in time. Distinguishing between messages belonging to different chains with only partial information (e.g. any pair of nodes colluding, or active participation from one node and network captures from the network of another node) is non-trivial. This is even trickier from a purely external perspective, e.g. with passive network captures from internet infrastructure.

Proposed deanonymisation attacks against Tor generally rely upon side-channel information such as traffic shape correlation. A colluding guard node and final node (either final relay or the exit node) can theoretically use the timings and sizes of packets to probabilistically correlate messages seen on the final node to messages seen on the first (guard) node. But, in order to leverage this in practice, one would need to have control of a trusted guard node and trusted exit node, which is not an attack you can pull off in an evening due to the network's trust requirements. On top of that, you'd have to be in luck that a client used both your guard node and exit node for a connection - you have no control over that - and be in luck that the connection is to a server of particular interest.
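
To make the idea of traffic shape correlation concrete, here is a minimal Python sketch under artificial assumptions: the observer already holds packet timestamps for one client flow at the guard and for several candidate flows at the exit, and scores each candidate by the Pearson correlation of per-window packet counts. The function names, window size, and timestamps are invented for illustration; real attacks in the literature are considerably more involved.

```python
from math import sqrt


def bin_counts(timestamps, window, duration):
    """Count packets per time window over [0, duration)."""
    counts = [0] * int(duration / window)
    for t in timestamps:
        index = int(t / window)
        if 0 <= index < len(counts):
            counts[index] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_match(entry_times, exit_flows, window=1.0, duration=10.0):
    """Score every exit-side flow against the entry-side flow; pick the top one."""
    entry = bin_counts(entry_times, window, duration)
    scores = {
        name: pearson(entry, bin_counts(times, window, duration))
        for name, times in exit_flows.items()
    }
    return max(scores, key=scores.get), scores


# Made-up observations: flow_a is the entry flow delayed by 0.15 s,
# flow_b is unrelated traffic from some other circuit.
ENTRY = [0.1, 0.2, 0.3, 4.1, 4.2, 8.5]
EXIT_FLOWS = {
    "flow_a": [t + 0.15 for t in ENTRY],
    "flow_b": [1.0, 2.0, 3.0, 5.0, 6.0, 7.0],
}

if __name__ == "__main__":
    print(best_match(ENTRY, EXIT_FLOWS))
```

With this clean toy data the delayed copy scores highest. On a real network, jitter, fixed-size cells, and thousands of concurrent flows make the scores far noisier, which is why the result is only ever probabilistic.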

Deanonymisation attacks against onion routing networks are the subject of no small academic interest. The body of research that has gone into this topic spans at least two decades, and the Tor network has evolved over that time to protect against a wide variety of proposed attacks, both theoretical and practical. What I've described here is at best a rough summary of the overall operation and security model of the network.

The efficacy of the onion routing system against these kinds of attacks is demonstrated by the fact that only a small percentage of Tor connections were able to be deanonymised by the NSA's mass surveillance program, which involved the tapping of key internet backbones alongside ownership of, or access to, a number of guard and exit nodes. This minimal success is despite a bloated budget, participation from other states' security services, and the inherent level of access that their collective geopolitical status afforded them.

In summary, the demonstration you proposed does not reflect any real-world attack or technique, and demonstrating any practical deanonymisation attack on the Tor network itself would be a significant project for most professional security researchers. As a school project it is infeasible to say the least.

Polynomial
  • Thank you for your answer. I understand that I won't disprove Tor's protocol with this idea. My main point was that if someone hosts a large number of entry nodes and also hosts an exit node, they can, by chance, resolve a person's connection whenever it enters through one of the hosted entry nodes and leaves via the hosted exit node. This wouldn't be an attack but more of a listening post that is hosted on the Tor network. – mantissa Nov 22 '21 at 09:25
  • I understand that it's only realistically applicable by organisations like the NSA, but I only want to demonstrate this process artificially by giving all the needed information about the two nodes beforehand, meaning that I don't need to attack or somehow compromise the trusted nodes. The client itself will provide all the information needed about its entry node and the server will do the same with its node. This is unrealistic, of course, but I only need to show the algorithmic principle that will use the theoretical timing approach you described to compromise the system. – mantissa Nov 22 '21 at 09:47
  • @mantissa You'd need to be able to correlate between the entry and exit server data. Controlling the entry server (guard node) wouldn't get you that information. That node doesn't know the chain's node list, and the packets seen by the last node do not contain any information that will help tie them to the packets seen by the first node. – Polynomial Nov 22 '21 at 12:07
  • Is it somehow possible to get exit server data in a way that I could extract the needed information about the client, even if that means exposing some information about the client beforehand? – mantissa Nov 22 '21 at 14:02
  • You would have to have a malicious program running on the client computer, at the same time that the user is browsing via Tor, that queries the Tor control protocol interface to access the chain information. You cannot access the chain information from anywhere else, even if you leak prior information about the client. And if you've got malware on the client system, you don't need to touch Tor at all - you can just gain remote access to screencap what the user is doing. – Polynomial Nov 22 '21 at 14:41