47

Does Tor have any protection against an adversary simply running a very large number of nodes?

Someone with the necessary resources could just run thousands of relay nodes (including exit nodes). If they were an organization like the NSA, they could also make the major hosting companies running nodes turn over the private keys, or install backdoors, without the "owner" of the node noticing.

I know Tor employs entry guards as a protection: a client chooses a set of entry guards at random and only ever connects to those as entry nodes. If the entry guards are uncompromised, the user is safe. This gives the user at least a chance of not being profiled; without entry guards the user would eventually be caught.

However, what if the adversary is not interested in busting all users who access a certain site, or in targeting a specific user? If they just want to identify some random portion of the users who access that site, couldn't they do this by running a few thousand nodes and waiting?

I can imagine they could even target specific users, and force them to use only compromised nodes. Compromise one guard node of the user (wiretap his line, observe what server he connects to and send them a court order or some thugs, or just be lucky and control the right nodes by chance). Then run thousands of modified clients. Once the targeted user goes online, flood the network momentarily. In cooperation with your compromised nodes, keep the compromised paths free, so that the client will eventually build a circuit only on your nodes. Voila, you can eavesdrop on the user.

Are there any protections against this in Tor? Can you give an estimate of how many nodes the attacker would have to run? Are there any non-technical countermeasures, e.g. would someone intervene if 3000 new suspicious nodes would pop up on AWS?

(Note this is different from other questions on this site. For example my previous question asks about the case where the attacker can completely control your line; he fakes the whole network. Tor guards against this by using a list of known good nodes, and using signatures.)

jdm

7 Answers

31

Tor provides privacy only under the assumption that at least one node in the randomly selected chain is not attacker-controlled (since we are talking about traffic analysis, simply eavesdropping on traffic entering and exiting this node, without trying to decrypt it, counts as "control"). This is probabilistic. If the attacker controls, say, 50% of all nodes, and your browser uses a chain of length 5, then the attacker wins with probability 0.5^5 = 1/32.
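The arithmetic in this paragraph can be checked with a minimal sketch. It assumes every hop is picked independently and uniformly from all nodes, which is a simplification of how Tor actually builds paths, and `all_hops_compromised` is a hypothetical helper name, not part of any Tor tooling.

```python
def all_hops_compromised(fraction: float, hops: int) -> float:
    """Probability that every hop in a chain is attacker-controlled,
    assuming independent, uniform node selection (a simplification)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if hops < 1:
        raise ValueError("hops must be at least 1")
    return fraction ** hops


# The example from the text: half of all nodes compromised, chain of 5.
print(all_hops_compromised(0.5, 5))  # 0.03125, i.e. 1/32
```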

To mitigate such attacks, you can configure your client to choose chains non-uniformly, but instead to enforce a "global spread" so that the chain will go through nodes in several countries who don't like each other.

Tom Leek
  • 9
    Unless GeoIP can be somehow faked. – Deer Hunter Aug 30 '13 at 15:55
  • Which it easily can in your part of the internet backbone (that you, as a government, control). – Lucien Dec 13 '15 at 08:53
  • 4
    Actually, I just re-read this answer and **each and every sentence is factually incorrect**. 1) If the middle node is not attacker-controlled but the exit and guard are, you are screwed. 2) It is not entirely probabilistic. See the "weight factor" and "node families". 3) There is no such thing as a chain length of 5, and that math is incorrect anyway. 4) A "global spread" actually makes things **much worse**. Read the Astoria paper on AS-aware networks before assuming that more diversity equals more anonymity. Also, I think you mean anonymity, not privacy (they're completely different things). – forest Jun 17 '18 at 02:21
13

It doesn't. An adversary running a large number of nodes is one of the main weaknesses of Tor. Pinning allows you to select particular nodes to use, but it's important to choose your nodes well and to avoid misbehaving ones: a path routed entirely through colluding nodes is not secure, and each colluding node on the path reduces the effective security somewhat.

There are efforts to judge which nodes are well-behaved based on how long they've been around, but a patient and skilled attacker could trickle in lots of nodes over time while being pretty hard to detect.

Even if the entire transmission chain isn't compromised, some degree of understanding can be had by looking at the timing of packets entering and exiting a node. Delays on transmissions would help, but would increase latency and aren't currently supported by Tor.

forest
AJ Henderson
7

Are there any non-technical countermeasures, e.g. would someone intervene if 3000 new suspicious nodes would pop up on AWS?

Yes. There are systems and people monitoring the network, and the directory authority operators will block floods of blatantly suspicious new relays. This happens regularly, with questionable academic research projects, especially generous new relay operators, and more nefarious actors.

In fact, your hypothetical scenario happened almost literally in December 2014. Some people went and fired up 3300 relays on Google cloud servers for some reason or another. Nothing happened. The relays were blocked; they received almost no traffic; and it's hard to say what the operators were even trying to accomplish.

The RELAY_EARLY attack is probably the worst example. For the first half of 2014, an attacker, apparently American academic/government researchers at Carnegie Mellon University SEI CERT, combined a hundred fast relays and a security bug to deanonymize untold users, leading to multiple arrests in the US and potential risk from any other party recording traffic at the time.

Matt Nordhoff
3

In addition to running thousands of nodes, an attacker would have to run those thousands of nodes for a long time because part of the decision process about what nodes to use considers the consensus weight, which includes uptime as a factor. Additionally, there are flags that can be added to nodes by the highest ranked (by consensus) relays that mark a relay as a bad relay, which negatively impacts how much use it would get.

Most of Tor's protections against this kind of attack would be related to the restrictions on how old a relay must be to get a sizable portion of traffic.
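To illustrate why relay count alone matters less than weight, here is a rough sketch. It assumes the chance of picking a relay is simply proportional to its consensus weight; the weights and relay counts are invented for illustration and are not real consensus data, and `attacker_share` is a hypothetical helper name.

```python
def attacker_share(honest_weights, attacker_weights):
    """Fraction of relay selections that land on attacker relays when the
    chance of picking a relay is proportional to its weight."""
    total = sum(honest_weights) + sum(attacker_weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    return sum(attacker_weights) / total


# Made-up numbers: 6000 established relays at weight 100 each,
# 3000 brand-new attacker relays at weight 1 each.
honest = [100] * 6000
attacker = [1] * 3000

by_count = len(attacker) / (len(honest) + len(attacker))
by_weight = attacker_share(honest, attacker)
print(f"share of relays by count: {by_count:.3f}")   # 0.333
print(f"share of selections:      {by_weight:.3f}")  # 0.005
```

Under these assumed numbers the attacker runs a third of all relays but is chosen for about half a percent of selections, which is the effect this answer describes.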

forest
IceyEC
2

All of these answers are factually incorrect, even the most upvoted one. Tor has several specific features that mitigate this kind of attack, which is called a Sybil attack. While the middle and exit relays can change every 10 minutes, the first relay, the guard, stays with you for a very long time. This ensures that, even if an attacker creates a large number of relays, they will not be able to easily get you to use one of theirs.

Imagine that the attacker gets a chance to control your guard and exit relay, and that chance is determined based on the percentage of relays they control. Would you rather they get a new shot at identifying you with a small likelihood of success once every 10 minutes, or once every year? The latter is how Tor operates, despite all the misinformation above.

https://blog.torproject.org/improving-tors-anonymity-changing-guard-parameters
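The "every 10 minutes versus once a year" comparison can be put into numbers with a simplified model. The sketch below assumes the attacker controls 1% of guard selections and 1% of exit selections, that draws are independent, and that a new circuit is built every 10 minutes for a year; the function names and figures are assumptions for illustration only.

```python
def p_caught_rotating(guard_frac, exit_frac, circuits):
    """Guard and exit are both re-drawn for every circuit."""
    per_circuit = guard_frac * exit_frac
    return 1 - (1 - per_circuit) ** circuits


def p_caught_fixed_guard(guard_frac, exit_frac, circuits):
    """One guard is drawn once and kept; only the exit is re-drawn."""
    p_bad_exit_eventually = 1 - (1 - exit_frac) ** circuits
    return guard_frac * p_bad_exit_eventually


CIRCUITS_PER_YEAR = 6 * 24 * 365  # one new circuit every 10 minutes

rotating = p_caught_rotating(0.01, 0.01, CIRCUITS_PER_YEAR)
fixed = p_caught_fixed_guard(0.01, 0.01, CIRCUITS_PER_YEAR)
print(f"rotating guard: {rotating:.2f}")  # 0.99
print(f"fixed guard:    {fixed:.2f}")     # 0.01
```

In this toy model a user who redraws the guard for every circuit is almost certain to hit a bad guard and exit pair within the year, while a user with one long-lived guard is exposed only if that single guard happens to be bad.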

guest
2

I'm a new user and therefore unable to add this as a comment, but it's in response to user1535427.

(As commented by IceyEC on user1535427's post.) Tor uses 3 hops in a circuit between you and the end server.

In some circumstances it will use more, such as when connecting to a "hidden service" or ".onion site": this uses 3 hops between you and what's referred to as a "rendezvous point" (which is just another normal relay node within the Tor network), plus 3 further hops from the rendezvous point to the end destination server.
Other than this, Tor is hard-coded to use 3 hops.

Initially this may seem counter-intuitive, as I thought myself. However, there are several reasons for keeping it to 3 hops.
Some of these reasons are:

1 - Increasing the number of hops adds latency, which hurts speed, usability, and user experience.

2 - (Following on from reason 1.) If increasing the hops for all users slows things down overall but could potentially add extra layers of security, couldn't Tor let users choose how many hops their circuits use, at the expense of higher latency, if the individual deemed the trade-off acceptable?
Well, actually no. Different users would then have different path lengths, and path length would serve as an extra, potentially identifying piece of information. For example, perhaps you are the only person who specifically chose to go through, say, 123 hops. If an attacker could determine your number of hops, they might be able to uniquely identify you, or at least distinguish you from other traffic, by this factor alone.
In short, increasing the number of hops for every circuit is bad, and so is giving users the choice to vary the number of hops themselves.

3 - Which brings me to another reason: apparently (and also counter-intuitively), tests have shown that using 4 or more hops in practice offers no more security than you'd get with just 3 hops.
In hindsight, having only 3 hops offers, technically, the highest security whilst also delivering the fastest connections (which is kind of neat if you think about it!).

The important thing to take away from this is that:
- as long as an attacker DOES NOT control BOTH your "entry" AND "exit" nodes,
- and your end server is either an ".onion server" or an "open-web server with HTTPS",

then you are relatively safe. If the attacker knows both your entry and exit nodes, it's game over: it defeats the entire point of the Tor network.
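A small simulation can illustrate why extra hops do not help against an attacker who holds both ends of the circuit. It is only a sketch: each hop is modelled as an independent coin flip that is attacker-controlled with a fixed probability, which is not how Tor really selects relays, and `endpoints_compromised_rate` is a hypothetical name.

```python
import random


def endpoints_compromised_rate(fraction, hops, trials=100_000, seed=0):
    """Estimate how often BOTH the first and last hop are attacker relays.

    Simplification: each hop is an independent coin flip that comes up
    'attacker' with probability `fraction`.
    """
    if hops < 2:
        raise ValueError("need at least an entry and an exit hop")
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        path = [rng.random() < fraction for _ in range(hops)]
        if path[0] and path[-1]:
            hits += 1
    return hits / trials


for hops in (3, 4, 6):
    rate = endpoints_compromised_rate(0.2, hops)
    # Each estimate should sit close to 0.2 * 0.2 = 0.04.
    print(hops, "hops:", round(rate, 3))
```

Under this model the estimate stays near the square of the attacker's fraction no matter how many middle hops are added, which matches the point that only the entry and exit matter for this attack.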

TurnerOC
  • More hops in general isn't useful because they do not defeat traffic confirmation attacks. I think middle nodes are only used so an exit node can't perform a guard discovery attack. – forest Jun 17 '18 at 02:17
0

The only way this could help an attacker is if they ran every relay node and, furthermore, were sure of that fact. The strength of Tor is that a given node does not know whether the node it relays data back to is another relay or the end user. I could not find information online on whether Tor uses a fixed number of jumps per connection. If, for example, an attacker knew for a fact that Tor puts 5 relay nodes between a user and an endpoint, and the attacker saw 5 of their own relay nodes in a chain, they would know the IP of both the endpoint and the user.

For this reason, I would guess that Tor does not use a constant number of jumps.

Having said all this, if the attacker ran an exit node and you used it to connect to a plain HTTP clearnet site, they would be able to see all the data you sent across, although even then they could not see the IP of the user.