
As https://panopticlick.eff.org/ demonstrates, web browsers are very prone to fingerprinting, both active and passive (often you can fingerprint a web browser by simply monitoring the wire).

My question is: Is the same possible with BitTorrent clients?

My research so far suggests that DHT "address" may be persistent and/or reveal the list of info hashes the torrent client has, thus making passive fingerprinting trivial, because the lists of info hashes are almost unique. Is this true?

I also recall reading that one can remotely obtain the list of all info hashes currently in the bittorrent client, even if some of them are paused, in most clients. This suggests an active fingerprinting possibility. I could not find the source though.

The client version string also provides some information, although too little to serve as the only source of fingerprint info.

Is flow-based behavioral fingerprinting a threat in the case of BitTorrent clients?

Are there any other sources of information that can be used for fingerprinting I'm not aware of?

Shnatsel

1 Answer


When it starts, a BitTorrent client generates a 20-byte identifier called peer_id, typically of the form ClientIdentifierClientVersion-RandomNumbers. Granted, ClientIdentifier and ClientVersion don't identify an individual client, but the RandomNumbers part could be used to do so. There are a few things that need to be understood about it:

  • The peer_id can be anything the client wants to send. The protocol defines neither how it should be generated nor how it should be structured.

  • The client gives its peer_id to whomever asks for it (it's part of the handshake).

  • Setting the no_peer_id flag in a tracker request tells the tracker it may omit peer IDs from the peer list it returns, so peers can find each other without the tracker disclosing their peer_id.

  • The peer_id is regenerated every time the client is restarted. uTorrent 3.3 even generates a new peer_id from time to time while it's running.
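
To make the handshake point concrete, here is a minimal Python sketch of where the peer_id sits in the standard 68-byte peer handshake and how a prefix in the common "-XXNNNN-" layout could be split off. The function names and the example value "-UT3300-" are illustrative assumptions, not taken from any particular client, and a client is free to use a completely different layout.

```python
PSTR = b"BitTorrent protocol"  # protocol string sent in every handshake


def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """Build a handshake: pstrlen, pstr, 8 reserved bytes, info_hash, peer_id."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    return bytes([len(PSTR)]) + PSTR + bytes(8) + info_hash + peer_id


def parse_handshake(data: bytes) -> dict:
    """Extract the info_hash and peer_id a remote peer sent us."""
    pstrlen = data[0]
    if len(data) < 1 + pstrlen + 8 + 20 + 20:
        raise ValueError("handshake too short")
    offset = 1 + pstrlen + 8  # skip length byte, pstr and reserved bytes
    return {
        "pstr": data[1:1 + pstrlen],
        "info_hash": data[offset:offset + 20],
        "peer_id": data[offset + 20:offset + 40],
    }


def split_peer_id(peer_id: bytes):
    """Split a '-XXNNNN-' style peer_id into (client, version, random part).

    Returns None when the peer_id does not follow that layout, which is
    allowed because the protocol does not mandate any structure.
    """
    if len(peer_id) == 20 and peer_id[0:1] == b"-" and peer_id[7:8] == b"-":
        client = peer_id[1:3].decode("ascii", "replace")
        version = peer_id[3:7].decode("ascii", "replace")
        return client, version, peer_id[8:]
    return None
```

Only the trailing random part is useful for telling two instances of the same client apart, and it changes whenever the client regenerates its peer_id.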

However, BitTorrent uses a DHT based on Kademlia, which is great news for security researchers (AKA stalkers). A while ago I analyzed uTorrent and Transmission; both use encryption for DHT/PEX (if enabled and forced).

When a client enters the DHT "network" it bootstraps by connecting to some bootstrap server. This usually happens the first time you run a new client. But that's not the only thing that happens when bootstrapping Kademlia.

When a client joins the network, it is given a randomly generated identifier called the node ID. This node ID is chosen at random from the same 160-bit space as the BitTorrent info hashes. At any time, you can clear your DHT information and rejoin the network, which gives you a new identifier. Since not many users do that, the node ID is a fairly reliable identifier.
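
As a rough illustration of the shared 160-bit space, the sketch below draws a node ID as 20 random bytes and computes an info hash as the SHA-1 digest of a bencoded info dictionary. The function names are hypothetical, and real clients may pick or persist their node ID differently.

```python
import hashlib
import os


def new_node_id() -> bytes:
    """Draw a fresh 160-bit (20-byte) node ID at random."""
    return os.urandom(20)


def info_hash(bencoded_info: bytes) -> bytes:
    """An info hash is the SHA-1 digest of the bencoded 'info' dictionary."""
    return hashlib.sha1(bencoded_info).digest()
```

Both values are 20 bytes long, which is what lets the DHT compare node IDs with info hashes. Calling new_node_id() again corresponds to clearing the DHT state and rejoining with a new identity.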

To get a client's node ID, it's enough to send it a DHT ping message, and it will respond with its ID:

{"t":"T_ID", "y":"q", "q":"ping", "a":{"id":"THE_SENDER_ID"}}

  • t is transaction ID, it identifies the message and is sent back with the response
  • y is the type of the message, and the value "q" means it's a query.
  • q is the type of the query, and here it's a ping query.
  • a represents the arguments of the query. Here it simply contains the sender's node ID.

A client would respond:

Response = {"t":"T_ID", "y":"r", "r": {"id":"MY_ID"}}

The value "r" for y means that this is a response.
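
On the wire these messages are bencoded dictionaries sent over UDP. The following Python sketch shows one way the ping exchange could be implemented. The helper names (bencode, bdecode, ping_query, node_id_from_response, ping) are hypothetical, the encoder handles only what these messages need, and the network call assumes the target speaks unencrypted DHT.

```python
import os
import socket


def bencode(value) -> bytes:
    """Encode ints, str/bytes, lists and dicts in bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("ascii")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("ascii") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value at pos; return (value, next position)."""
    c = data[pos:pos + 1]
    if c == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if c in (b"l", b"d"):
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        if c == b"l":
            return items, pos + 1
        return dict(zip(items[0::2], items[1::2])), pos + 1
    colon = data.index(b":", pos)  # byte string: <length>:<bytes>
    start = colon + 1
    length = int(data[pos:colon])
    return data[start:start + length], start + length


def ping_query(tid: bytes, sender_id: bytes) -> bytes:
    """Build the ping query shown above."""
    return bencode({"t": tid, "y": "q", "q": "ping", "a": {"id": sender_id}})


def node_id_from_response(raw: bytes, tid: bytes) -> bytes:
    """Pull the responder's node ID out of a ping response."""
    msg, _ = bdecode(raw)
    if msg.get(b"y") != b"r" or msg.get(b"t") != tid:
        raise ValueError("not a response to our query")
    return msg[b"r"][b"id"]


def ping(addr, sender_id: bytes, timeout: float = 3.0) -> bytes:
    """Send one ping to addr = (host, port) and return the remote node ID."""
    tid = os.urandom(2)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(ping_query(tid, sender_id), addr)
        raw, _ = sock.recvfrom(1500)
    return node_id_from_response(raw, tid)
```

Something like ping(("192.0.2.1", 6881), os.urandom(20)), with a placeholder address, would return the remote node ID if the peer answers. Comparing that value across sessions is what makes the node ID usable as a fingerprint.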

Those are mainly the two ways to identify a client provided by the protocol itself.

Nemo
Adi
  • Great answer, thanks! I wonder if the DHT ping message is usually encrypted? If it is, then it would be only available to an active attacker. – Shnatsel Jun 09 '13 at 12:35
  • @Shnatsel This is client-specific. A while ago, I've analyzed uTorrent and Transmission, they both use encryption for DHT/PEX (if enabled and forced). Many other clients have encryption enabled, yet they allow legacy connections (non-encrypted). – Adi Jun 09 '13 at 12:52
  • The encryption is quite weak too, if I remember correctly. Like a random 64 bit number combined with the infohash or something like that. – forest Jan 10 '18 at 07:39