Once it starts, a BitTorrent client generates a 20-byte identifier called peer_id, which consists of ClientIdentifierClientVersion-RandomNumbers. Granted, ClientIdentifier and ClientVersion don't identify any individual user, but the RandomNumbers could be used to identify a client. There are a few things that need to be understood:
The peer_id could be anything the client wants to send. The protocol defines neither how it should be generated nor how it should be structured.
The client gives its peer_id to whoever asks for it (it's part of the handshake).
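Setting the no_peer_id flag in a tracker request asks the tracker to omit peer IDs from the peer list it returns, so peers can be discovered without their peer_id being disclosed by the tracker. It does not remove the peer_id from the handshake.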
The peer_id is regenerated every time the client is restarted. uTorrent 3.3 even generates a new peer_id from time to time while it's running.
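As a rough illustration of the ClientIdentifierClientVersion-RandomNumbers layout described above, here is a minimal Python sketch. It assumes the common convention of a dash, a two-character client code, a four-character version, another dash, and twelve random characters. The function name make_peer_id and the values "XX" and "0001" are placeholders invented for this example, not taken from any real client.

```python
import secrets
import string


def make_peer_id(client: str = "XX", version: str = "0001") -> bytes:
    """Build a 20-byte peer_id: '-' + client + version + '-' + random filler.

    "XX" and "0001" are hypothetical placeholder values, not a real client.
    """
    prefix = f"-{client}{version}-".encode("ascii")  # 8 bytes with the defaults
    alphabet = string.ascii_letters + string.digits
    # Fill the remaining bytes with random characters; this is the only part
    # that could single out one running instance of the client.
    filler = "".join(secrets.choice(alphabet) for _ in range(20 - len(prefix)))
    return prefix + filler.encode("ascii")


peer_id = make_peer_id()
```

Calling make_peer_id() again yields a different random part, which mirrors what happens when a client restarts.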
However, BitTorrent uses a DHT based on Kademlia, which is great news for security researchers (AKA stalkers). A while ago, I analyzed uTorrent and Transmission; they both use encryption for DHT/PEX (if enabled and forced).
When a client enters the DHT "network", it bootstraps by connecting to a bootstrap server. This usually happens the first time you run a new client. But that's not the only thing that happens when bootstrapping Kademlia.
When a client joins the network, it gets a randomly generated identifier called the node ID. This node ID is chosen at random from the same 160-bit space as the BitTorrent info hashes. At any time, you can clear your DHT information and rejoin the network, which would give you a new identifier. Since not many users do that, you can count on it as a reasonably stable identifier.
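To make the "same 160-bit space" point concrete, here is a small Python sketch. It assumes the node ID is simply 20 random bytes and relies on info hashes being SHA-1 digests, which are also 20 bytes long. The helper name new_node_id is made up for this example.

```python
import hashlib
import os

NODE_ID_BYTES = 160 // 8  # a 160-bit identifier is 20 bytes


def new_node_id() -> bytes:
    """Pick a fresh random node ID (hypothetical helper, not a client's code)."""
    return os.urandom(NODE_ID_BYTES)


node_id = new_node_id()

# Info hashes are SHA-1 digests, so they have the same length as a node ID.
same_space = hashlib.sha1().digest_size == len(node_id)
```

Clearing the DHT state and rejoining amounts to calling new_node_id() again.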
To get a client's node ID, it's enough to send it a DHT ping message, and it will respond with its ID:
{"t":"T_ID", "y":"q", "q":"ping", "a":{"id":"THE_SENDER_ID"}}
t is the transaction ID; it identifies the message and is sent back with the response.
y is the type of the message, and the value "q" means it's a query.
q is the type of the query, and here it's a ping query.
a represents the arguments of the query. Here it simply contains the sender's node ID.
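On the wire, the ping query is a bencoded dictionary sent as a single UDP datagram. The sketch below shows one way to serialize it in Python. The bencode and build_ping helpers are written just for this example and are not part of any library, and the transaction ID and node ID in the usage line are sample values.

```python
def bencode(value) -> bytes:
    """Minimal bencoder for ints, strings/bytes, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def build_ping(transaction_id: bytes, sender_id: bytes) -> bytes:
    """Serialize the ping query shown above."""
    return bencode({
        "t": transaction_id,      # echoed back in the response
        "y": "q",                 # message type: query
        "q": "ping",              # query type
        "a": {"id": sender_id},   # arguments: the sender's node ID
    })


packet = build_ping(b"aa", b"abcdefghij0123456789")
# packet == b"d1:ad2:id20:abcdefghij0123456789e1:q4:ping1:t2:aa1:y1:qe"
```

The resulting bytes could then be sent to the target's DHT port with an ordinary UDP socket.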
A client would respond:
Response = {"t":"T_ID", "y":"r", "r": {"id":"MY_ID"}}
The value "r" for y means that this is a response.
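Reading the node ID out of such a response means decoding the bencoded reply and looking up the id inside r. The following Python sketch assumes a well-formed reply. The bdecode and node_id_from_response helpers are illustrative names invented here, and the sample bytes are a made-up reply rather than captured traffic.

```python
def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value at pos; return (value, next_position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                      # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"invalid bencoded data at offset {pos}")


def node_id_from_response(packet: bytes) -> bytes:
    """Return the responder's node ID from a ping response."""
    message, _ = bdecode(packet)
    if message.get(b"y") != b"r":
        raise ValueError("not a response message")
    return message[b"r"][b"id"]


reply = b"d1:rd2:id20:mnopqrstuvwxyz123456e1:t2:aa1:y1:re"
responder_id = node_id_from_response(reply)
# responder_id == b"mnopqrstuvwxyz123456"
```

Matching the t value of the reply against the one that was sent is what ties a response to its query.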
Those are the two main ways the protocol itself provides to identify a client.