How does Bittorrent work?

34

18

I want to learn more about the BitTorrent way of file sharing. I am a technically advanced user (a programmer), so technically advanced material is no problem, but it should be concise and to the point. I am looking for a good resource (book or website) that explains the overall BitTorrent architecture.

I am not interested in details, just the overall architecture and the terminology like seeds, peers, etc.

Any suggestions?

explorest

Posted 2011-02-13T20:39:02.070

Reputation: 803

Answers

29

Overview of how bittorrent works:

  • You have peers and a tracker. All peers together at any given moment are the swarm. The usual situation is that one or a few peers have the complete fileset and wish to make it available to other peers.

  • A peer acquires a .torrent file, which will have among other things A) the metadata from which the SHA-1 hash identifying the fileset (the "info hash") is computed, B) the URL of the tracker, and C) the piece size, along with an SHA-1 hash of every piece that the fileset is broken into. The size of the pieces is set by the torrent itself.

  • The peer then connects to the tracker using the URL specified in the torrent. The tracker responds with a list of peers. Trackers traditionally talk HTTP or HTTPS, on whatever port the URL names; many also support a UDP-based protocol.

  • The peer then selects another peer, using the information from the tracker, and contacts it directly to set up an exchange session, attempting to get a piece. Note that exchange sessions are carried out directly between the peers and the tracker is NOT involved in the transfer. The tracker only provides information.

  • Once the peer has a piece, it verifies it against the SHA-1 hash, and writes it to the file. It can then offer that piece when selecting another peer. Subsequent exchange sessions involve "trading" pieces. I believe peers will generally hand over a piece without a trade only when you have no pieces yet, i.e. for your first piece.

  • The peer reconsults the tracker every so often to get an updated list of peers. The peer does not have to wait for one exchange to finish before starting another one if it has multiple pieces, so once the peer has a bunch of pieces the transfer can really speed up. This is why torrents start slow but gain speed quickly as the peer acquires pieces.

  • When a peer has all the pieces, each one verified against its SHA-1 hash, the fileset is complete. The peer then becomes a seeder, and is now doing nothing but helping the fileset be more highly available. Peers that do not have all the pieces are leechers.

  • If a torrent has no seeds, it is dead, although if a complete copy of the file exists between all pieces held by all peers they will eventually trade to get a complete copy amongst themselves.

  • The SHA-1 hash is how the tracker and peers "know" which file is supposed to be swarmed. Filenames in the torrent aren't used to identify the data. Pieces that don't verify against the hashes in the .torrent file are thrown out. Peers that continually send bad pieces are snubbed by other peers and will eventually not be able to connect to anyone in the swarm.

  • A smaller piece size means the torrent is more robust since peers can trade pieces quicker, but it also means more hashes of pieces in the .torrent file have to be listed and therefore the .torrent file can be large.

  • If you are publishing something via BitTorrent, it's best to seed the file as long as you wish to make it available. Other peers will be helping you, since most BitTorrent software implements algorithms that favor trying to spread things among as many peers as possible to maximize concurrent connections. In this way BitTorrent can help you publish things and save bandwidth costs.
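The per-piece hashing and verification described in the list above can be sketched roughly as follows. This is only an illustration: `PIECE_LENGTH`, `piece_hashes`, and `verify_piece` are names made up for this example, and the 256 KiB piece size is an assumption (a real client reads the piece size and the hashes from the .torrent file).

```python
import hashlib

# Assumed piece size for this sketch; a real value comes from the .torrent file.
PIECE_LENGTH = 256 * 1024


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]


def verify_piece(index: int, piece: bytes, hashes: list[bytes]) -> bool:
    """Return True if a downloaded piece matches the hash listed for its index."""
    return hashlib.sha1(piece).digest() == hashes[index]


if __name__ == "__main__":
    fileset = bytes(range(256)) * 4096           # 1 MiB of stand-in data
    hashes = piece_hashes(fileset)
    good = fileset[:PIECE_LENGTH]
    bad = b"\xff" + good[1:]                     # same piece with one corrupted byte
    # Each SHA-1 digest is 20 bytes, so halving the piece size doubles this figure.
    print(len(hashes), "pieces,", 20 * len(hashes), "bytes of hashes")
    print(verify_piece(0, good, hashes), verify_piece(0, bad, hashes))
```

The hash list grows by 20 bytes per piece, which is the trade-off behind the piece-size remark: smaller pieces trade faster but make the .torrent file larger.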

LawrenceC

Posted 2011-02-13T20:39:02.070

Reputation: 63 487

A beautiful answer! One quip: I believe seeds prefer to seed rarest, not the first pieces first. Not sure if this is an algorithm thing, but that's what I got from a torrent program once while messing with settings. – Gallifreyan – 2018-09-07T20:00:30.350

18

Nice paper on the subject here

      http://davidhales.name/posters/patarin-hales-delis-poster6.pdf (Note: this is a PDF file and can be viewed with any PDF reader, such as Acrobat Reader.)

Here's an image file someone made of its contents:

patarin-hales-delis-poster6

Moab

Posted 2011-02-13T20:39:02.070

Reputation: 54 203

8

There's a rather nice video on YouTube explaining this in a visual way with cardboard cut-outs. It's not a highly technical explanation, but is great for explaining the idea behind BitTorrent to people in a simple, understandable way.

how bittorrent works on youtube

nhinkle

Posted 2011-02-13T20:39:02.070

Reputation: 35 057

+1 Useful video. Showed it to my dad. He had no questions afterwards. Astonishing. :) – zero2cx – 2012-09-20T19:17:35.670

2

An overview of the peer message protocol.

A client can talk to its peers over one of two transports: TCP or uTP (which runs over UDP). Either way, the data exchanged follows the "peer messages" section of the BitTorrent protocol specification.

Programmatically, a connection first has to be established between two clients. Once it is set up (over TCP or uTP), a BitTorrent handshake is initiated by the client that obtained the remote peer's information (IP and port) from the tracker or through the DHT. This handshake contains the info_hash that identifies the torrent this connection is about.
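As a rough sketch of what that handshake looks like on the wire: in the original protocol it is 68 bytes long, made of a length byte (19), the string "BitTorrent protocol", 8 reserved bytes, the 20-byte info_hash, and a 20-byte peer ID. The function names and the placeholder values below are assumptions made for this example, not part of any particular client.

```python
import hashlib
import os
import struct

PSTR = b"BitTorrent protocol"
HANDSHAKE_FORMAT = "!B19s8s20s20s"  # pstrlen, pstr, reserved, info_hash, peer_id


def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """Serialize a handshake for the torrent identified by info_hash."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must both be 20 bytes")
    return struct.pack(HANDSHAKE_FORMAT, len(PSTR), PSTR, bytes(8), info_hash, peer_id)


def parse_handshake(data: bytes) -> tuple[bytes, bytes]:
    """Return (info_hash, peer_id) from a received 68-byte handshake."""
    if len(data) != struct.calcsize(HANDSHAKE_FORMAT):
        raise ValueError("handshake must be exactly 68 bytes")
    pstrlen, pstr, _reserved, info_hash, peer_id = struct.unpack(HANDSHAKE_FORMAT, data)
    if pstrlen != len(PSTR) or pstr != PSTR:
        raise ValueError("not a BitTorrent handshake")
    return info_hash, peer_id


if __name__ == "__main__":
    # Placeholder values: a real info_hash is the SHA-1 of the torrent's info section.
    info_hash = hashlib.sha1(b"placeholder info section").digest()
    peer_id = os.urandom(20)
    message = build_handshake(info_hash, peer_id)
    print(len(message), parse_handshake(message)[0] == info_hash)
```

A receiving client would compare the parsed info_hash with the torrents it is serving and drop the connection if none matches.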

Let's first see how torrent data is divided up by the protocol. A piece is a part of the data you are sharing over the network. It is not to be confused with a block, which is a section of a piece wrapped into a single message. The block is the unit in which a piece is transferred between two peers, and the piece is the unit in which a torrent is shared among peers.
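To make the piece/block distinction concrete, here is a small sketch that lists the blocks a client would ask for in order to fetch one piece. The 16 KiB block size is a common convention that I am assuming here, and `blocks_for_piece` is a name invented for the example.

```python
from typing import Iterator

# Assumed block size; 16 KiB is a widely used convention, not something fixed by the torrent.
BLOCK_SIZE = 16 * 1024


def blocks_for_piece(
    piece_index: int, piece_length: int, block_size: int = BLOCK_SIZE
) -> Iterator[tuple[int, int, int]]:
    """Yield (piece_index, begin, length) for every block of one piece."""
    for begin in range(0, piece_length, block_size):
        # The last block is shorter when piece_length is not a multiple of block_size.
        yield piece_index, begin, min(block_size, piece_length - begin)


if __name__ == "__main__":
    for block in blocks_for_piece(piece_index=3, piece_length=40_000):
        print(block)
```

Each tuple corresponds to one request sent to a peer; the piece can only be hash-checked once all of its blocks have arrived.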

When the connection starts, both clients (the local client, which I'll call LC, and the remote client, RC) are choked and uninterested. Choked means "I won't answer any of your messages, I'm too busy, but I might take them into account". Unchoked therefore means "I will answer your messages". Interested means, of course, that I'd like some pieces you have. The state of a connection between two peers can therefore be described by four flags: LC_choked?, LC_interested?, RC_choked?, RC_interested? To tell RC that I'm (un)choked or (un)interested, I have to send it (un)choke and (not) interested messages, and vice versa.
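One way to picture that four-flag state is the sketch below. The message IDs (choke = 0, unchoke = 1, interested = 2, not interested = 3) and the length-prefixed framing come from the protocol specification; the class, the field names, and the `can_request` helper are my own naming for this example.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum


class Msg(IntEnum):
    CHOKE = 0
    UNCHOKE = 1
    INTERESTED = 2
    NOT_INTERESTED = 3


@dataclass
class ConnectionState:
    """State of one LC <-> RC connection; every connection starts choked and uninterested."""
    am_choking: bool = True        # LC refuses to serve RC
    am_interested: bool = False    # LC wants pieces that RC has
    peer_choking: bool = True      # RC refuses to serve LC
    peer_interested: bool = False  # RC wants pieces that LC has

    def on_message(self, msg: Msg) -> None:
        """Update the flags describing RC when one of its state messages arrives."""
        if msg == Msg.CHOKE:
            self.peer_choking = True
        elif msg == Msg.UNCHOKE:
            self.peer_choking = False
        elif msg == Msg.INTERESTED:
            self.peer_interested = True
        elif msg == Msg.NOT_INTERESTED:
            self.peer_interested = False

    def can_request(self) -> bool:
        """LC may ask RC for blocks only if LC is interested and RC is not choking it."""
        return self.am_interested and not self.peer_choking


def encode_state_message(msg: Msg) -> bytes:
    """State messages have no payload: a 4-byte length of 1 followed by the ID byte."""
    return struct.pack("!IB", 1, int(msg))


if __name__ == "__main__":
    state = ConnectionState()
    state.am_interested = True         # LC would send encode_state_message(Msg.INTERESTED)
    state.on_message(Msg.UNCHOKE)      # RC decides to serve LC
    print(state.can_request(), encode_state_message(Msg.INTERESTED).hex())
```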

To inform each other of which pieces they have, the clients can send a bitfield message right after the handshake. As its name suggests, it is a bit string in which each bit is set to 1 if the client has that particular piece, and 0 otherwise.
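A minimal sketch of how such a bitfield payload can be packed and queried follows. In the protocol, the high bit of the first byte stands for piece 0 and any spare bits at the end are left at zero; the function names are invented for this example.

```python
def encode_bitfield(have: list[bool]) -> bytes:
    """Pack one flag per piece into bytes, piece 0 in the high bit of byte 0."""
    out = bytearray((len(have) + 7) // 8)  # spare trailing bits stay 0
    for index, present in enumerate(have):
        if present:
            out[index // 8] |= 0x80 >> (index % 8)
    return bytes(out)


def has_piece(bitfield: bytes, index: int) -> bool:
    """Return True if the bit for piece `index` is set."""
    return bool(bitfield[index // 8] & (0x80 >> (index % 8)))


if __name__ == "__main__":
    # A client holding pieces 0, 2 and 9 of a 10-piece torrent.
    flags = [i in (0, 2, 9) for i in range(10)]
    payload = encode_bitfield(flags)
    print(payload.hex(), has_piece(payload, 2), has_piece(payload, 1))
```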

So if LC is interested and RC has unchoked it, LC can send request messages, each asking for a block belonging to a piece that it knows RC has thanks to the bitfield message.
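For illustration, a request message can be serialized roughly like this. The layout (a 4-byte length of 13, the ID byte 6, then piece index, offset within the piece, and block length as big-endian 32-bit integers) follows the protocol specification; the function names are assumptions for this sketch.

```python
import struct

REQUEST_ID = 6
REQUEST_FORMAT = "!IBIII"  # length prefix, message ID, piece index, begin, block length


def encode_request(index: int, begin: int, length: int) -> bytes:
    """Build a request for `length` bytes starting at offset `begin` of piece `index`."""
    return struct.pack(REQUEST_FORMAT, 13, REQUEST_ID, index, begin, length)


def decode_request(message: bytes) -> tuple[int, int, int]:
    """Return (index, begin, length) from a serialized request message."""
    size, msg_id, index, begin, length = struct.unpack(REQUEST_FORMAT, message)
    if size != 13 or msg_id != REQUEST_ID:
        raise ValueError("not a request message")
    return index, begin, length


if __name__ == "__main__":
    # Ask for the second 16 KiB block of piece 3.
    message = encode_request(index=3, begin=16 * 1024, length=16 * 1024)
    print(message.hex(), decode_request(message))
```

The peer on the other end answers each request with a message carrying the same index and offset plus the block's bytes.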

When a peer has received an entire piece, it can send a have message to all of its remote peers so that they update the bitfield they hold for it.

That is a very basic overview, and of course not all details are covered here, such as the choking algorithm. If you want more details, check the two links I posted above, in the comment section (as a new user I can't have more than two links within a post).

Jules Randolph

Posted 2011-02-13T20:39:02.070

Reputation: 347