Sony recently had a large amount of data stolen. To prevent the stolen data from spreading via torrents, they are reportedly carrying out what has been called a "bad seed" attack. What is this attack? Is it a known attack, or a term coined for what is happening in this specific case?

Here's an excerpt from an article relating to the attack:

(...) that starting yesterday, “all of a sudden we saw files matching the SHA1 signatures of the Sony torrents starting to be populated across all the torrent sites.” (...) files were intelligently designed to have the same signature as the GoP file torrents (...) [The SHA1 signature is in the metadata provided with the seed, not a result of a file that causes a SHA1 "collision" by matching the file's exact hash.]

From what I understand, they are not using SHA1 collisions; the original file is replaced and they simply recalculate the hash that is provided with the file. (Or hashes: are these per block or per file?) The client checks a block after downloading it, realizes it is fake, and tries to download it again, possibly from a different seed. This continues until all the correct blocks have been downloaded. Is this correct? So clients can still get the files, and it will just take much longer because fake data keeps being downloaded?

Daniel

1 Answer

Correct. As explained in that article, the torrents use the BitTorrent protocol to share Sony's stolen data. Each piece downloaded from a seed is identified by an index into the file, and the hash of that piece is checked and verified. However, I don't believe it's this hash that the article is referring to. Below I'll describe the process of downloading a file via BitTorrent.

.torrent File
The BitTorrent protocol specification says that the .torrent (metainfo) file contains two fields:

  1. announce: URL of the Tracker
  2. info: Maps to a dictionary of keys

The info dictionary describes how to piece together the original file. The file is split into fixed-size pieces (the piece length is generally a power of two), and the dictionary records the file name and length (for a single-file torrent), the piece length, and a field called pieces that contains the SHA1 hash of every piece. The client uses these hashes to verify each piece it downloads.
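A minimal sketch of how the pieces field could be produced and checked, assuming a single-file torrent and a hypothetical 256 KiB piece length. The function names and the sample payload are illustrative, not taken from any real client; only the key names ("name", "length", "piece length", "pieces") come from the spec.

```python
import hashlib

# Hypothetical piece length for illustration (a power of two, 256 KiB).
PIECE_LENGTH = 2 ** 18


def split_into_pieces(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Split the payload into fixed-size pieces; the last one may be shorter."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def build_pieces_field(data: bytes, piece_length: int = PIECE_LENGTH) -> bytes:
    """Concatenate the 20-byte SHA1 digest of every piece, in piece order."""
    return b"".join(hashlib.sha1(p).digest() for p in split_into_pieces(data, piece_length))


def verify_piece(pieces_field: bytes, index: int, piece: bytes) -> bool:
    """Compare a downloaded piece's SHA1 with the expected hash at that index."""
    expected = pieces_field[20 * index:20 * (index + 1)]
    return hashlib.sha1(piece).digest() == expected


# Usage: 768,000 bytes of sample data, which splits into three pieces.
payload = bytes(range(256)) * 3000
info = {
    "name": "example.bin",
    "length": len(payload),
    "piece length": PIECE_LENGTH,
    "pieces": build_pieces_field(payload),
}
```

Because each hash covers a single piece, a corrupted piece can be detected and re-requested on its own, without discarding the rest of the download.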

Tracker
When you start downloading a file, the BitTorrent client sends an HTTP GET request to the tracker. The response lists the peers currently available for the download, often as a compact binary list. The client connects to these peers to download each piece, and once every piece has been received and verified, the download is complete. A piece counts as complete only when the SHA1 hash of its data matches the expected value; as stated before, this hash value is provided in the info dictionary.
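A rough sketch of both sides of this exchange: building the tracker GET request and decoding a compact peer list. The parameter names follow the BitTorrent spec, but the tracker URL, peer_id, and port used with it are hypothetical, and the compact format (6 bytes per peer: a 4-byte IPv4 address plus a 2-byte big-endian port) comes from the later compact-peer-list extension (BEP 23), not the original spec.

```python
import socket
import struct
from urllib.parse import urlencode


def build_announce_url(announce: str, info_hash: bytes, peer_id: bytes,
                       port: int, left: int) -> str:
    """Build a tracker GET request URL using the spec's parameter names."""
    params = {
        "info_hash": info_hash,  # raw 20 bytes; urlencode percent-escapes them
        "peer_id": peer_id,      # raw 20 bytes chosen by the client
        "port": port,
        "uploaded": 0,
        "downloaded": 0,
        "left": left,
        "compact": 1,            # ask for the compact peer list (BEP 23)
    }
    return announce + "?" + urlencode(params)


def parse_compact_peers(blob: bytes) -> list[tuple[str, int]]:
    """Decode 6-byte entries: IPv4 address followed by a big-endian port."""
    peers = []
    for offset in range(0, len(blob) - len(blob) % 6, 6):
        ip = socket.inet_ntoa(blob[offset:offset + 4])
        (port,) = struct.unpack(">H", blob[offset + 4:offset + 6])
        peers.append((ip, port))
    return peers


# Usage with hypothetical values:
url = build_announce_url("http://tracker.example/announce",
                         b"\x12\x34" + b"a" * 18, b"-XX0001-" + b"0" * 12,
                         6881, 1000)
```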

This per-piece SHA1 hash is not the hash that is being spoofed. One other piece of information that is included in the tracker request is a field called info_hash:

info_hash
The 20 byte sha1 hash of the bencoded form of the info value from the metainfo file. Note that this is a substring of the metainfo file. The info-hash must be the hash of the encoded form as found in the .torrent file, regardless of it being invalid. This value will almost certainly have to be escaped.
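To make the definition concrete, here is a minimal bencoder and info_hash computation. It is a sketch that assumes the info dictionary was canonically encoded (keys sorted, no extra data). The quoted text is explicit that the hash must cover the exact bytes found in the .torrent file, so a real client would slice those raw bytes out of the file instead of re-encoding a parsed dictionary.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder: ints, byte strings, str, lists, and dicts (keys sorted)."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(info: dict) -> bytes:
    """The 20-byte SHA1 of the bencoded info dictionary."""
    return hashlib.sha1(bencode(info)).digest()
```

Note that info_hash identifies the torrent publicly: anyone who has the .torrent file (or a magnet link) knows it.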

Keep this in the back of your mind; we'll come back to it.

Peers
The peer protocol is what's being spoofed. A downloading client requests pieces from each peer by index. Before any data is exchanged, the two sides perform a handshake that includes the info_hash, and the downloading client verifies that it matches the SHA1 hash of the bencoded info field of its .torrent file.
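A sketch of that handshake, following the layout in the spec: a length byte (19), the string "BitTorrent protocol", 8 reserved bytes, the 20-byte info_hash, and the 20-byte peer_id, 68 bytes in total. The helper names and the peer_id below are hypothetical. The point it illustrates: passing this check only proves that the remote side knows the (public) info_hash, not that the data it will send is genuine.

```python
PSTR = b"BitTorrent protocol"
HANDSHAKE_LEN = 1 + len(PSTR) + 8 + 20 + 20  # 68 bytes


def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """<pstrlen><pstr><8 reserved bytes><info_hash><peer_id>"""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    return bytes([len(PSTR)]) + PSTR + bytes(8) + info_hash + peer_id


def handshake_matches(message: bytes, expected_info_hash: bytes) -> bool:
    """Accept a remote handshake only if it carries the info_hash we expect."""
    if len(message) != HANDSHAKE_LEN or message[0] != len(PSTR):
        return False
    if message[1:1 + len(PSTR)] != PSTR:
        return False
    offset = 1 + len(PSTR) + 8  # skip length byte, protocol string, reserved bytes
    return message[offset:offset + 20] == expected_info_hash
```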

What Sony allegedly did was put out fake seeds that present the correct info_hash, so that they can establish connections with clients trying to download the files. The data those seeds serve can be bogus, because the client has to download an entire piece before it can verify that piece against its SHA1 hash. When the check fails, that piece has to be downloaded again.
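The retry loop can be simulated in a few lines. This is a toy model, not how any particular client schedules requests: it assumes each attempt picks a peer uniformly at random, that a whole piece comes from one peer, and that a bad seed returns data of the right size with the wrong content. All names, and the 90% bad-seed ratio, are hypothetical.

```python
import hashlib
import random


def download_piece(expected_hash: bytes, peers: list, rng: random.Random,
                   max_attempts: int = 1000) -> tuple[bytes, int]:
    """Fetch one piece from randomly chosen peers until its SHA1 matches.

    Each peer is modelled as a zero-argument callable returning the piece bytes.
    Returns the verified piece and the number of attempts it took.
    """
    for attempt in range(1, max_attempts + 1):
        data = rng.choice(peers)()
        if hashlib.sha1(data).digest() == expected_hash:
            return data, attempt
        # Hash mismatch: the whole piece is discarded and requested again.
    raise RuntimeError("gave up after too many bad pieces")


GENUINE = b"real piece data" * 1000
EXPECTED = hashlib.sha1(GENUINE).digest()


def honest_seed() -> bytes:
    return GENUINE


def bad_seed() -> bytes:
    # Same size as the real piece but wrong content, so the hash check fails.
    return b"\x00" * len(GENUINE)


peers = [honest_seed] + [bad_seed] * 9  # hypothetical: 9 of 10 seeds are bad
data, attempts = download_piece(EXPECTED, peers, random.Random(1))
print(f"piece verified after {attempts} attempt(s)")
```

The download still succeeds as long as at least one honest seed is reachable; the bad seeds only inflate the number of attempts.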

As you can see (and as the article mentions), this can be a problem for web servers. Repeatedly downloading parts of the file, potentially without end, can consume a great deal of bandwidth, which can amount to a DDoS of the web server.
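A rough way to quantify the extra bandwidth, under simplifying assumptions that are not from the article: each piece request independently lands on a bad seed with probability p (with p < 1), and every bad response costs a full piece of L bytes. The number of attempts N per piece is then geometric:

```latex
\begin{aligned}
\Pr[N = k] &= p^{k-1}(1-p), \qquad k = 1, 2, \dots \\
\mathbb{E}[N] &= \sum_{k=1}^{\infty} k\,p^{k-1}(1-p) = \frac{1}{1-p} \\
\mathbb{E}[\text{bytes downloaded per piece}] &= \frac{L}{1-p} \\
\mathbb{E}[\text{bytes wasted per piece}] &= \frac{L}{1-p} - L = \frac{pL}{1-p}
\end{aligned}
```

For example, p = 0.9 gives an expected 10 attempts per piece (about ten times the normal bandwidth), and p = 0.99 gives 100.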

RoraΖ
  • I was curious whether a different peer is chosen for the re-download when the hash fails. The protocol doesn't seem to specify what behavior to follow, but it seems some peer-to-peer clients allow you to ban the peer after a *bad data* threshold is reached. – Daniel Dec 15 '14 at 21:38
  • If I had to guess I would think that behavior is a client configuration. – RoraΖ Dec 16 '14 at 12:27
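As the comments note, banning is client behaviour rather than part of the protocol. A hypothetical sketch of such a policy is a per-peer count of failed pieces with a configurable threshold. The class name, the threshold, and the assumption that each failed piece can be blamed on a single peer are all illustrative; in practice a piece's blocks may come from several peers, so attribution is itself a client heuristic.

```python
from collections import defaultdict


class PeerScoreboard:
    """Hypothetical client-side policy: ban a peer after too many bad pieces."""

    def __init__(self, max_bad_pieces: int = 3):
        self.max_bad_pieces = max_bad_pieces
        self.bad_counts = defaultdict(int)
        self.banned = set()

    def record(self, peer: str, piece_ok: bool) -> None:
        """Record the result of a hash check for a piece attributed to `peer`."""
        if piece_ok:
            return
        self.bad_counts[peer] += 1
        if self.bad_counts[peer] >= self.max_bad_pieces:
            self.banned.add(peer)

    def eligible(self, peers: list[str]) -> list[str]:
        """Peers the client is still willing to request pieces from."""
        return [p for p in peers if p not in self.banned]
```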