
I read an answer here about websites providing downloadable files together with their hash sums. It contained a sentence I think about whenever I download something, but have never really understood: "The provided hash lets you double-check that the file you downloaded was not corrupted accidentally in transit."
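To make that concrete, here is roughly how I would check such a hash myself, as a minimal Python sketch; the file name and the published digest are placeholders:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Hash the file in chunks so large downloads don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Placeholder: the digest would be copied from the download page.
published = "0123...cdef"  # hypothetical published SHA-256
if sha256_of("download.iso") != published:
    print("downloaded file does not match the published hash")
```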

I can vaguely remember that this, having to redownload a file because it arrived broken, happened sometimes in the past, when I suffered through a 56k modem and downloads were a pain in general. But I'm not sure it really happened, and I couldn't explain it: there is TCP, which should be capable of handling my download perfectly fine, and it has been around since at least 1983.

Is there any way a downloaded file could differ from the file on the server, besides malicious attacks like MITM? Or, put another way: as a user, if I think something isn't right about a finished download, does it have to be a MITM attack?

stuXnet

3 Answers


With HTTP it is possible, but uncommon, for the actual bytes to be corrupted. It is not uncommon, though, for the download to simply break partway through without the browser noticing. This is especially true if no length information is sent with the content (i.e. the content ends with the TCP close), but even if a length is sent or chunked encoding is used, browsers often ignore such errors.
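As an illustration, here is a minimal Python sketch (URL and file name are made up) of the length check a careful client can do and that browsers often skip:

```python
import urllib.request

# Hypothetical URL; browsers do this bookkeeping too, but many of them
# ignore a mismatch instead of reporting the download as broken.
url = "http://example.org/big-file.iso"
with urllib.request.urlopen(url) as resp, open("big-file.iso", "wb") as out:
    expected = resp.headers.get("Content-Length")
    received = 0
    while chunk := resp.read(64 * 1024):
        out.write(chunk)
        received += len(chunk)

if expected is not None and received != int(expected):
    raise IOError(f"truncated download: got {received} of {expected} bytes")
```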

With FTP it is even worse, because there is no real way to indicate the length of the content; the content always ends with the connection close. Apart from that, corruption can happen if you use the wrong transfer mode (ASCII instead of BINARY).
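A minimal ftplib sketch (server and path are hypothetical) showing the binary transfer mode that avoids the ASCII-mode corruption:

```python
from ftplib import FTP

# Hypothetical server and path, for illustration only.
ftp = FTP("ftp.example.org")
ftp.login()  # anonymous login
with open("file.iso", "wb") as f:
    # retrbinary switches the session to TYPE I (binary/image mode);
    # fetching a binary file in ASCII mode mangles anything that
    # happens to look like a line ending.
    ftp.retrbinary("RETR pub/file.iso", f.write)
ftp.quit()
```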

Steffen Ullrich

Packet transmission errors can and do happen in transit. TCP corrects most of them before they reach the file, but even TCP isn't flawless. The TCP checksum is only 16 bits, so it's entirely possible for a randomly introduced error to pass the checksum by coincidence.
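To illustrate how weak a 16-bit checksum is, here is a small Python sketch of the RFC 1071 Internet checksum that TCP uses. Note that merely reordering 16-bit words leaves the checksum unchanged, so a whole class of corruptions goes undetected:

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement sum of 16-bit words, as used by TCP."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

a = b"\x12\x34\x56\x78"
b = b"\x56\x78\x12\x34"  # same 16-bit words in a different order
assert a != b
assert internet_checksum(a) == internet_checksum(b)  # checksum can't tell
```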

This paper says that anywhere from 1 in 16 million to 1 in 10 billion packets carries an error that the checksum fails to detect. Though that's rare, it's still useful to have some additional means of checking.

Steve Sether

Application-layer intermediaries like HTTP caches can manipulate the data in an unencrypted HTTP session with no general way of detecting it. Any intermediary router can do the same at the TCP layer.

TCP errors can also occur, but as the other answers have identified, this is very uncommon.

Downloads over low- or no-integrity transports like HTTP and FTP can't be trusted on their own. Additional verification with GPG/PGP or similar signing mitigates this, assuming the keys you verify against were downloaded or otherwise obtained securely.
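As a sketch, verifying a detached signature can be as simple as invoking gpg. The filenames here are hypothetical, and this assumes the signer's public key was already imported from a source you trust (not the same place you got the download):

```python
import subprocess

# Hypothetical filenames; the .sig file is a detached signature
# published alongside the download.
result = subprocess.run(
    ["gpg", "--verify", "file.iso.sig", "file.iso"],
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    print("signature did NOT verify:", result.stderr.strip())
```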

Downloading over TLS (assuming valid certificates and a sensible local CA store) effectively solves the first problem: it provides integrity and confidentiality. It also substantially mitigates the second, since silent TCP errors would cause a failure at the TLS layer.
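For completeness, a minimal Python sketch of a download whose server certificate is validated against the local CA store (the URL is a placeholder):

```python
import ssl
import urllib.request

# Hypothetical URL. create_default_context() enables certificate
# validation against the local CA store plus hostname checking, so an
# intermediary cannot silently impersonate or tamper with the server.
ctx = ssl.create_default_context()
url = "https://example.org/file.iso"
with urllib.request.urlopen(url, context=ctx) as resp:
    data = resp.read()  # corrupted TLS records fail the record MAC
```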

Alain O'Dea
  • Early close of connection can in theory be detected at the TLS layer because of the missing SSL shutdown. But in practice lots of software does not do a proper SSL shutdown, so these errors get ignored, which means an early close of connection will usually not be detected even if TLS is in use. – Steffen Ullrich Oct 30 '15 at 21:30
  • Good point! From experience in programming I've seen that closing/resource-cleanup is frequently mishandled and errors are ignored. – Alain O'Dea Oct 30 '15 at 21:47