6

How is data downloaded via HTTP or FTP checked for corruption?

I know that TCP provides a 16-bit checksum field in its header, which is used for error checking. Torrents also use a more powerful checksum method, MD5 or (I'm not sure) CRC32.

At first I thought HTTP implemented CRC, because as far as I know it is low cost and 'great' for networking (detecting accidental changes), but I couldn't find anything on this topic.

So how do FTP and HTTP ensure that data is not corrupt?

I am aware that data corruption can happen during saving of the file.

jwodder
Maverick
  • @AviD is right, this is actually off-topic, as it's unrelated to a security problem (just an integrity problem). You'd have to re-formulate your question to include a *security issue*. – Marcus Müller Dec 21 '16 at 18:09

3 Answers

15

How is data downloaded via HTTP or FTP checked for corruption?

By itself, not at all.

HTTP and FTP as protocols don't offer integrity protection¹.

However, HTTP and FTP usually run on top of TCP/IP, and both TCP and IP carry checksums in their headers – if a TCP checksum check fails, your operating system simply discards the segment and has it retransmitted. So there's no need for HTTP to implement its own integrity checking.
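To illustrate what that transport-level check looks like, here is a minimal Python sketch of the 16-bit ones'-complement checksum that TCP and IP use (RFC 1071). The payload and the bit flip are made up, and in reality this computation happens in the kernel or the network card, not in HTTP or FTP:

```python
# Minimal sketch of the 16-bit ones'-complement checksum used by TCP and IP
# (RFC 1071). Illustration only: the real computation happens in the kernel
# or the network card, not in HTTP or FTP.

def internet_checksum(data: bytes) -> int:
    if len(data) % 2:                     # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # sum the data as 16-bit words
    while total >> 16:                    # fold the carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF                # ones'-complement of the folded sum

segment = b"example TCP payload"          # made-up payload
print(hex(internet_checksum(segment)))

corrupted = bytearray(segment)
corrupted[3] ^= 0x01                      # flip a single bit "in transit"
print(hex(internet_checksum(bytes(corrupted))))   # differs -> receiver discards it
```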

When tunneling anything (including HTTP and FTP) over TLS, you get an additional layer of integrity checks.

So how do FTP and HTTP ensure that data is not corrupt?

They don't. It's usually the transport's job to guarantee integrity, not the job of the application protocol.


¹ There is an optional header in HTTP/1.1 (Content-MD5) that allows the server to specify a checksum of the body, but since that is practically impossible to compute for resources generated on the fly, comes at a high cost for large files (the server must read the whole file before it can send the header), and offers little advantage over the much more fine-grained TCP checksumming, it's rarely used. I don't even know whether browsers commonly support it.

I'd like to add here that it's of course harder to cause a collision in MD5 (which is what these headers use) than to forge TCP packets, if you wanted to intentionally modify the transfer. But if that is your attack scenario, TLS is the answer, not HTTP checksums.
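For the curious, here is a hedged Python sketch of what checking that Content-MD5 header would look like on the client side; the URL is a placeholder, and most servers never send the header at all:

```python
# Hedged sketch: verify a Content-MD5 header *if* a server happens to send one.
# The URL is a placeholder, not a real endpoint, and most servers omit the header.
import base64
import hashlib
import urllib.request

with urllib.request.urlopen("https://example.org/some-file.bin") as resp:
    body = resp.read()
    claimed = resp.headers.get("Content-MD5")     # base64-encoded MD5 of the body

if claimed is None:
    print("server sent no Content-MD5 header (the common case)")
else:
    actual = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
    print("match" if actual == claimed else "MISMATCH")
```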

Marcus Müller
  • Thank you for the detailed explanation. I was thinking of how files get corrupted over the network, but what I take from your answer is that they don't – corruption actually happens on our own systems. Also, what do you think about using FTP in text mode to download an IMG file from one type of system (say, Unix) to another (say, Windows)? – Maverick Dec 21 '16 at 00:51
  • The problem would happen because of the different line endings, CR+LF on Windows and LF on Unix, so the IMG file would get corrupted. – Maverick Dec 21 '16 at 00:52
  • 9
    Seriously? You're using a mode that is called **text** mode to transfer non-text files and trying to fix that with checksums? That mode is really a 1980's relic. Nobody uses it anymore, even for text files, as everyone dealing with text files will usually simply understand either line break format. This is 2016, not 1987. – Marcus Müller Dec 21 '16 at 00:54
  • I know it's old :D But I wanted to find an example where I would "break" a file (corrupt it)... Because in the back of my head I had the idea that after a file is downloaded it is checksummed against the file on the server. For example, the last packet sent would be the checksum of the actual file :D – Maverick Dec 21 '16 at 00:57
  • 3
    @Maverick Well, if FTP/HTTP had checksums, they wouldn't notice that kind of corruption, because the protocol is *deliberately* changing ("corrupting") the file because you asked it to. If the protocol had a checksum, then the checksum would be of the already-"corrupted" file. – user253751 Dec 21 '16 at 01:17
  • 2
    FTP is an awful protocol and needs to die. Use SFTP (which is not simply FTP over TLS - that one is FTPS - but a totally different protocol on top of SSH). – André Borie Dec 21 '16 at 13:16
  • 2
    @AndréBorie it was actually never really designed as a protocol – it's more of a telnet session that can spawn separate channels to transmit data. The folder listing format is not fully specified, which can lead to very interesting problems with some clients, e.g. if your username or a file name starts with a space character or contains a line break. – Marcus Müller Dec 21 '16 at 13:18
6

HTTP and FTP almost always use TCP as the underlying transport layer, so TCP's protections apply there as well. These, however, are only concerned with network-level accidental corruption issues. As you point out, verification that the file was written successfully is generally left to a checksum like CRC32.
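As a sketch (not a prescribed workflow), checking a saved file against a published CRC32 value might look like this; the file name and expected value below are made up:

```python
# Sketch, not a prescribed workflow: compare a saved file's CRC32 against a
# value published by its source. File name and expected value are made up.
import zlib

EXPECTED_CRC32 = 0x1C291CA3          # hypothetical published value

crc = 0
with open("download.img", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):   # read in 1 MiB chunks
        crc = zlib.crc32(chunk, crc)

print("OK" if crc == EXPECTED_CRC32 else "corrupted (or simply a different file)")
```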

If you are concerned with intentional manipulation (which I assume you are, since this is on security.SE), these aren't sufficient, because they're not cryptographically secure. On the network side, that is generally handled at a different layer by introducing TLS (combined with HTTP this gives HTTPS; combined with FTP, FTPS). But if you want to be particularly certain, and also want to verify the integrity of the file you have on disk, a common approach for vendors is to provide a file listing the SHA-2 checksums of their downloads and to sign that file with a GPG key. You download both files, verify that the checksums file has been signed by a key you trust, and then check that the other file(s) match the checksums listed in the file you've just verified.
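A minimal sketch of the checksum-verification step, assuming a SHA256SUMS-style listing with lines of the form `<hex digest>  <filename>` (verifying the GPG signature on the listing itself is a separate step not shown here):

```python
# Minimal sketch of the checksum-verification step, assuming a SHA256SUMS-style
# file with lines of the form "<hex digest>  <filename>". Verifying the GPG
# signature on SHA256SUMS itself is a separate step not shown here.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

with open("SHA256SUMS") as listing:
    for line in listing:
        digest, _, name = line.strip().partition("  ")
        if not name:                       # skip blank or malformed lines
            continue
        print(name, "OK" if sha256_of(name) == digest else "FAILED")
```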

So how do FTP and HTTP ensure that data is not corrupt?

In short, from a security perspective, they don't.

Xiong Chiamiov
  • Thank you very much for replying. I know that any serious vendor provides a checksum file (all Linux distros do...) which you can use to check manually. Thank you again – Maverick Dec 21 '16 at 00:43
0

All those protocols are designed to be stacked, and each level in the stack has its own responsibility. What is odd relative to your question is that integrity of the message is neither the concern of FTP or HTTP nor that of TCP. The former are responsible for the high-level part of the protocol that allows files or data to be exchanged; the latter is responsible for the correct delivery of packets. The integrity of the packets is the concern of the layer-2 (data link) layer in the OSI stack, which should guarantee that a packet exchanged between two consecutive nodes has not been altered. A common protocol here is HDLC, whose frame check sequence is a strong CRC.

Of course, there is a checksum at the TCP level, but it is only a catch-all feature to be alerted in case one of the nodes goes mad and sends erroneous data, and it uses a much weaker checksum. After all, integrity is not its job.
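A small illustration of that weakness: because the Internet checksum is just a sum of 16-bit words, swapping two words goes undetected, whereas a CRC-32 over the same bytes changes (the payloads below are made up):

```python
# Illustration of why the 16-bit TCP/IP checksum is weaker than a CRC: it is
# only a sum, so swapping two 16-bit words of the payload goes unnoticed,
# while a CRC-32 over the same bytes changes. Payloads are made up.
import zlib

def internet_checksum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

original = b"ABCDEFGH"
swapped = b"CDABEFGH"                     # first two 16-bit words exchanged

print(internet_checksum(original) == internet_checksum(swapped))   # True: not detected
print(zlib.crc32(original) == zlib.crc32(swapped))                 # False: CRC catches it
```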

If you use normal networks, data integrity is guaranteed by the full TCP/IP stack. End-to-end checksums are normally used to verify that the file was not corrupted on the sender's side, either because you are using a mirror or because of a problem on the originating file system.

Serge Ballesta