Same file, different file size

12

2

I made a backup of my FTP server with lftp and with Transmit (a Mac app). Everything looks fine, but one or two files differ in size between the two copies even though they appear to be identical.

First file:

http://dl.dropbox.com/u/229956/deadcow_seo.php

Second file:

http://dl.dropbox.com/u/229956/deadcow_seo.php_2.php

What is the difference between these two files?

user66638

Posted 2012-01-12T18:27:56.527

Reputation: 274

Are you sure the problem wasn't just the reported size? Apple switched the way Mac OS X reports storage space a couple of years ago to match the method that hard drive manufacturers use. Instead of 1MB = 1024KB, they use 1MB = 1000KB. The other size may be reported by your Linux host using the other methodology, so the files appear to be different sizes. Not sure if this applies in your case, but it's interesting nonetheless. – WebDevKev – 2012-01-18T20:36:33.730

It's not that. Just look at the files he provided as part of the question, or the diff screenshot in my answer. The files aren't nearly big enough for that to make a difference, by the way, at 1800-1900 bytes each. – Daniel Beck – 2012-01-19T14:21:53.003

Answers

25

deadcow_seo.php uses Unix line endings (LF), while deadcow_seo.php_2.php uses DOS/Windows line endings (CR LF).
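To check which convention a file uses, you can count the two kinds of line ending in its raw bytes. This is only a minimal sketch: the function name `count_line_endings` is made up for illustration, and the file must be opened in binary mode so that Python does not translate the line endings itself.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR LF and bare LF line endings in raw file contents."""
    crlf = data.count(b"\r\n")
    # Every CR LF also contains an LF, so subtract those to get bare LFs.
    lf = data.count(b"\n") - crlf
    return {"crlf": crlf, "lf": lf}


def count_in_file(path: str) -> dict:
    """Read a file as bytes ("rb" disables newline translation) and count."""
    with open(path, "rb") as fh:
        return count_line_endings(fh.read())
```

Calling `count_in_file("deadcow_seo.php")` on each of the two downloads should show one file with only `lf` counts and the other with only `crlf` counts.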

FTP has several "transfer modes", of which two are in common use¹: binary (also called "image") and text (or "ASCII"). In binary mode the file is transferred exactly as it is, byte for byte, while ASCII mode causes the file to be interpreted as lines of text: the line endings are converted to the network standard CR LF when sending, and converted to the machine's native line endings when receiving.
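The conversion that text mode performs can be imitated in a few lines. This is a rough simulation rather than what any particular FTP server or client does: the helper names `ascii_send` and `ascii_receive` are invented here, and the sketch assumes the file uses only its machine's native line endings.

```python
CRLF = b"\r\n"


def ascii_send(data: bytes, native_eol: bytes) -> bytes:
    """Sender side: native line endings -> CR LF on the wire."""
    if native_eol == CRLF:
        return data  # already in network form
    return data.replace(native_eol, CRLF)


def ascii_receive(wire: bytes, native_eol: bytes) -> bytes:
    """Receiver side: CR LF on the wire -> native line endings."""
    return wire.replace(CRLF, native_eol)


unix_file = b"<?php\necho 'hi';\n"
wire = ascii_send(unix_file, b"\n")      # what travels over the network
on_windows = ascii_receive(wire, CRLF)   # saved with CR LF
on_unix = ascii_receive(wire, b"\n")     # saved with LF

print(len(unix_file), len(on_windows), len(on_unix))
```

The copy received on a CR LF system is one byte longer per line than the original, while a binary transfer would have left every copy the same length.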

Transferring files as text might make some sense at first, but it only causes trouble later – in fact, some FTP servers have removed it completely or make it equivalent to binary on the server side. Besides, most text editors (excluding Notepad) can read and save files in both Windows and Unix formats.

Just configure your FTP client to always use binary mode. In command-line clients the command is usually bin (or binary) or type i, while graphical clients might have a checkbox or a file type list in their settings.
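If you script the backup yourself, the same advice applies. The following is a sketch using Python's standard `ftplib`, whose `retrbinary` method switches the connection to binary (`TYPE I`) before retrieving the file. The function names, host, and credentials are placeholders, not taken from the question.

```python
from ftplib import FTP


def download_binary(ftp, remote_name: str, local_path: str) -> None:
    """Fetch remote_name byte for byte and write it to local_path."""
    # "wb" keeps Python from translating line endings on the local side.
    with open(local_path, "wb") as fh:
        ftp.retrbinary(f"RETR {remote_name}", fh.write)


def backup_one_file(host, user, password, remote_name, local_path):
    """Connect, log in, and fetch a single file in binary mode."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        download_binary(ftp, remote_name, local_path)
```

A call such as `backup_one_file("ftp.example.com", "user", "secret", "deadcow_seo.php", "deadcow_seo.php")` (with a hypothetical host and login) should produce a local copy whose size matches the file on the server.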


¹ Some old modes are "tenex" (long obsolete, for TENEX page-based files) and "compressed" (which appears to be defined as a simple RLE algorithm). Recent FTP servers support "mode z" for zlib compression.

user1686

Posted 2012-01-12T18:27:56.527

Reputation: 283 655

12

You used text (or ASCII) transfer mode, which converts line breaks during the transfer. This is often useful when you develop scripts and programs on Windows and transfer the files to Linux or Mac OS X. Otherwise they may simply not work, since the system sees garbage data (a stray carriage return) at the end of every line.

Every Windows line break, \r\n (or CRLF), in the file was replaced by \n (or LF) when you downloaded it to Linux or Mac OS X, which makes the file 1 byte smaller per line break. Using FileMerge to compare the files confirms this in the status bar:

[Screenshot: FileMerge comparison of the two files]
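Without a graphical diff tool, the same check can be sketched in Python: normalize both files to LF and compare them, then count the CR LF pairs to see how many bytes the conversion accounts for. The function names are made up for this example, and both files are assumed to be read in binary mode.

```python
def same_apart_from_line_endings(a: bytes, b: bytes) -> bool:
    """True if the two contents differ only in CR LF versus LF."""
    return a.replace(b"\r\n", b"\n") == b.replace(b"\r\n", b"\n")


def crlf_overhead(data: bytes) -> int:
    """Extra bytes CR LF costs compared with LF: one per line break."""
    return data.count(b"\r\n")


def compare_files(path_a: str, path_b: str) -> tuple:
    """Return (same content?, size difference in bytes) for two files."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    return same_apart_from_line_endings(a, b), abs(len(a) - len(b))
```

If the files really differ only in line endings, the size difference returned by `compare_files` should equal `crlf_overhead` of the CR LF copy.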

Also see this answer on data interpretation.


You can configure which file types are interpreted as text in Transmit's preferences:

[Screenshot: Transmit preferences, list of file types transferred as text]

You can remove all file extensions from this list and just standardize on Linux/Mac OS X line breaks, i.e. \n, even when using Windows. Most editors can change the line-ending mode.

Daniel Beck

Posted 2012-01-12T18:27:56.527

Reputation: 98 421