
I have several problems maintaining large production servers: some developers drop in files from Windows environments, sometimes with a BOM (we use UTF-8 and have no need for one), which causes a lot of trouble.

Other times, I get "no end of line" and "[DOS]" labels when editing files with vim directly on the server.

I recently discovered how to find the BOM and how to delete it in a batch script. What about illegal bytes and bad EOLs? Is it safe to use DOS text files in a Linux environment? Are there any drawbacks if I convert them with the dos2unix command?

Regards

Syquus

2 Answers


Yes, BOMs are bad. The locale should determine the encoding of a file.

The other thing, as you've rightly pointed out, is line endings. DOS tends to use CRLF, and Linux uses LF only.

dos2unix will take care of this problem for you.
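
As a rough illustration of what the conversion amounts to (this is a sketch, not how dos2unix itself is implemented), the core step is replacing each CR LF pair with a single LF. The function name crlf_to_lf is made up for this example.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CR LF pair with a single LF.

    Hypothetical helper for illustration only. Lone CR bytes that are
    not followed by LF are left untouched.
    """
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    sample = b"first line\r\nsecond line\r\n"
    print(crlf_to_lf(sample))
```

Working on bytes rather than decoded text means the conversion does not depend on the file's encoding being valid.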

Philip Reynolds
  • "The locale should determine the encoding of a file" doesn't make sense _at all_ considering that files can be transferred over the Internet. (Not that anyone should be using something other than Unicode, of course.) – user1686 Sep 10 '10 at 14:22

"Bad EOL" (no end of line message) isn't bad. It just notifies you that there is no EOL after the last line. The Unix convention is to use EOL as a line terminator, and most Windows tools consider it a separator.

Other than the message (and a slight annoyance when running cat on such a file), there is nothing wrong with it.
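
If you still want to find or fix such files, a check along these lines could work. It is only a sketch: the helper names are invented here, and an empty file is treated as fine because it has no last line to terminate.

```python
def ends_with_newline(data: bytes) -> bool:
    """Return True if the content is empty or its last byte is LF."""
    return data == b"" or data.endswith(b"\n")


def ensure_final_newline(data: bytes) -> bytes:
    """Append a single LF when the last line is unterminated."""
    if ends_with_newline(data):
        return data
    return data + b"\n"


if __name__ == "__main__":
    print(ends_with_newline(b"no terminator"))
    print(ensure_final_newline(b"no terminator"))
```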


DOS/Windows line endings (CR/LF) can cause some problems, especially in scripts: when Linux reads the #! line, it uses everything up to the first LF and treats the CR as part of the interpreter filename.

For executable scripts it is best to use Unix line endings (:set ff=unix); otherwise Linux will attempt to execute /usr/bin/perl<CR> when the file starts with #!/usr/bin/perl and has Windows line endings.

For other files, it doesn't matter much.
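
To spot scripts affected by the #! problem described above, one option is to look only at the first line of each file. This is a minimal sketch; shebang_has_cr is a hypothetical name, and it assumes you pass it the raw bytes of the file.

```python
def shebang_has_cr(data: bytes) -> bool:
    """Return True if the file starts with #! and its first line ends in CR.

    Only the first line is inspected, since that is the part the
    kernel reads to find the interpreter.
    """
    if not data.startswith(b"#!"):
        return False
    first_line = data.split(b"\n", 1)[0]
    return first_line.endswith(b"\r")


if __name__ == "__main__":
    print(shebang_has_cr(b"#!/usr/bin/perl\r\nprint 1;\r\n"))
```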


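The UTF-8 signature (EF BB BF) can cause even more problems. In vim, disable it with :set nobomb; to remove it from many files at once, use sed -i '1s/^\xef\xbb\xbf//' (GNU sed; the 1 restricts the match to the first line of each file).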
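
Where GNU sed is not available, the same removal can be sketched in Python. The function name strip_utf8_bom is made up for this example; codecs.BOM_UTF8 is the standard-library constant for the bytes EF BB BF.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 signature (EF BB BF) if one is present.

    Bytes elsewhere in the content are never touched.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    print(strip_utf8_bom(b"\xef\xbb\xbfhello\n"))
```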


EOL: End-of-line character or characters; either LF or CR/LF, whichever is appropriate.

user1686