Change line endings and encoding of file at the same time


I have some files with Windows line endings and Latin-1 encoding that I need to convert to Unix line endings and UTF-8.

Of course I can do it in two steps:

for file in ./*.csv; do
    # strip the trailing carriage return (\r is understood by GNU sed)
    sed 's/\r$//' "$file" > "${file}.bak"
    iconv -f ISO-8859-1 -t UTF-8 "${file}.bak" > "$file"
    rm "${file}.bak"
done

But is there a commonly available tool that can do both of these things at once? This may not be the most efficient way. (Maybe iconv itself can?)

George Simms

Posted 2015-07-11T09:57:49.710

Reputation: 101

I strongly believe the answer is "no", since those two tasks are quite different and it doesn't make sense to write a single tool to perform both tasks at once, particularly where Unix philosophy is concerned. But hey, who knows when somebody is crazy enough... – Abel Cheung – 2015-07-11T21:18:48.263

Answers


I would make some slight modifications to your script. First, don't use ls in your for loop; use *.csv, because the glob handles non-printable characters and spaces in file names. Instead of using sed's in-place option, redirect to $file.bak. If strings is available on your system, you can replace sed with strings. And always remember to quote variables.

    for file in *.csv
    do
        # strip the trailing carriage return (\r is understood by GNU sed)
        sed 's/\r$//' "$file" > "${file}.bak"
        # strings "$file" > "${file}.bak"
        iconv -f ISO-8859-1 -t UTF-8 "${file}.bak" > "$file"
        rm "${file}.bak"
    done
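
The two steps in the loop above can also be joined with a pipe, so each file is read only once. This is a rough sketch, not a tested drop-in: convert_file and the .tmp suffix are names made up here for illustration. It assumes the input really is ISO-8859-1, and it relies on tr deleting every carriage return rather than only those at the end of a line.

```shell
# Hypothetical helper: convert one Latin-1/CRLF file to UTF-8/LF in a single pass.
convert_file() {
    src=$1
    tmp="${src}.tmp"   # assumed scratch name; pick one that cannot clash with real files
    # LC_ALL=C makes tr treat the input as raw bytes; it deletes every CR (0x0D).
    # iconv then re-encodes the stream, and the original is replaced only on success.
    LC_ALL=C tr -d '\r' < "$src" | iconv -f ISO-8859-1 -t UTF-8 > "$tmp" &&
        mv "$tmp" "$src"
}

for file in ./*.csv; do
    [ -e "$file" ] || continue   # skip the literal pattern when nothing matches
    convert_file "$file"
done
```

Because Latin-1 is a single-byte encoding, removing the CR bytes before re-encoding cannot split a character. Running the function a second time on an already converted file would double-encode any accented letters, so it should be applied only once per file.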

fd0

Posted 2015-07-11T09:57:49.710

Reputation: 1 222

Thanks, I'll update the question accordingly, but that doesn't really answer the question. Also that's a somewhat less explicit way of converting line endings, and a reader could be confused about the motivation for the line - will it work if the file contains Welsh accented letters like w circumflex? – George Simms – 2015-07-11T15:38:11.573

Well, one tool may not be as efficient as two or more tools. My response was aimed at making your code more efficient. The glob should be much faster than using ls. strings blindly changes carriage returns and carriage return/newline pairs to newlines. strings should be faster, and if I had known that the last line of your CSV file was terminated, I would have suggested using tr instead of strings. iconv is the only tool I know of for your purpose. – fd0 – 2015-07-11T16:12:29.583