Batch convert to UTF-8 a directory having both UTF-8 and CP-1251 files

1

I have a directory of files, some encoded in UTF-8 and some in CP-1251. I want to convert the CP-1251 files to UTF-8 without corrupting the files that are already UTF-8.

I tried iconv -f cp1251 -t utf8 <...>. It works for the CP-1251 files, but a file that is already UTF-8 gets converted as well: its bytes are reinterpreted as CP-1251 and the result is unreadable.

sashoalm

Posted 2014-01-04T11:45:28.200

Reputation: 2 680

Answers

1

I found a way to do it using enconv (part of the enca package):

enconv -L bulgarian -x utf8 file.txt

It works for both UTF-8 and CP-1251 files, because enconv detects each file's encoding before converting it.

sashoalm

Posted 2014-01-04T11:45:28.200

Reputation: 2 680

1

You could get a list of files that are neither UTF-8 nor US-ASCII using:

file -0 -i *.txt | awk -F '\0' '$2 !~ /charset=(us-ascii|utf-8)$/ {print $1}'

user1686

Posted 2014-01-04T11:45:28.200

Reputation: 283 655

A minor correction: when I tried it, it listed the files that are UTF-8 rather than those that "are neither UTF-8 nor US-ASCII". – sashoalm – 2014-01-04T11:59:16.223
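
One way to avoid guessing encodings at all is to test each file for UTF-8 validity and convert only the files that fail the test. The sketch below is not taken from either answer. It assumes that every file that is not valid UTF-8 is CP-1251, that the local iconv accepts the name CP1251, and that mktemp is available. The function name convert_cp1251_dir is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert every regular file in a directory from
# CP-1251 to UTF-8, skipping files that already decode as valid UTF-8.
convert_cp1251_dir() {
    dir=$1
    for f in "$dir"/*; do
        # Skip directories and the unexpanded pattern of an empty directory.
        [ -f "$f" ] || continue

        # A file that survives a UTF-8 -> UTF-8 pass is valid UTF-8
        # (pure ASCII included), so leave it alone.
        if iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1; then
            continue
        fi

        # Assumption: anything that is not valid UTF-8 is CP-1251.
        tmp=$(mktemp) || return 1
        if iconv -f CP1251 -t UTF-8 "$f" >"$tmp"; then
            # Overwrite the contents in place so the original file's
            # permissions are kept.
            cat "$tmp" >"$f"
        else
            echo "could not convert: $f" >&2
        fi
        rm -f "$tmp"
    done
}
```

Called as convert_cp1251_dir /path/to/dir, it rewrites the files in place, so trying it on a copy of the directory first is prudent. Pure ASCII files pass the validity check and are left untouched.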