I'm unifying the encoding of a large bunch of text files, gathered over time on different computers. I'm mainly going from ISO-8859-1 to UTF-8. This nicely converts one file:
recode ISO-8859-1..UTF-8 file.txt
I of course want to do automated batch processing for all the files, and simply running the above for each file has the problem that files that are already encoded in UTF-8 will have their encoding broken. (For instance, the character 'ä' originally in ISO-8859-1 will appear like this, viewed as UTF-8, if the above recode is done twice: ä -> ä -> ä)
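The breakage is easy to reproduce with iconv, which performs the same ISO-8859-1 -> UTF-8 conversion as the recode call above (the file name sample.txt is made up for the demonstration):

```shell
# 'ä' already encoded as UTF-8 is the two bytes C3 A4
printf '\303\244' > sample.txt
# Converting again reinterprets each byte as ISO-8859-1,
# so C3 and A4 each get re-encoded and the output shows 'ä'
iconv -f ISO-8859-1 -t UTF-8 sample.txt
```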
My question is, what kind of script would run recode only if needed, i.e. only for files that weren't already in the target encoding (UTF-8 in my case)?
From looking at the recode man page, I couldn't figure out how to do something like this. So I guess this boils down to how to easily check the encoding of a file, or at least whether it's UTF-8 or not. This answer implies you could recognise valid UTF-8 files with recode, but how? Any other tool would be fine too, as long as I could use the result in a conditional in a bash script...
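Something along these lines is what I'm after, just a rough sketch of the idea, not a verified solution. It assumes iconv exits non-zero when its input is not valid UTF-8 (GNU iconv does by default), and the *.txt glob is only an example:

```shell
#!/bin/bash
# Convert only files that are not already valid UTF-8.
# 'iconv -f UTF-8 -t UTF-8' acts as a UTF-8 validity check:
# it fails on byte sequences that are not well-formed UTF-8.
for f in *.txt; do
    if ! iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1; then
        recode ISO-8859-1..UTF-8 "$f"   # only convert files that need it
    fi
done
```

One caveat I can see: an ISO-8859-1 file whose bytes happen to form valid UTF-8 would be skipped, though plain-ASCII files need no conversion anyway.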
Note: I've looked at questions like http://superuser.com/questions/27060/batch-convert-files-for-encoding-or-line-ending-under-windows and they do not provide an answer for this particular question.
– Jonik – 2010-03-06T16:04:07.027