13
2
The Joomla .ini
files must be saved as UTF-8.
After editing, I'm not sure whether the files are UTF-8 or not.
Is there a Linux command like file,
or a combination of commands, that would tell me whether a file is really UTF-8?
29
You can determine the file encoding with the following command:
file -bi filename
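For checking several files from a script, the charset that file reports can be tested directly. The following is a minimal sketch, assuming a file(1) that supports the --mime-encoding option (as the common Linux implementation does); looks_like_utf8 is a hypothetical helper name, and the path in the usage comment is only an illustration. Since file guesses the encoding, a match is a strong hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helper: succeed if file(1) guesses the charset as
# us-ascii or utf-8 (pure ASCII is also valid UTF-8).
# Assumption: file(1) supports --mime-encoding.
looks_like_utf8() {
    charset=$(file -b --mime-encoding -- "$1") || return 2
    case $charset in
        us-ascii|utf-8) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (illustrative path): report every .ini file that needs a closer look.
# for f in language/*.ini; do
#     looks_like_utf8 "$f" || echo "check encoding: $f"
# done
```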
This answer should be accepted. The explanation for the -bi options is in the man page. – Jérôme – 2016-01-13T14:04:13.280
Is it supposed to work on macOS as well? I get regular file
on a file I thought was UTF-8. – nicolas – 2016-04-24T15:49:37.763
3 @nicolas For macOS you could try file -I filename
(-I is a capital i). – Rik – 2016-04-24T16:07:11.743
@Rik I can confirm – nicolas – 2016-04-24T16:08:46.753
2 Does this read the whole file? – ctrl-alt-delor – 2018-03-30T15:17:20.647
@ctrl-alt-delor What do you mean read the whole file? It shouldn't have to as the file encoding is probably placed in the header of the file. – kojow7 – 2018-04-20T15:33:09.357
@kojow7 UTF-8 has no header. Pure ASCII (7-bit only) is indistinguishable from UTF-8 (that is the point of it; a header would cause all sorts of problems). So if you have a file that is ASCII for the first MB and then contains a single non-ASCII UTF-8 character, you will not know unless you read the whole file. – ctrl-alt-delor – 2018-04-21T16:41:14.590
@kojow7 because if you only read a few bytes (3 are enough for the UTF-8 BOM) then the rest of the file can be, say, a PNG and thus not a valid UTF-8 file. – Alexis Wilke – 2018-12-28T10:11:23.450
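The point made in these comments can be demonstrated with a small experiment. This is a sketch, assuming iconv is installed; it uses iconv rather than file because iconv has to decode every byte of its input. The name is_valid_utf8 is a hypothetical helper, not part of any tool mentioned above.

```shell
#!/bin/sh
# Sketch: validating UTF-8 means reading the whole file.
# Assumption: iconv is installed; is_valid_utf8 is a hypothetical helper.
is_valid_utf8() {
    # iconv decodes the entire input and exits non-zero on invalid bytes.
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

demo=$(mktemp)

# About 1 MB of pure ASCII, which is also valid UTF-8.
head -c 1000000 /dev/zero | tr '\0' 'a' > "$demo"
is_valid_utf8 "$demo" && echo "ASCII only: valid UTF-8"

# Append one stray byte (0xE9, which is "é" in ISO-8859-1) at the very end.
printf '\351' >> "$demo"
is_valid_utf8 "$demo" || echo "stray byte at the end: not valid UTF-8"

rm -f "$demo"
```

A check that only inspects the beginning of the file would see nothing but ASCII in both cases.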
6
There is: use the isutf8
command from the moreutils package.
Source: How can you tell if a file is UTF-8 encoded or not?
@davidpostill I'm curious, is it bad practice to cite the author in the reference? – Pablo Olmos de Aguilera C. – 2016-08-28T20:26:56.110
No. However, it is good practice to make the link say where it leads me. Assume I'm reading only the blue text. After the edit, I can tell why and when I should click that. Before, I could not. (It wasn't me who made the edit but I'm like 94% sure that this is what it was about.) – Hermann Döppes – 2018-12-31T00:00:26.880
Nice, and it works nicely with find -type f -exec isutf8 {} +
, because it also quotes the filename. (And using find ... -exec ... +
is fast, too.) – Tomasz Gandor – 2019-03-22T13:28:19.303
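Where moreutils is not installed, a similar batch check can be sketched with find and iconv. This swaps in iconv for isutf8 and assumes iconv is available; list_non_utf8 is a hypothetical helper name, and the directory and pattern arguments are whatever fits your layout.

```shell
#!/bin/sh
# Hypothetical helper: print every file under DIR matching PATTERN
# that is NOT valid UTF-8.
# Assumption: iconv is installed (used here in place of isutf8).
list_non_utf8() {
    dir=$1
    pattern=$2
    # "-exec ... {} +" passes many filenames to a single sh invocation.
    find "$dir" -type f -name "$pattern" -exec sh -c '
        for f in "$@"; do
            iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} +
}

# Usage (illustrative): list_non_utf8 . "*.ini"
```

No output means every matching file decoded cleanly as UTF-8.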
0
Yet another way is to use recode
, which will exit with an error if it tries to decode UTF-8 and encounters invalid characters.
if recode utf8/..UCS < "$FILE" >/dev/null 2>&1; then
    echo "Valid utf8 : $FILE"
else
    echo "NOT valid utf8: $FILE"
fi
2 You cannot tell the encoding of a file. You can only make a smart guess. You might mostly guess right, but sometimes guesses fail.
file
is an example of a program doing smart guesses. – Marco – 2013-09-24T21:17:15.210
1 @Marco: It is possible to verify whether it is valid UTF-8 or not, however. There are some encodings which can mistakenly pass as valid UTF-8, but it almost never happens with ISO-8859-* or Windows-125* encodings/charsets. – user1686 – 2013-09-24T21:40:24.090