charset=iso-8859-2 webpage displays with UTF-8 header - question marks (�) instead of accented letters

1

I have a web server administration question. On this website, http://www.mirkaphoto.hu/, all PHP-generated pages contain the following line:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2" />

But this is somehow disregarded, probably somewhere in the PHP/Apache processing, and the page reaches browsers with a UTF-8 Content-Type header. As a result, question marks (�) are shown in the page text instead of accented characters (éáöőóüűúí). I tested this in Firefox, IE, Chrome and SeaMonkey.

The strangest part is that this symptom started only yesterday, after I upgraded my server from Debian 7.0 Wheezy to Debian 8.0 Jessie. During the upgrade I also upgraded all other packages, including Apache, PHP, and so on, and selected "yes" to overwrite config files with the package defaults. Afterwards I fine-tuned my config files to have everything the way I like, but I did not find a way to fix this. Before the upgrade, the page displayed just fine.

Here is a screenshot where you can see that Firefox sees the "charset=iso-8859-2" definition but still displays the page with UTF-8 encoding.

screen shot

My suspicion is that this is a server configuration issue, but it could also be that one of the processing components (Apache, PHP) changed during the upgrade in a way that causes this behavior. The problem is that I can't pinpoint what could be causing it.

Can anyone solve this mystery? What could be going wrong during the processing of the page?

giny8i8

Posted 2015-12-03T07:45:07.673

Reputation: 81

Note that the HTTP Content-Type header provided by the web server takes precedence over the <meta http-equiv="Content-Type" ...> specification in the HTML/XHTML code. See https://www.w3.org/TR/REC-html40/charset.html#h-5.2.2

– pabouk – 2019-12-31T09:20:08.513

Answers

3

The server’s HTTP headers say

Content-Type: text/html; charset=UTF-8

which browsers give precedence over what’s declared inside the file. Why not just use UTF-8? It’s an established encoding on all platforms.

Also, there’s garbage text before the HTML declaration:

[M _2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Daniel B

Posted 2015-12-03T07:45:07.673

Reputation: 40 502

It kinda makes sense to use UTF-8 instead. The problem is that I just host this site; I am not its developer. The webpages were created a long time ago and are not really maintained by the developer. The client says "It worked before, make it work again" and, to be honest, she is correct in this expectation. – giny8i8 – 2015-12-03T08:38:23.530

A quick fix would probably be to use Apache’s mod_headers to change the Content-Type header in .htaccess or similar. Also, unmaintained applications/CMS/whatever are insanely dangerous. They must absolutely be kept up-to-date. If that can’t be done, static pages must be used. – Daniel B – 2015-12-03T08:52:18.713

I enabled the mod_headers module in Apache. I also added the following to the virtual host configuration: Header set charset iso-8859-2. Then I reloaded the server with service apache2 reload. But the page still displays with UTF-8 :( Maybe I did not use the right syntax? Not sure... I am not very familiar with using modules for apache2.

Do you have a specific suggestion for what I should put where? – giny8i8 – 2015-12-03T13:34:53.427

There is no charset header. There’s only the Content-Type header, and it carries the media type as well as the charset, so you’ll need to restrict the change to relevant files only. There are various methods to match files or URIs. You wouldn’t want to change image files’ headers. They aren’t HTML, after all. It may well be that this isn’t the solution. You’d be better off researching how to change the PHP application. – Daniel B – 2015-12-03T14:12:22.833

Thanks for your support @Daniel B, I think I managed to find the right way to fix this. I created an answer post for the solution, feel free to vote it up, if you think it is ok. It did the trick for me. – giny8i8 – 2015-12-03T16:32:45.450
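Regarding the suggestion above to look at the PHP side: one likely source of the UTF-8 header is PHP itself. Since PHP 5.6 (the version shipped with Debian 8), the default_charset setting defaults to "UTF-8", and PHP adds it to the Content-Type header of its output. A minimal sketch of overriding it per site, assuming the site runs under mod_php (php_value is not available in PHP-FPM or CGI setups) and using example.com as a placeholder server name:

```apache
<VirtualHost *:80>
    # Hypothetical server name; replace with the site's own.
    ServerName example.com

    <IfModule mod_php5.c>
        # Ask PHP to send "Content-Type: text/html; charset=ISO-8859-2"
        # instead of its built-in UTF-8 default.
        php_value default_charset "ISO-8859-2"
    </IfModule>
</VirtualHost>
```

The same setting could instead go into php.ini as default_charset = "ISO-8859-2" if every site on the server uses that encoding.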

2

After a lot of searching I managed to find the right solution. My thanks go out to @Daniel B for pointing me in the right direction. :)

It seems that, since the upgrade, Apache serves all text/html responses with a UTF-8 charset in the Content-Type header, which makes browsers disregard the <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2" /> statement in the actual HTML/PHP files. I am not sure why this is supposed to be a good thing (please explain if you can). Nevertheless, the solution for getting rid of the question mark characters (�) was the following:

The Solution: I added the line below to the VirtualHost definition of my website in /etc/apache2/sites-available/MySiteName.conf, then reloaded the server configuration with the service apache2 reload command. After this, the files are served with the proper Content-Type: text/html; charset=iso-8859-2 header.

<VirtualHost * >

# [...Some other configurations before this line]

    # Fix for the encoding problem: pages were served with a UTF-8 header
    # although they are written in iso-8859-2 - giny8i8 2015-12-03
    # Requires mod_headers (a2enmod headers).
    Header set Content-Type "text/html; charset=iso-8859-2"
    # Source: http://superuser.com/questions/1008480/charset-iso-8859-2-webpage-displays-with-utf-8-header-question-marks-inste/1008482?noredirect=1#comment1397150_1008482

</VirtualHost>

Let me know if this works for you too, if you encounter the same challenge after a Debian 8.0 Jessie upgrade! I searched for this on the internet, but did not find it spelled out like this. Hence my answer post.
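The Header set line above applies to every response from the virtual host, including images and stylesheets. A narrower variant, sketched under the assumption that only files ending in .php, .html or .htm need the header (the extension list is a guess, not taken from the site), wraps the directive in a FilesMatch block:

```apache
<VirtualHost *>
    # [...other configuration...]

    # Only rewrite the header for PHP and HTML files, so that images,
    # CSS and JavaScript keep their own Content-Type.
    <FilesMatch "\.(php|html?)$">
        Header set Content-Type "text/html; charset=iso-8859-2"
    </FilesMatch>
</VirtualHost>
```

FilesMatch tests the name of the file that ends up being served, so a request for / that is mapped to index.php should still be covered.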

giny8i8

Posted 2015-12-03T07:45:07.673

Reputation: 81

1

Here is the solution. In /etc/httpd/conf/httpd.conf, make sure the AddDefaultCharset directive is commented out:

# Specify a default charset for all content served; this enables
# interpretation of all content as UTF-8 by default.  To use the
# default browser choice (ISO-8859-1), or to allow the META tags
# in HTML content to override this choice, comment out this
# directive:
#
#AddDefaultCharset UTF-8
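For context, AddDefaultCharset adds a charset parameter to text/plain and text/html responses that do not already declare one. The httpd.conf path above is the Red Hat style layout; on Debian the directive typically lives in /etc/apache2/conf-available/charset.conf. A sketch of the two settings that seem relevant to a site like the one in the question:

```apache
# Option 1: send no default charset, so the <meta> tag in each page decides.
AddDefaultCharset Off

# Option 2 (instead of option 1): declare ISO-8859-2 for pages that do
# not set a charset themselves.
#AddDefaultCharset ISO-8859-2
```

As far as I know, neither option changes a charset that PHP has already put into the Content-Type header, so this may not be enough on its own when PHP sends charset=UTF-8.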

marcin

Posted 2015-12-03T07:45:07.673

Reputation: 11

Welcome to Super User! Generally, answers are much more helpful if they include an explanation of what the code is intended to do, and why that solves the problem without introducing others. – MMM – 2019-12-31T10:55:55.497

0

The solution described by giny8i8 works. However, if for some reason you want error messages to appear in that character set you should use:

Header always set Content-Type "text/html; charset=iso-8859-15"
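In mod_headers, the always condition selects a header table that is applied to error responses generated by the server as well as to successful ones, whereas the default (onsuccess) table is skipped on errors. For the site in the question, which is written in ISO-8859-2 rather than ISO-8859-15, the same idea might look like this sketch:

```apache
# Applies to successful responses and to server-generated error pages alike.
Header always set Content-Type "text/html; charset=iso-8859-2"
```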

Antonio J. de Oliveira

Posted 2015-12-03T07:45:07.673

Reputation: 1