Apache web server REALLY doesn't seem to like utf-8


This is really annoying me, so I hope somebody here can tell me what I'm doing wrong. I've been running an Apache web server on my Windows 7 laptop for a while now, to test some programming I've been doing in my spare time. Recently, I noticed that Unicode characters in my pages were not displaying correctly in my browser. I did what turned out to be a lot of pointless testing and discovered that the characters were actually being sent as correct UTF-8. I also added a <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> line to the top of my HTML output, which was meant to fix the problem but didn't actually do anything. Then I took the time to check the HTTP response headers, and saw that Apache is sending back this:

HTTP/1.1 200 OK
Date: Sun, 19 Jul 2015 18:18:40 GMT
Server: Apache/2.2.25 (Win32)
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

OK, so that seems like a pretty simple problem. According to the Internet, I can fix that by adding AddDefaultCharset utf-8 to my httpd.conf file. There weren't any AddDefaultCharset lines in the file anywhere, so I added it at the bottom exactly as it's spelled above. Then I restarted the Apache service, and found that the problem was unchanged. So, I restarted my computer, on the theory that I might not know how to correctly restart Apache, but it still doesn't display the characters correctly, and it's still sending charset=iso-8859-1 in the headers.

I also found a website that suggested that something called Windows VirtualStore might be automatically saving the config file somewhere else just to f*ck with me, but I don't believe that this is happening because I can't find any Apache files in the VirtualStore directory. Also, you're supposed to be able to disable VirtualStore by taking ownership of the folder, and I did that a while ago because it wasn't letting me change anything otherwise.

Unfortunately, I'm not even sure what else to look at for this issue. Anyone have any ideas?

Gorcq

Posted 2015-07-19T18:36:24.567

Reputation: 13


According to http://www.w3.org ("Setting charset information in .htaccess"), the content type should be set in .htaccess.

– DavidPostill – 2015-07-19T18:53:53.813

I have AddDefaultCharset UTF-8 above <IfModule mime_magic_module> in my httpd.conf file, though I don't think the position should matter. Are you also including a <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> line in the head section of your web pages? ISO-8859-1 is the default browser choice. – moonpoint – 2015-07-19T19:07:46.303

@moonpoint Yes, I do have an http-equiv line. Probably should have mentioned it in my original post. That was one of several changes I made to make sure the problem wasn't in my programming. – Gorcq – 2015-07-19T19:10:27.250

@DavidPostill: Putting the content type in .htaccess is a workaround for shared servers where you don't have access to the server config; if you're running your own server, the server config files themselves are a much cleaner idea. – Guntram Blohm supports Monica – 2015-07-19T19:43:02.630

You said "test some programming"; which language do you use? Does this happen with plain HTML files as well, or just with generated files? For example, if you're using Perl and the CGI module, print header() makes Perl send an ISO-8859-1 header to Apache, which overrides any configuration; you need to use print header(-charset => 'UTF-8');. – Guntram Blohm supports Monica – 2015-07-19T19:46:00.037

@moonpoint: The http-equiv line is for cases when the html file is read from disk, so there is no http header; it shouldn't have any effect on the headers from the web server itself, and if a web server serves a page with a Content-type: character set that disagrees with the meta tag, the Content-type has precedence. – Guntram Blohm supports Monica – 2015-07-19T19:48:46.337
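The precedence rule in the comment above can be modelled roughly as follows. This is a simplified sketch, not the full HTML encoding-sniffing algorithm (it ignores byte-order marks and other steps), and `effective_charset` is a hypothetical helper. The `windows-1252` fallback is an assumption; the real default depends on the browser and locale.

```perl
use strict;
use warnings;

# Simplified model of how a browser picks the charset for an HTML page.
# Ignores byte-order marks and the rest of the real sniffing algorithm.
sub effective_charset {
    my ($content_type, $html, $fallback) = @_;
    $html     //= '';
    $fallback //= 'windows-1252';    # assumed default; varies by browser/locale

    # 1. A charset in the HTTP Content-Type header wins.
    if (defined $content_type
        && $content_type =~ /;\s*charset\s*=\s*"?([^";\s]+)/i) {
        return lc $1;
    }

    # 2. Otherwise, a <meta charset=...> or http-equiv <meta> tag is used.
    if ($html =~ /<meta\s+charset\s*=\s*["']?([\w-]+)/i
        || $html =~ /<meta\s[^>]*content\s*=\s*["'][^"']*charset\s*=\s*([\w-]+)/i) {
        return lc $1;
    }

    # 3. Otherwise, fall back to the browser's default.
    return $fallback;
}
```

Called with `text/html; charset=iso-8859-1` and a page containing a UTF-8 `<meta>` tag, it returns `iso-8859-1`: the header wins, so the `<meta>` tag alone cannot fix a wrong header.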

@GuntramBlohm It is Perl, but I wasn't using any sort of print header line. I have added one with the correct charset specified, but it didn't affect the problem at all. – Gorcq – 2015-07-19T19:54:24.187

@DavidPostill I have now tried adding the configuration to a .htaccess file anyway, but it had no effect. – Gorcq – 2015-07-19T20:04:59.247

Answers


Try creating a 3-line HTML file, request it from the browser, and check the header. Doing this ensures there are no CGI headers or anything that interferes with your server configuration.

<html><head><title>Some Test file</title></head>
<body>unicode test äöüÄÖÜß</body>
</html>

Keep playing with your server configuration until this file is sent with a charset=utf-8 specification in the Content-type header.
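To check the header without relying on the browser, a small Perl script can request the test page and print the charset the server declares. A minimal sketch, assuming Perl 5.14 or later (where HTTP::Tiny is a core module) and a plain http:// URL; the helper name `charset_of` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Extract the charset parameter from a Content-Type header value.
# Returns it in lower case, or undef if no charset is present.
sub charset_of {
    my ($content_type) = @_;
    return undef unless defined $content_type;
    return $content_type =~ /;\s*charset\s*=\s*"?([^";\s]+)"?/i ? lc $1 : undef;
}

# Request the URL given on the command line and report what the server
# declares. Does nothing when no URL is given.
if (@ARGV) {
    my $res = HTTP::Tiny->new->get($ARGV[0]);
    my $ct  = $res->{headers}{'content-type'};    # HTTP::Tiny lower-cases header names
    printf "%s %s\nContent-Type: %s\ncharset: %s\n",
        $res->{status}, $res->{reason},
        $ct // '(none)', charset_of($ct) // '(none)';
}
```

For example, running it as `perl check_charset.pl http://localhost/test.html` (hypothetical file and URL names) should report `charset: utf-8` once the server configuration is right.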

Next, try a minimal perl program:

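#!/usr/bin/perl
# Point the line above at your Perl interpreter; by default, Apache on
# Windows reads it to find the interpreter (ScriptInterpreterSource Script).
use strict;
use warnings;
use utf8;                             # this source file is saved as UTF-8
binmode(STDOUT, ":encoding(UTF-8)");  # write characters out as UTF-8

# The header must be printed first, followed by a blank line.
print qq(Content-type: text/html; charset=utf-8

<html><head><title>Some Test file</title></head>
<body>unicode test äöüÄÖÜß</body>
</html>
);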

If this works (sends the correct header), then anything else that doesn't work is in your libraries, not your server configuration.

Guntram Blohm supports Monica


Reputation: 472

Crap. Well, this was stupid. As soon as I saw your test Perl program, I remembered that mine also sends a Content-type line as its first print statement, and that line specifies iso-8859-1. It's been sitting there at the beginning of my script for a couple of years now, and I'd forgotten about it. The <meta> tag I added was in addition to it and contradicted it. I've changed it, and now everything works. Thanks for your help. – Gorcq – 2015-07-19T20:42:33.777
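One way to keep the script's own Content-type line and its <meta> tag from drifting apart again is to build both from a single setting. A minimal sketch; `$CHARSET` and `page_header` are hypothetical names, not part of the original script.

```perl
use strict;
use warnings;

# Single source of truth for the output encoding (hypothetical name).
our $CHARSET = 'utf-8';

# Build the CGI response header plus the start of the HTML document,
# declaring the same charset in both places.
sub page_header {
    return "Content-type: text/html; charset=$CHARSET\n\n"
         . qq(<html><head><meta http-equiv="Content-Type" )
         . qq(content="text/html; charset=$CHARSET">\n);
}
```

The script would then set `binmode(STDOUT, ":encoding($CHARSET)")` and call `print page_header();` as its first output, so the header, the <meta> tag, and the bytes actually written all agree.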