1

My website serves pages and other content encoded as UTF-8. For HTML, declaring the encoding in a meta tag is no problem. However, I also have raw text files encoded as UTF-8 that aren't displayed correctly, such as appearing as ×. I've considered adding a byte-order mark at the start of such files, but I'd prefer not to since it isn't always well supported. I followed the instructions in this other question, but it had no effect. This is the HTTP response header:

HTTP/1.1 200 OK
Date: Sat, 12 Aug 2017 15:41:04 GMT
Server: Apache/2.4.10 (Debian)
Last-Modified: Wed, 09 Aug 2017 19:24:33 GMT
ETag: "c04c-5565707a34966"
Accept-Ranges: bytes
Content-Length: 49228
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive

I was hoping to see Content-Type: text/plain; charset=utf-8. How can I get reliable UTF-8 encoding for these URIs?
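
For illustration, this kind of garbling is what happens when UTF-8 bytes are decoded with a single-byte encoding because no charset was declared. Below is a minimal Python sketch. It assumes the client falls back to Windows-1252, which is an assumption: the actual fallback depends on the browser and locale. The helper name `misdecode` is hypothetical.

```python
def misdecode(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


sign = "\u00d7"                 # MULTIPLICATION SIGN
raw = sign.encode("utf-8")      # two bytes: 0xC3 0x97
garbled = misdecode(sign)       # U+00C3 followed by U+2014 (assumed cp1252 fallback)
restored = raw.decode("utf-8")  # decoding with the right charset round-trips

# ascii() keeps the output readable regardless of the terminal's own encoding.
print(ascii(raw), ascii(garbled), ascii(restored))
```

The file on disk is fine in this scenario; only the decoding step differs, which is why a `charset` parameter in the response header is enough to fix the display.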

Brent
  • The right answer here is `AddType`, since Apache won't send a Content-Type if it doesn't recognise the type, and `AddDefaultCharset`/`AddCharset`/etc are then irrelevant. You should unaccept the accepted answer, which is wrong, and add your own answer. – EML Jan 28 '20 at 10:21
  • The 'AddDefaultCharset' / 'AddCharset' would still be necessary, right? If so, I'm more inclined to suggest an edit to the currently accepted answer. – Brent Jan 29 '20 at 17:34
  • I suspect that `AddDefaultCharset utf-8` is already in the Apache config for most or all recent releases, but you do need to check that it's there, and add it if not. I said `AddType` is the right answer because Apache has no idea what to do without it, and it will probably just work with it (ie. if there's already an `AddDefaultCharset`). Up to you what to do, but anybody coming here for an answer has to read all the comments and make the connections from the missing content-type. – EML Jan 29 '20 at 17:51
  • Okay. I've created a new answer and marked it as the solution. – Brent Jan 29 '20 at 19:41

2 Answers

8

Content-Type is not sent for 304 Not Modified responses because such a response has no content body.

Look at a 200 response and you should see the header. Use Ctrl + F5 to force a refresh, which gives a 200 response rather than revalidating the cached copy with a 304 response.

You have since updated your question to include a 200 response. I would expect that always to carry a Content-Type: text/plain header or equivalent (even if the character set is not included), but your example has none, so I'm not sure it shows all the details.

Regardless, the correct way to set this is to add the following to your Apache config:

# Set the default charset so it doesn't need to be set per page.
AddDefaultCharset utf-8
# For CSS, JS, etc.
AddCharset utf-8 .htm .html .js .css

The first (AddDefaultCharset) will set the charset for text/plain and text/html resources.

The second (AddCharset) requires mod_mime and will set the charset for other types based on file extension. JavaScript files are sent with a content type of application/javascript and CSS files with a content type of text/css, so they are not picked up by the AddDefaultCharset setting. The .htm and .html files don't really need to be listed, as they are picked up by the default, but there is no harm in being explicit.
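
To confirm that the directives took effect, it helps to look at the Content-Type header of a fresh 200 response. A minimal Python sketch using only the standard library follows. The function names `parse_content_type` and `fetch_content_type` are hypothetical, and the URL you pass in is your own.

```python
import urllib.request
from email.message import Message


def parse_content_type(header_value):
    """Split a Content-Type header value into (media type, charset or None)."""
    if header_value is None:
        # No header at all, as in the 200 response shown in the question.
        return None, None
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


def fetch_content_type(url):
    """Send a HEAD request and report the media type and charset the server declares."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return parse_content_type(response.headers.get("Content-Type"))


print(parse_content_type("text/plain; charset=utf-8"))
```

Calling `fetch_content_type` on the URL of one of the text files should report both a media type and a charset once the configuration is right. A result with no media type at all suggests the server does not recognise the file extension.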

Barry Pollard
  • I've updated the question with the 200 response – Brent Aug 12 '17 at 15:42
  • Not sure you did that right, but updated the answer anyway. – Barry Pollard Aug 13 '17 at 11:20
  • Thank you for bearing with me. I've checked `sudo apache2ctl -M` and `mime_module (shared)` appears and I've added the AddCharset directive. I'm still not seeing the response header for the .yml linked in the OP. Here's my `/etc/apache2/apache2.conf` https://pastebin.com/hXJhmTSA . I believe that is the right file based on running `apache2ctl -V` as described here: https://serverfault.com/questions/12968/how-to-find-out-which-httpd-conf-apache-is-using-at-runtime . Do I need to also indicate somehow that .yml is text/plain? – Brent Aug 13 '17 at 16:02
  • I also needed to add `AddType text/plain .yml`, and it works now. – Brent Aug 13 '17 at 16:08
  • You could have also added it to the other line: AddCharset utf-8 .htm .html .js .css .yml – Barry Pollard Aug 13 '17 at 16:12
  • Yeah, I added the .yml to the AddCharset but that wasn't enough. Adding the AddType finally did it. – Brent Aug 13 '17 at 16:12
  • Fair enough. Not used .yml files before. Glad you got it sorted anyway. – Barry Pollard Aug 13 '17 at 16:14
2

I fixed this problem by adding these lines to 'apache2.conf':

AddType text/plain .yml
AddDefaultCharset utf-8

This was some time ago as of writing this answer. Recent Apache installations may have utf-8 already set as the default.
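
Why both lines matter can be sketched with a toy model in Python. This is not Apache's code: it only mirrors the behaviour described in this thread (no Content-Type for an unrecognised extension, a default charset applied only to text/plain and text/html, and per-extension charsets from AddCharset). The function and parameter names are hypothetical.

```python
import os

# Media types that the default charset applies to, per the explanation in this thread.
DEFAULT_CHARSET_TYPES = {"text/plain", "text/html"}


def content_type_header(filename, types, charsets=None, default_charset=None):
    """Return the Content-Type value this toy model would send, or None for no header.

    types           -- extension -> media type (stands in for AddType / mime.types)
    charsets        -- extension -> charset    (stands in for AddCharset)
    default_charset -- stands in for AddDefaultCharset
    """
    ext = os.path.splitext(filename)[1]
    media_type = types.get(ext)
    if media_type is None:
        # Unknown extension: no header, so charset settings never come into play.
        return None
    charset = (charsets or {}).get(ext)
    if charset is None and media_type in DEFAULT_CHARSET_TYPES:
        charset = default_charset
    if charset is None:
        return media_type
    return f"{media_type}; charset={charset}"


# A charset for .yml without a type mapping: still no header.
print(content_type_header("data.yml", {}, {".yml": "utf-8"}, "utf-8"))
# A type mapping plus a default charset: the header hoped for in the question.
print(content_type_header("data.yml", {".yml": "text/plain"}, None, "utf-8"))
```

In this model, adding .yml to the charset mapping alone changes nothing, which matches what was observed in the comments above: the type mapping has to exist first.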

Brent
  • Thanks to Barry Pollard and EML for their contributions. – Brent Jan 29 '20 at 19:41
  • Also, in httpd.conf (CentOS) the default `AddDefaultCharset UTF-8` is in upper case; this makes no difference. Also, `curl -I YOUR.URL.HERE` will return the browser default latin-1 if there is no Unicode in the page received, so the test needs to be done with a Unicode page in the first place. – Matteo Ferla Nov 24 '21 at 10:47