
The issue

I'm using PHP's apache_note() to log variables from web requests to a CustomLog format. However, try as I might, Apache doesn't want to log UTF-8 characters the way I'd like.

In PHP, I have apache_note('some_value', '✔'); and the matching VHost config looks like this:

LogFormat "%{some_value}n" custom_format
CustomLog ${APACHE_LOG_DIR}/access.log custom_format

However, Apache ends up logging the literal version like this:

\xe2\x9c\x94
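The logged string is not corrupted data: it is the three UTF-8 bytes of ✔ written out as hexadecimal escapes. A quick check, sketched in Python purely for illustration (the variable names are hypothetical and not part of the setup above):

```python
# U+2714 (the heavy check mark) encodes to three bytes in UTF-8.
check = "\u2714"
utf8_bytes = check.encode("utf-8")

# Render each byte as a \xHH escape, the form seen in the access log.
escaped = "".join(f"\\x{b:02x}" for b in utf8_bytes)
print(escaped)
```

So the value reaches the logger intact as UTF-8 and is escaped byte by byte when it is written.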

What I've tried

  • Checked the values of both LANG and LC_ALL and they are both set to en_US.UTF-8
  • Updated /etc/apache2/envvars to use /etc/default/locale by default
  • Using mod_charset_lite I have set CharsetSourceEnc UTF8 and CharsetDefault UTF8 in the Apache config for the site (I know this is for content in/out)
  • Checked that /etc/apache2/conf.d/charset has set AddDefaultCharset UTF-8
  • Tried sending the logging output through a piped log to another program. It's already \xe2\x9c\x94 by the time it gets there, so it certainly seems like it's something to do with the Apache process itself.
  • Read through the Apache logs docs

Ultimately, I want that access log to show something like:

✔

but I'm pulling my hair out trying to get there.

Other information

  • Apache version 2.4.10
  • Debian 8.4

Update

Per Esa's suggestion, I modified the LogFormat directive:

LogFormat "%{some_value}n ✔" custom_format

And I get the following:

\xe2\x9c\x94 ✔

Which is interesting, because it suggests Apache is willing to log UTF-8. However, I'm still not convinced that the issue has anything to do with PHP passing non-UTF-8 values.

  apache_note('some_value', '✔');
  $value = apache_note('some_value');
  print_r($value);

in PHP still prints out

✔

I will try re-compiling Apache next to see if it helps, but I do need this in production, which may be dicey.

Bill Huertas
  • Have you tried to add the character directly to the `LogFormat` directive? This way you can narrow the problem down to either PHP `apache_note()` or Apache logging related. Apache uses the current locale and everything in your configuration looks consistent. Maybe the escaping happens within `apache_note()` function or before it gets to variable `%{some_value}`. – Esa Jokinen Jun 01 '17 at 15:51
  • @EsaJokinen : Great suggestion. Just tried modifying the `LogFormat` directive to `LogFormat "%{some_value}n ✔" custom_format` and what do you know, it correctly logs a ✔. I'll update the information in the main question. – Bill Huertas Jun 01 '17 at 19:25

2 Answers


Escaped logging is a feature

Starting from 2.0.49, the Apache logging API escapes everything that goes to error_log. If this annoys you during development (your error messages will be all messed up), you can disable the escaping at Apache build time:

% CFLAGS="-DAP_UNSAFE_ERROR_LOG_UNESCAPED" ./configure ...

Do not use that CFLAGS in production unless you know what you are doing.

thrig
  • This is an excellent finding, but not an exhaustive answer: OP is talking about access log instead of error log. – Esa Jokinen Jun 01 '17 at 19:33
  • @EsaJokinen yeah well for those they'll have to monkey patch out a whole bunch of `ap_escape_*` calls mostly in `mod_log_config.c` and possibly other places, in which case the "Do not use ... in production" line doubly applies. – thrig Jun 01 '17 at 21:07
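A less invasive alternative to patching httpd is to leave the escaping in place and decode it when the log is read, for example in the program behind a piped log. The following is only a sketch in Python: `unescape_log_line` is a hypothetical helper, and the set of escapes it understands (`\xHH`, `\\`, `\"`, `\b`, `\n`, `\r`, `\t`, `\v`) is an assumption about what `mod_log_config` emits, not something taken from its documentation.

```python
import re

# Matches a backslash followed by either xHH or one of the named escape characters.
_ESCAPE = re.compile(rb'\\(x[0-9a-fA-F]{2}|[bnrtv\\"])')

# Assumed mapping for the single-character escapes.
_NAMED = {
    b"b": b"\b", b"n": b"\n", b"r": b"\r", b"t": b"\t", b"v": b"\v",
    b"\\": b"\\", b'"': b'"',
}


def unescape_log_line(line: bytes) -> str:
    """Turn escaped log bytes back into text, decoding the result as UTF-8."""
    def replace(match):
        token = match.group(1)
        if token[:1] == b"x":
            # \xHH becomes the single byte with that hex value.
            return bytes([int(token[1:], 16)])
        return _NAMED[token]

    raw = _ESCAPE.sub(replace, line)
    # Invalid UTF-8 is replaced rather than raising, since logs can hold arbitrary bytes.
    return raw.decode("utf-8", errors="replace")


print(unescape_log_line(b"\\xe2\\x9c\\x94 ok"))
```

Decoding `\n` and `\r` puts raw line breaks back into the text, which is one reason the escaping exists, so a decoder like this is better suited to viewing or downstream processing than to rewriting the log file itself.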

You'll find that it gets escaped in ap_escape_logitem. Have a look at the following code. It uses a macro called TEST_CHAR to determine what needs to be escaped, and the resulting output is essentially plain ASCII.

https://github.com/apache/httpd/blob/5ed78e19a21609f7097f9049b2fe6db8e471f810/server/util.c
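To make the effect concrete, here is a rough Python model of that kind of escaping. It is not a port of the httpd code: `escape_logitem` is a hypothetical name, and the rule used here (named escapes for a few characters, `\xHH` for control bytes and anything at or above 0x7F) is an assumption standing in for whatever the TEST_CHAR table actually marks.

```python
# Characters assumed to get a short named escape instead of \xHH.
NAMED = {
    0x08: "\\b", 0x09: "\\t", 0x0A: "\\n", 0x0B: "\\v", 0x0D: "\\r",
    0x5C: "\\\\", 0x22: '\\"',
}


def escape_logitem(raw: bytes) -> str:
    """Simplified model of access-log escaping, working one byte at a time."""
    out = []
    for byte in raw:
        if byte in NAMED:
            out.append(NAMED[byte])
        elif byte < 0x20 or byte >= 0x7F:
            # Control bytes and every byte of a multi-byte UTF-8 sequence land here.
            out.append(f"\\x{byte:02x}")
        else:
            out.append(chr(byte))
    return "".join(out)


print(escape_logitem("\u2714".encode("utf-8")))
```

Because the test is applied per byte, with no knowledge of the character encoding, each of the three bytes of ✔ is escaped separately. That would also explain why a ✔ typed directly into the LogFormat string is logged as-is: under this model, only the substituted values go through the escaping.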

Cameron Kerr